Generating Captions for Live Internet Broadcasts Using Automatic Speech Recognition

Kutan Koruyan

doi:10.17671/btd.31441


Year 2015, Volume 8, Issue 2, starting page 111, published 11.02.2015


Abstract

Captioning, the technique of displaying spoken language or its translation, or of giving textual information about images and sounds, on television, in cinema and in other visual media, has been in use since the early 1900s and has evolved into its contemporary form. Advances in information technology have contributed greatly to the progress of captioning; in particular, speech recognition has made it much easier to convert speech to text. Moreover, captions for the hearing impaired, produced with speech recognition, offer an alternative to sign language at live events. This technique is mostly commercial and typically relies on dedicated hardware and software, which makes it costly for individual users and small organizations. Google's announcement of voice search in Google Chrome in 2011 was the starting point of this work. This study presents an application that, with the help of a media server, converts the speech in a video broadcast live on a web page into text using the open-source Web Speech API supported by Google, and displays it as live captions. The video is delivered on the web page with the HTML5 video element, and the web application is written in JavaScript and PHP using the jQuery library.

Keywords

Automatic Speech Recognition, Live Webcast, Live Captions, HTML5, Internet
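The abstract says speech is converted to text with the Web Speech API before being shown as live captions. The paper's own JavaScript is not reproduced here; the following is a minimal TypeScript sketch, assuming Chrome's recognizer runs in continuous mode with interim results, of how recognition events might be folded into caption text. The names `AlternativeLike`, `ResultLike`, `ResultEventLike`, `CaptionText`, `CaptionAssembler` and `showCaption` are hypothetical; only `resultIndex`, `results`, `isFinal`, `transcript`, `confidence`, `lang`, `continuous`, `interimResults`, `onresult` and `start()` come from the Web Speech API.

```typescript
// Structural stand-ins for SpeechRecognitionAlternative / Result / Event,
// so the caption logic can run outside a browser. Names are hypothetical.
interface AlternativeLike {
  transcript: string;
  confidence: number;
}

type ResultLike = ArrayLike<AlternativeLike> & { isFinal: boolean };

interface ResultEventLike {
  resultIndex: number; // lowest index in `results` that changed in this event
  results: ArrayLike<ResultLike>;
}

interface CaptionText {
  finalText: string; // text the recognizer will no longer revise
  interimText: string; // provisional text that may still change
}

class CaptionAssembler {
  private finalText = "";

  // Intended to be called from recognition.onresult for every event.
  // Assumes, as the spec describes, that results before resultIndex are unchanged.
  update(event: ResultEventLike): CaptionText {
    let interimText = "";
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const best = result[0].transcript; // alternative 0 is the top-ranked one
      if (result.isFinal) {
        this.finalText += best; // finalized segments are appended once
      } else {
        interimText += best; // interim segments are rebuilt on every event
      }
    }
    return { finalText: this.finalText, interimText };
  }
}

// Browser wiring (not run here). Chrome exposes the constructor with a
// "webkit" prefix; "tr-TR" is an assumed language tag for Turkish speech.
//
//   const recognition = new webkitSpeechRecognition();
//   recognition.lang = "tr-TR";
//   recognition.continuous = true;
//   recognition.interimResults = true;
//   const assembler = new CaptionAssembler();
//   recognition.onresult = (e) => showCaption(assembler.update(e)); // showCaption is hypothetical
//   recognition.start();
```

Keeping finalized and interim text apart lets a caption view show the stable part immediately while the provisional tail keeps changing as the speaker continues.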



Abstract

The captioning technique, applied on television, in cinema and in other visual media to show the spoken language or its translation, or to give textual information about the picture and sound, began to be used in the early 1900s and has evolved into its present form. Advances in information technology have contributed greatly to the progress of captioning techniques; in particular, converting speech into text has become easier with speech recognition techniques. In addition, captions for the hearing impaired are produced with speech recognition, especially in live broadcasts, as an alternative to sign language. This technique is mostly used commercially together with dedicated hardware and software, which imposes costs on individual users and small organizations. Google Chrome's announcement in 2011 of voice search, which also supports Turkish, was the starting point of this study. The study describes an application that, with the help of a media server, converts the speech in a video broadcast live on a web page into text using the open-source Web Speech API supported by Google and turns it into instant captions. The video is broadcast on the web page through the video element introduced in HTML5, and the web application is written in the JavaScript and PHP programming languages using the jQuery library. In addition, the efficiency of the developed web application is examined from both human and technical perspectives.

Keywords

Automatic Speech Recognition; Live Internet Broadcast; Live Captioning; HTML5; Internet
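The abstract describes turning recognized speech into instant captions shown over the HTML5 video, but does not say how the caption text is laid out on screen. As an illustration only, the sketch below assumes a "roll-up" layout, as used for live TV captions: the transcript is word-wrapped and only the newest lines are kept, so the caption area stays a fixed size during a long broadcast. The function `rollUpCaption`, its default line width and line count, and the `captionEl` element are hypothetical, not the paper's design.

```typescript
// Hypothetical helper: word-wrap caption text and keep only the newest lines.
// Line length is measured in UTF-16 code units, a rough proxy for on-screen width.
function rollUpCaption(text: string, maxChars = 32, maxLines = 2): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    if (current.length === 0) {
      current = word; // a word longer than maxChars still gets its own line
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word; // the word fits on the current line
    } else {
      lines.push(current); // start a new line with this word
      current = word;
    }
  }
  if (current.length > 0) {
    lines.push(current);
  }
  return maxLines > 0 ? lines.slice(-maxLines) : []; // keep only the newest lines
}

// Possible browser use (not run here): an absolutely positioned element over
// the <video>, styled with `white-space: pre-line`, e.g.
//   captionEl.textContent = rollUpCaption(finalText + interimText).join("\n");
```

Re-wrapping the whole transcript on every update is the simplest choice; a long broadcast would more likely keep only a recent tail of the transcript before wrapping.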


There are 23 citations in total.

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Kutan Koruyan
Publication Date	February 11, 2015
Submission Date	February 11, 2015
Published in Issue	Year 2015

Cite

APA	Koruyan, K. (2015). Canlı İnternet Yayınları İçin Otomatik Konuşma Tanıma Tekniği Kullanılarak Alt Yazı Oluşturulması. Bilişim Teknolojileri Dergisi, 8(2), 111. https://doi.org/10.17671/btd.31441