Generating Captions for Live Internet Broadcasts Using Automatic Speech Recognition

Kutan Koruyan

doi:10.17671/btd.31441

Year 2015, Volume: 8 Issue: 2, 111 - , 11.02.2015

Abstract

Captioning, the technique of displaying spoken language, its translation, or information about images and sounds as text on television, in cinema, or on other visual media, has been in use since the early 1900s and has evolved into its present form. Advances in informatics have contributed greatly to the progress of captioning techniques; in particular, speech recognition has made converting speech to text easier. Captions for the hearing impaired, especially those produced with speech recognition, also serve as an alternative to sign language at live events. This technique is mostly used commercially with special hardware and software, which makes it costly for individual users and small organizations. Google Chrome's announcement of voice search in 2011 was the starting point of this work. This study presents an application that converts the speech in a video broadcast live on a web page into text and displays it as live captions, using the Google-supported, open source Web Speech API with the help of a media server. The video is delivered on the web page through the HTML5 video element, and the web application is written in JavaScript and PHP with the jQuery library.

Keywords

Automatic Speech Recognition, Live Webcast, Live Captions, HTML5, Internet
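The speech-to-caption step named in the abstract can be illustrated with a minimal JavaScript sketch. It is not the author's implementation: the function names `captionFromEvent` and `startLiveCaptions`, the default language tag, and the caption element are hypothetical, and the sketch assumes the recognizer listens to whatever audio input the browser provides. How the broadcast audio reaches the recognizer, and the media server and PHP parts, are not covered here. The recognizer constructor is passed in as a parameter so the logic can also be exercised outside a browser.

```javascript
// Collect the text carried by one SpeechRecognition "result" event.
// Final results are stable; interim results may still be revised.
function captionFromEvent(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const transcript = result[0].transcript; // best alternative
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText, interimText };
}

// Start continuous recognition and mirror the newest text into an element
// that is assumed to be positioned over the HTML5 video by the page's CSS.
// Recognition: a SpeechRecognition-compatible constructor (assumption: in
// Chrome this would be window.SpeechRecognition || window.webkitSpeechRecognition).
function startLiveCaptions(Recognition, captionElement, lang = "tr-TR") {
  const recognition = new Recognition();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // show words before they are final
  recognition.lang = lang;
  recognition.onresult = (event) => {
    const { finalText, interimText } = captionFromEvent(event);
    // Only the results changed by this event are shown (rolling caption).
    captionElement.textContent = (finalText + interimText).trim();
  };
  recognition.start();
  return recognition;
}

// Browser usage (hypothetical element id "caption"):
//   startLiveCaptions(
//     window.SpeechRecognition || window.webkitSpeechRecognition,
//     document.getElementById("caption"));
```

Showing interim results keeps the caption close to the live speech at the cost of text that may change until the recognizer marks it final; setting `interimResults` to `false` would trade latency for stability.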


Abstract (translated from Turkish)

The captioning technique, used to display the spoken language or its translation, or to give textual information about images and sound on television, in cinema, or on other visual media, came into use in the early 1900s and has evolved into its present form. Advances in informatics have contributed greatly to the progress of captioning techniques; in particular, speech recognition techniques have made converting speech to text easier. In addition, captions for the hearing impaired are used as an alternative to sign language, especially in live broadcasts, by means of speech recognition. This technique is mostly used together with special commercial hardware and software, and it imposes a cost on individual users and small-scale organizations. Google Chrome's announcement in 2011 of voice search that also supports Turkish was the starting point of this study. This study describes an application that converts the speech in a video broadcast live on a web page, with the help of a media server, into text using the Google-supported open source Web Speech API and turns it into live captions. In the study, the video is delivered on the web page through the video element introduced by HTML5, and the web application is written in the JavaScript and PHP programming languages with the jQuery library. In addition, the performance of the developed web application is examined from both human and technical perspectives.

Keywords

Automatic Speech Recognition; Live Internet Broadcast; Live Captions; HTML5; Internet


Details

Primary Language	Turkish
Section	Articles
Authors	Kutan Koruyan
Publication Date	February 11, 2015
Submission Date	February 11, 2015
Published in Issue	Year 2015 Volume: 8 Issue: 2

Cite

APA	Koruyan, K. (2015). Canlı İnternet Yayınları İçin Otomatik Konuşma Tanıma Tekniği Kullanılarak Alt Yazı Oluşturulması. Bilişim Teknolojileri Dergisi, 8(2), 111. https://doi.org/10.17671/btd.31441
