Research Article

Prosody-based Turkish dialect recognition with long short-term memory neural networks

Year 2020, Volume: 35, Issue: 1, 213-224, 25.10.2019
https://doi.org/10.17341/gazimmfd.453677

Abstract

Dialects are forms of speech that differ in certain respects from the language they belong to and are specific to a particular region of a country. Extracting dialect-specific characteristics and using them to recognize dialects are popular topics in speech processing. In particular, identifying the dialect of an utterance in advance is desirable for improving the performance of large-scale speech recognition systems. Languages and dialects are distinguished from one another by prosodic features such as intonation, stress, and rhythm. At the physical level, these perceptual features are obtained by measuring pitch, energy, and duration, respectively. In recent years, with the growing popularity of deep neural networks, Long Short-Term Memory (LSTM) networks have been widely used for sequence classification and language modeling problems, as they are effective at modeling long-term contextual information. In this study, Turkish dialect recognition is performed with LSTM networks using prosodic features, where the LSTM networks serve both as sequence classifiers and as language models. The proposed methods achieve an accuracy of 78.7% on a Turkish dataset consisting of the Ankara, Alanya, Cyprus, and Trabzon dialects.
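
The abstract outlines the core pipeline: per-step prosodic measurements (pitch, energy, duration) are fed to an LSTM that acts as a sequence classifier over the four dialect classes. The sketch below illustrates that idea in Keras; it is not the authors' implementation, and the layer size, sequence length, padding scheme, and optimizer are illustrative assumptions.

```python
# Minimal sketch of an LSTM sequence classifier over prosodic features
# (pitch, energy, duration), in the spirit of the approach summarized in the
# abstract. Not the authors' implementation; all hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4     # Ankara, Alanya, Cyprus (Kıbrıs), Trabzon dialects
MAX_STEPS = 200     # hypothetical padded sequence length
NUM_FEATURES = 3    # pitch, energy, duration per time step

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_STEPS, NUM_FEATURES)),
    layers.Masking(mask_value=0.0),                    # skip zero-padded steps
    layers.LSTM(64),                                   # models long-term prosodic context
    layers.Dense(NUM_CLASSES, activation="softmax"),   # dialect posterior probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy run with random data standing in for real prosodic sequences.
X = np.random.rand(8, MAX_STEPS, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1], verbose=0).shape)  # (1, 4)
```

In the language-modeling mode mentioned in the abstract, the same recurrent architecture would presumably be trained per dialect to predict the next prosodic token, with classification done by comparing the likelihoods assigned by each dialect's model.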


Details

Primary Language: Turkish
Section: Articles
Authors

Gültekin Işık 0000-0003-3037-5586

Harun Artuner 0000-0002-6044-379X

Publication Date: October 25, 2019
Submission Date: August 15, 2018
Acceptance Date: December 15, 2018
Published Issue: Year 2020, Volume: 35, Issue: 1

How to Cite

APA Işık, G., & Artuner, H. (2019). Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 35(1), 213-224. https://doi.org/10.17341/gazimmfd.453677
AMA Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. Ekim 2019;35(1):213-224. doi:10.17341/gazimmfd.453677
Chicago Işık, Gültekin, ve Harun Artuner. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35, sy. 1 (Ekim 2019): 213-24. https://doi.org/10.17341/gazimmfd.453677.
EndNote Işık G, Artuner H (01 Ekim 2019) Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35 1 213–224.
IEEE G. Işık ve H. Artuner, “Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma”, GUMMFD, c. 35, sy. 1, ss. 213–224, 2019, doi: 10.17341/gazimmfd.453677.
ISNAD Işık, Gültekin - Artuner, Harun. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35/1 (Ekim 2019), 213-224. https://doi.org/10.17341/gazimmfd.453677.
JAMA Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. 2019;35:213–224.
MLA Işık, Gültekin ve Harun Artuner. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, c. 35, sy. 1, 2019, ss. 213-24, doi:10.17341/gazimmfd.453677.
Vancouver Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. 2019;35(1):213-24.