Research Article

Prosody-based Turkish dialect recognition with long short-term memory neural networks

Year 2020, Volume: 35, Issue: 1, 213-224, 25.10.2019
https://doi.org/10.17341/gazimmfd.453677

Abstract

Dialects are forms of speech that differ in certain respects from the language they belong to and are specific to a particular region of a country. Extracting dialect-specific characteristics and using them to recognize dialects are popular topics in speech processing. In particular, identifying the dialect of an utterance in advance is desirable for improving the performance of large-scale speech recognition systems. Languages and dialects are distinguished from one another by prosodic features such as intonation, stress, and rhythm. At the physical level, these perceptual features are obtained by measuring pitch, energy, and duration, respectively. In recent years, with the growing popularity of deep neural networks, Long Short-Term Memory (LSTM) networks have been widely used for sequence classification and language modeling problems, as they are effective at modeling long-term contextual information. In this study, Turkish dialect recognition is performed with LSTM networks using prosodic features, where the LSTM networks serve both as sequence classifiers and as language models. The proposed methods achieve an accuracy of 78.7% on a Turkish dataset consisting of the Ankara, Alanya, Cyprus, and Trabzon dialects.
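
The abstract outlines the core pipeline: per-step prosodic measurements (pitch, energy, duration) are fed to an LSTM that acts as a sequence classifier over the four dialect classes. The sketch below illustrates that idea in Keras; it is not the authors' implementation, and the layer size, sequence length, padding scheme, and optimizer are illustrative assumptions.

```python
# Minimal sketch of an LSTM sequence classifier over prosodic features
# (pitch, energy, duration), in the spirit of the approach summarized in the
# abstract. Not the authors' implementation; all hyperparameters are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4     # Ankara, Alanya, Cyprus (Kıbrıs), Trabzon dialects
MAX_STEPS = 200     # hypothetical padded sequence length
NUM_FEATURES = 3    # pitch, energy, duration per time step

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_STEPS, NUM_FEATURES)),
    layers.Masking(mask_value=0.0),                    # skip zero-padded steps
    layers.LSTM(64),                                   # models long-term prosodic context
    layers.Dense(NUM_CLASSES, activation="softmax"),   # dialect posterior probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Toy run with random data standing in for real prosodic sequences.
X = np.random.rand(8, MAX_STEPS, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1], verbose=0).shape)  # (1, 4)
```

In the language-modeling mode mentioned in the abstract, the same recurrent architecture would presumably be trained per dialect to predict the next prosodic token, with classification done by comparing the likelihoods assigned by each dialect's model.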


Details

Primary Language: Turkish
Section: Articles
Authors

Gültekin Işık 0000-0003-3037-5586

Harun Artuner 0000-0002-6044-379X

Publication Date: October 25, 2019
Submission Date: August 15, 2018
Acceptance Date: December 15, 2018
Published Issue: Year 2020, Volume: 35, Issue: 1

How to Cite

APA Işık, G., & Artuner, H. (2019). Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 35(1), 213-224. https://doi.org/10.17341/gazimmfd.453677
AMA Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. Ekim 2019;35(1):213-224. doi:10.17341/gazimmfd.453677
Chicago Işık, Gültekin, ve Harun Artuner. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35, sy. 1 (Ekim 2019): 213-24. https://doi.org/10.17341/gazimmfd.453677.
EndNote Işık G, Artuner H (01 Ekim 2019) Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35 1 213–224.
IEEE G. Işık ve H. Artuner, “Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma”, GUMMFD, c. 35, sy. 1, ss. 213–224, 2019, doi: 10.17341/gazimmfd.453677.
ISNAD Işık, Gültekin - Artuner, Harun. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi 35/1 (Ekim 2019), 213-224. https://doi.org/10.17341/gazimmfd.453677.
JAMA Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. 2019;35:213–224.
MLA Işık, Gültekin ve Harun Artuner. “Uzun kısa-dönem Bellekli Sinir ağlarıyla Prozodik açıdan türkçe ağız tanıma”. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, c. 35, sy. 1, 2019, ss. 213-24, doi:10.17341/gazimmfd.453677.
Vancouver Işık G, Artuner H. Uzun kısa-dönem bellekli sinir ağlarıyla prozodik açıdan türkçe ağız tanıma. GUMMFD. 2019;35(1):213-24.