Deep Learning Based Automatic Speech Recognition for Turkish

Burak Tombaloğlu; Hamit Erdem

doi:10.16984/saufenbilder.711888

Research Article

Year 2020, Volume: 24 Issue: 4, 725 - 739, 01.08.2020

Cited By: 2

Abstract

Deep Neural Networks (DNNs), an advanced form of Artificial Neural Networks (ANNs), have become widespread with the development of computer technology. Although DNNs have been applied to the Automatic Speech Recognition (ASR) problem in some languages, DNN-based Turkish speech recognition has not been studied extensively. Turkish is an agglutinative and phoneme-based language. In this study, a Deep Belief Network (DBN) based Turkish phoneme and speech recognizer is developed. The proposed system recognizes words in the system vocabulary and the phoneme components of out-of-vocabulary (OOV) words. Sub-word (morpheme) based language modelling is implemented in the system, and each phoneme of Turkish is also modelled as a sub-word. Sub-word (morpheme) based language models are widely used for agglutinative languages to prevent excessive vocabulary size. The performance of the proposed DBN-based ASR system is compared with the conventional recognition method, the Gaussian Mixture Model (GMM) based Hidden Markov Model (HMM). In terms of the performance metrics, the recognition rate for Turkish is improved compared with previous studies.
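The idea of a sub-word vocabulary with a phoneme-level fallback for OOV words can be illustrated with a small sketch. This is not the authors' implementation: the sub-word set, the function name `segment`, and the greedy longest-match strategy are all assumptions made for illustration, and single letters stand in for phonemes on the assumption of a near one-to-one letter-to-phoneme mapping.

```python
# Toy illustration only: the vocabulary below is hypothetical, and greedy
# longest-match is a stand-in for whatever segmentation the paper uses.
SUBWORDS = {"ev", "ler", "im", "iz", "de", "kitap", "lar"}


def segment(word, subwords=SUBWORDS):
    """Split `word` into known sub-words, falling back to single letters.

    Letters stand in for phonemes here, which assumes a near one-to-one
    letter-to-phoneme mapping.
    """
    units = []
    i = 0
    while i < len(word):
        # Try the longest sub-word that matches at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                units.append(word[i:j])
                i = j
                break
        else:
            # No sub-word matches: emit one letter as a phoneme-like unit.
            units.append(word[i])
            i += 1
    return units


print(segment("evlerimizde"))  # covered entirely by sub-words
print(segment("okul"))         # no sub-word matches, so letter by letter
```

With this kind of scheme, a long inflected form is covered by a handful of short units, so the vocabulary does not need one entry per surface form, and a word with no matching sub-words is still representable as a sequence of phoneme-like units.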

Keywords

Deep neural networks, deep belief networks, automatic speech recognition, Turkish

Details

Primary Language	English
Subjects	Artificial Intelligence, Computer Software, Software Testing, Verification and Validation, Electrical Engineering
Journal Section	Research Articles
Authors	Burak Tombaloğlu 0000-0003-3994-0422; Hamit Erdem 0000-0003-1704-1581
Publication Date	August 1, 2020
Submission Date	April 2, 2020
Acceptance Date	May 28, 2020
Published in Issue	Year 2020 Volume: 24 Issue: 4

Cite

APA	Tombaloğlu, B., & Erdem, H. (2020). Deep Learning Based Automatic Speech Recognition for Turkish. Sakarya University Journal of Science, 24(4), 725-739. https://doi.org/10.16984/saufenbilder.711888
AMA	Tombaloğlu B, Erdem H. Deep Learning Based Automatic Speech Recognition for Turkish. SAUJS. August 2020;24(4):725-739. doi:10.16984/saufenbilder.711888
Chicago	Tombaloğlu, Burak, and Hamit Erdem. “Deep Learning Based Automatic Speech Recognition for Turkish”. Sakarya University Journal of Science 24, no. 4 (August 2020): 725-39. https://doi.org/10.16984/saufenbilder.711888.
EndNote	Tombaloğlu B, Erdem H (August 1, 2020) Deep Learning Based Automatic Speech Recognition for Turkish. Sakarya University Journal of Science 24 4 725–739.
IEEE	B. Tombaloğlu and H. Erdem, “Deep Learning Based Automatic Speech Recognition for Turkish”, SAUJS, vol. 24, no. 4, pp. 725–739, 2020, doi: 10.16984/saufenbilder.711888.
ISNAD	Tombaloğlu, Burak - Erdem, Hamit. “Deep Learning Based Automatic Speech Recognition for Turkish”. Sakarya University Journal of Science 24/4 (August 2020), 725-739. https://doi.org/10.16984/saufenbilder.711888.
JAMA	Tombaloğlu B, Erdem H. Deep Learning Based Automatic Speech Recognition for Turkish. SAUJS. 2020;24:725–739.
MLA	Tombaloğlu, Burak and Hamit Erdem. “Deep Learning Based Automatic Speech Recognition for Turkish”. Sakarya University Journal of Science, vol. 24, no. 4, 2020, pp. 725-39, doi:10.16984/saufenbilder.711888.
Vancouver	Tombaloğlu B, Erdem H. Deep Learning Based Automatic Speech Recognition for Turkish. SAUJS. 2020;24(4):725-39.

Cited By

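Türkçe Konuşmadan Metne Dönüştürme için Ön Eğitimli Modellerin Performans Karşılaştırması: Whisper-Small ve Wav2Vec2-XLS-R-300M [Performance Comparison of Pre-trained Models for Turkish Speech-to-Text: Whisper-Small and Wav2Vec2-XLS-R-300M]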

Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi

https://doi.org/10.54525/tbbmd.1252487

Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)

GAZI UNIVERSITY JOURNAL OF SCIENCE

Burak TOMBALOĞLU

https://doi.org/10.35378/gujs.816499

Sakarya University Journal of Science (SAUJS)