An Overview of Automatic Speech Recognition, Approaches, and Challenges: The Future Path of Turkish Speech Recognition

Saadin Oyucu; Hayri Sever; Hüseyin Polat

doi:10.29109/gujsc.562111

Review

Year 2019, Volume: 7, Issue: 4, pp. 834-854, 24.12.2019

Abstract

Speech is the most important means of communication between people, and its recognition by computers is an important field of study. Many studies based on different languages have been carried out in this research area, and the studies in the literature have played an important role in improving the performance of speech recognition technologies. This study presents a literature review of speech recognition and discusses the progress made in this research area across different languages. It examines the datasets, feature extraction approaches, speech recognition methods, and performance evaluation criteria used in speech recognition systems, focusing on the development of speech recognition and the challenges in the field. Recent work in speech recognition has been observed to focus on developing methods that are much more robust to adverse conditions (environmental noise, speaker and language variability). For this reason, an overview is given of recent developments in speech recognition under adverse conditions, which is a growing research area. The aim is to help in selecting methods that can be used to overcome the bottlenecks and difficulties of speech recognition under adverse conditions. In addition, well-known methods used in Turkish speech recognition are compared. The difficulties of Turkish speech recognition and suitable methods for overcoming them are examined. On this basis, an assessment of the future direction of Turkish speech recognition is put forward.

Keywords

Speech recognition, Feature extraction, Artificial intelligence, Turkish speech recognition

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Design and Technology
Authors	Saadin Oyucu (ORCID 0000-0003-3880-3039), Hayri Sever (ORCID 0000-0002-8261-0675), Hüseyin Polat (ORCID 0000-0003-4128-2625)
Publication Date	December 24, 2019
Submission Date	May 8, 2019
Published in Issue	Year 2019 Volume: 7 Issue: 4

Cite

APA	Oyucu, S., Sever, H., & Polat, H. (2019). Otomatik Konuşma Tanımaya Genel Bakış, Yaklaşımlar ve Zorluklar: Türkçe Konuşma Tanımanın Gelecekteki Yolu. Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım Ve Teknoloji, 7(4), 834-854. https://doi.org/10.29109/gujsc.562111

e-ISSN:2147-9526