Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size

Mursel Onder; Halil İbrahim Bayat

doi:10.21597/jist.957879

Research Article

Year 2022, Volume 12, Issue 1, 41 - 57, 01.03.2022

Abstract

This study carried out syllable-level synchronization that takes the grammatical structure of Turkish into account, so that the speech and the corresponding text are highlighted simultaneously. To this end, the vowels and consonants within Turkish words were classified. Two different Artificial Neural Network (ANN) models were used for the classification, and the Mel-Frequency Cepstrum Coefficients (MFCC) method was used to extract features from the voice data. The ANNs were observed to give their best results with deep learning. Tests were run with different numbers of coefficients in feature extraction. In the first stage, a set of recordings of Turkish vowels and consonants was collected; their features were then extracted and prepared for network training. The best network structure and parameters were selected after training and testing with different parameter settings. In this training, the networks were required to distinguish vowels from consonants. Afterward, vowel-consonant discrimination was performed on 10 predetermined vectors of words and phrases. In deep learning training carried out in MathWorks MATLAB, the Layer-Recurrent Neural Network and the Pattern Recognition Network achieved average success rates of 97.43% and 98.04%, respectively. Because the Pattern Recognition Network achieved 98.82% success in recognizing vowels and 97.27% in recognizing consonants, it was chosen for vowel-consonant classification. After classification, timing files were created by determining the transition times of the vowels in each word. In the last step, an interface was built on the C# .NET platform for the synchronization process, and a syllabification algorithm was developed in this interface to highlight the text syllable by syllable in sync with the audio. The desired high precision was thus achieved in the simultaneous highlighting of the words.
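
The last two steps of the abstract (vowel transition times feeding a syllable-level highlighter) can be illustrated with a minimal sketch. It is not the authors' C# implementation: it is written in Python and assumes the common rule that every Turkish syllable contains exactly one vowel, with the consonant immediately before a vowel opening that vowel's syllable. The names `syllabify` and `highlight_schedule`, and the idea that a timing file reduces to one vowel onset time per syllable, are assumptions made for illustration only.

```python
# Hypothetical sketch: vowel-based Turkish syllabification plus a
# syllable highlight schedule built from per-vowel onset times.
VOWELS = set("aeıioöuüAEIİOÖUÜ")


def syllabify(word):
    """Split a word so that each syllable holds exactly one vowel."""
    vowel_idx = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_idx) <= 1:
        # Zero or one vowel: the whole word is treated as one unit.
        return [word]
    boundaries = [0]
    for prev, cur in zip(vowel_idx, vowel_idx[1:]):
        # A consonant directly before the vowel starts the new syllable;
        # two adjacent vowels are split between them (e.g. "saat").
        start = cur - 1 if cur - 1 > prev else cur
        boundaries.append(start)
    boundaries.append(len(word))
    return [word[a:b] for a, b in zip(boundaries, boundaries[1:])]


def highlight_schedule(word, vowel_times):
    """Pair each syllable with the onset time (seconds) of its vowel.

    vowel_times stands in for the contents of a timing file; the real
    file format is not described in the abstract.
    """
    syllables = syllabify(word)
    if len(syllables) != len(vowel_times):
        raise ValueError("expected one vowel time per syllable")
    return list(zip(syllables, vowel_times))


if __name__ == "__main__":
    print(syllabify("kitap"))                       # ['ki', 'tap']
    print(highlight_schedule("okul", [0.10, 0.42]))  # [('o', 0.1), ('kul', 0.42)]
```

Anchoring each syllable to its vowel is what makes a vowel-consonant classifier sufficient for this kind of synchronization: once the vowel transition times are known, each syllable's highlight time follows directly.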

Keywords

Artificial Neural Networks, Deep Learning, Mel-Frequency Cepstrum Coefficients, Sound-Text Synchronization

References

Bayat, H.İ., 2020. Identification of vowel-non vowel letter with artificial neural network and sound-text synchronization at syllable level (Master thesis), Gaziosmanpaşa University, Institute of science and technology, Tokat, Turkey.
Bengio, Y., 2012. Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML workshop on unsupervised and transfer learning, pp. 17-36.
Cakir, E., 2014. Multilabel sound event classification with neural networks. (Master thesis), Tampere University of Technology, Faculty of Computing and Electrical Engineering, Finland.
Çakır, M.Y., 2017. Real-time high-quality voice recognition. (Master thesis), İstanbul Sabahattin Zaim University, Institute of science and technology, İstanbul, Turkey
Cosi, P., Bengua, Y. and De Maria, R., 1990. Phonetically-based multi-layered neural networks for vowel classification. Speech Communication, 1(9), pp. 15-19.
Dave, N., 2013. Feature extraction methods LPC, PLP and MFCC in speech recognition. International Journal for Advance Research in Engineering and Technology, 4(1), 5 pp.
Dede, G., 2008. Speech recognition with artificial neural networks (Master thesis), Ankara University, Institute of science and technology, Ankara, Turkey.
Elman, J. L., 1990. Finding structure in time. Cognitive Science, 14(2), pp. 179-211.
Güloğlu, T., 2014. Speech recognition for Turkish phonology using wavelet techniques. (Master thesis), Dokuz Eylül University, Graduate School of Natural and Applied Sciences, İzmir.
Gupta, M., Jin, L. and Homma, N., 2004. Static and dynamic neural networks: from fundamentals to advanced theory. John Wiley & Sons.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice-Hall, pp. 823, Ontario, Canada.
Hinton, G., Osindero, S. and Teh, Y. W., 2006. A fast-learning algorithm for deep belief nets. Neural Computation, 18(7), pp. 1527-1554.
Kılıç, E., 2015. The effects of Turkish vowel harmony in word recognition. (Master thesis), DePaul University, The Department of Psychology College of Science and Health, Chicago, Illinois, USA.
Kohonen, T., 1987. State of the art in neural computing. In Proceedings, IEEE First International Conference on Neural Networks, pp. 179-190, San Diego, USA
McCulloch, W. S. and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), pp. 115-133.
Meng, Y., Lee, T., Ching, P.C. and Zhu, Y. 2004. Speech recognition on DSP: issue on computational efficiency and performance analysis. Microprocessors and Microsystems, 30(3), pp. 155-164.
Önder, M., In press. Elmas-Hece Engineering in Quran Education.
Parlaktuna, O., Cakici, T., Tora H. and Barkana, A., 1994. Vowel and consonant recognition in Turkish using neural networks toward continuous speech recognition. Mediterranean Electrotechnical Conference, Antalya, Turkey.
Sirigos, J., V. Darsinos, N. Fakotakis and G. Kokkinakis, 1996. Vowel-non vowel decision using neural networks and rules. Proceedings of Third International Conference on Electronics, Circuits, and Systems, Rodos, Greece.
Tiwari, V., 2010. MFCC and its application in speaker recognition. International Journal on Emerging Technologies, 1(1), pp. 19-22.
Üstün, S.V., 1997. Recognition of vowels in Turkish using artificial neural networks. (Master thesis), Yıldız Technical University, Institute of science and technology, İstanbul, Turkey
Vafeiadis, A., Kalatzis, D., Votis, K., Giakoumis, D., Tzovaras, D., Chen, L. and Hamzaoui, R., 2017, November. Acoustic scene classification: From a hybrid classifier to deep learning.
Wang, J.C., Wang, J.F., He, K.W. and Hsu, C.S., 2006, July. Environmental sound classification using hybrid SVM/KNN classifier and MPEG-7 audio low-level descriptor. In The 2006 IEEE International Joint Conference on Neural Network Proceedings, pp. 1731-1735
Yalçın, N., 2006. Developing a software for teaching initial reading writing to class student of primary education using speech recognition technology. (Doctoral thesis), Institute of science and technology, Ankara, Turkey.
Yavuz, E. and Topuz, V., 2010. Recognition of Turkish vowels by probabilistic neural network using Yule-Walker AR method. International Conference on Hybrid Artificial Intelligence Systems, Berlin, Heidelberg, Germany.

There are 25 citations in total.

Details

Primary Language	English
Subjects	Computer Software, Engineering
Journal Section	Bilgisayar Mühendisliği / Computer Engineering
Authors	Mursel Onder 0000-0003-4475-3955 Halil İbrahim Bayat 0000-0002-3014-7113
Publication Date	March 1, 2022
Submission Date	September 17, 2021
Acceptance Date	November 18, 2021
Published in Issue	Year 2022

Cite

APA	Onder, M., & Bayat, H. İ. (2022). Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size. Journal of the Institute of Science and Technology, 12(1), 41-57. https://doi.org/10.21597/jist.957879
AMA	Onder M, Bayat Hİ. Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size. Iğdır Üniv. Fen Bil Enst. Der. March 2022;12(1):41-57. doi:10.21597/jist.957879
Chicago	Onder, Mursel, and Halil İbrahim Bayat. “Classification Vowel-Consonant Letters With Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size”. Journal of the Institute of Science and Technology 12, no. 1 (March 2022): 41-57. https://doi.org/10.21597/jist.957879.
EndNote	Onder M, Bayat Hİ (March 1, 2022) Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size. Journal of the Institute of Science and Technology 12 1 41–57.
IEEE	M. Onder and H. İ. Bayat, “Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size”, Iğdır Üniv. Fen Bil Enst. Der., vol. 12, no. 1, pp. 41–57, 2022, doi: 10.21597/jist.957879.
ISNAD	Onder, Mursel - Bayat, Halil İbrahim. “Classification Vowel-Consonant Letters With Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size”. Journal of the Institute of Science and Technology 12/1 (March 2022), 41-57. https://doi.org/10.21597/jist.957879.
JAMA	Onder M, Bayat Hİ. Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size. Iğdır Üniv. Fen Bil Enst. Der. 2022;12:41–57.
MLA	Onder, Mursel and Halil İbrahim Bayat. “Classification Vowel-Consonant Letters With Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size”. Journal of the Institute of Science and Technology, vol. 12, no. 1, 2022, pp. 41-57, doi:10.21597/jist.957879.
Vancouver	Onder M, Bayat Hİ. Classification Vowel-Consonant Letters with Deep Neural Networks in Turkish and Text-Voice Synchronization on a Basis Syllable Size. Iğdır Üniv. Fen Bil Enst. Der. 2022;12(1):41-57.
