Application of Deep Learning for Voice Command Classification in Turkish Language

Yusuf Çelik

doi:10.17798/bitlisfen.1477191

Abstract

In this study, a deep learning model was developed to recognize and classify voice commands from the Turkish Speech Command Dataset. The data were divided into training, validation, and test sets on a per-speaker basis, so that recordings from the same individual do not appear in more than one set. This approach aims to prevent the model from memorizing individual speakers and to improve its generalization to unseen speakers. The model was trained on Mel-Frequency Cepstral Coefficient (MFCC) features extracted from the audio files, and its classification performance was evaluated in detail. The model classified voice commands with an overall accuracy of 92.3% on the test set, highlighting the potential of deep learning approaches for voice recognition technologies.
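The abstract states that the training, validation, and test sets were divided on an individual basis to keep the model from memorizing. The following is a minimal sketch of such a speaker-disjoint split in Python. It assumes each recording is available as a `(speaker_id, file_path, label)` tuple. The tuple layout, the function name `split_by_speaker`, the 70/15/15 ratios, and the seed are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (speaker_id, file_path, command_label).
# This is an assumption for illustration, not the dataset's documented format.
Recording = Tuple[str, str, str]


def split_by_speaker(
    recordings: List[Recording],
    ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),  # assumed ratios, not from the paper
    seed: int = 42,
) -> Dict[str, List[Recording]]:
    """Assign whole speakers to train/val/test so no speaker spans two sets."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group every recording under the speaker who produced it.
    by_speaker: Dict[str, List[Recording]] = defaultdict(list)
    for rec in recordings:
        by_speaker[rec[0]].append(rec)

    # Shuffle speaker IDs (not individual files) with a fixed seed for reproducibility.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Cut the shuffled speaker list into three contiguous groups.
    n = len(speakers)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    groups = {
        "train": speakers[:n_train],
        "val": speakers[n_train:n_train + n_val],
        "test": speakers[n_train + n_val:],
    }

    # Expand each speaker group back into that speaker's recordings.
    return {
        name: [rec for spk in spks for rec in by_speaker[spk]]
        for name, spks in groups.items()
    }


# Usage with synthetic placeholder data: 20 speakers, 4 recordings each.
data = [(f"spk{s}", f"spk{s}_{i}.wav", f"label{i}") for s in range(20) for i in range(4)]
splits = split_by_speaker(data)
```

Grouping by speaker before shuffling is what makes the split speaker-independent. Shuffling individual files instead would let the same voice appear in both training and test data and inflate test accuracy. If scikit-learn is available, its `GroupShuffleSplit` provides a comparable grouped split.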

Details

Primary Language: English
Subjects: Artificial Intelligence (Other)
Journal Section: Research Article
Authors: Yusuf Çelik (corresponding author), ORCID 0000-0002-7859-7543, Türkiye
Early Pub Date: September 20, 2024
Publication Date: September 26, 2024
Submission Date: May 2, 2024
Acceptance Date: July 25, 2024
Published in Issue: Year 2024, Volume 13, Number 3
DOI: https://doi.org/10.17798/bitlisfen.1477191
IZ: https://izlik.org/JA53UG97HH

Cite

APA

Çelik, Y. (2024). Application of Deep Learning for Voice Command Classification in Turkish Language. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 13(3), 701-708. https://doi.org/10.17798/bitlisfen.1477191

AMA

1. Çelik Y. Application of Deep Learning for Voice Command Classification in Turkish Language. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi. 2024;13(3):701-708. doi:10.17798/bitlisfen.1477191

Chicago

Çelik, Yusuf. 2024. “Application of Deep Learning for Voice Command Classification in Turkish Language”. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 13 (3): 701-8. https://doi.org/10.17798/bitlisfen.1477191.

EndNote

Çelik Y (September 1, 2024) Application of Deep Learning for Voice Command Classification in Turkish Language. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 13 3 701–708.

IEEE

[1] Y. Çelik, “Application of Deep Learning for Voice Command Classification in Turkish Language”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 13, no. 3, pp. 701–708, Sept. 2024, doi: 10.17798/bitlisfen.1477191.

ISNAD

Çelik, Yusuf. “Application of Deep Learning for Voice Command Classification in Turkish Language”. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 13/3 (September 1, 2024): 701-708. https://doi.org/10.17798/bitlisfen.1477191.

JAMA

1. Çelik Y. Application of Deep Learning for Voice Command Classification in Turkish Language. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi. 2024;13(3):701–708.

MLA

Çelik, Yusuf. “Application of Deep Learning for Voice Command Classification in Turkish Language”. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 13, no. 3, Sept. 2024, pp. 701-8, doi:10.17798/bitlisfen.1477191.

Vancouver

1. Çelik Y. Application of Deep Learning for Voice Command Classification in Turkish Language. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi. 2024 Sep. 1;13(3):701-8. doi:10.17798/bitlisfen.1477191

Cited By

Voice Assistant

International Journal of Advanced Research in Science, Communication and Technology

https://doi.org/10.48175/IJARSCT-29488

Deep learning-based detection of bowel sound events in continuous recordings

Scientific Reports

https://doi.org/10.1038/s41598-026-47018-3

Hybrid Spectral and Statistical Feature Modelling with Cross-Correlation and Ensemble Learning for Robust Turkish Voice Command Classification

Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi

https://doi.org/10.19113/sdufenbed.1753641