<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4">
            <front>

                <journal-meta>
                                                                <journal-id>süleyman demirel üniv. fen bilim. enst. derg.</journal-id>
            <journal-title-group>
                                                                                    <journal-title>Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi</journal-title>
            </journal-title-group>
                                        <issn pub-type="epub">1308-6529</issn>
                                                                                            <publisher>
                    <publisher-name>Süleyman Demirel Üniversitesi</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.19113/sdufenbed.1753641</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Speech Recognition</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Konuşma Tanıma</subject>
                                                    </subj-group>
                                    </article-categories>
                <title-group>
                    <article-title>Hybrid Spectral and Statistical Feature Modelling with Cross-Correlation and Ensemble Learning for Robust Turkish Voice Command Classification</article-title>
                    <trans-title-group xml:lang="tr">
                        <trans-title>Türkçe Sesli Komut Sınıflandırması için Melez Spektral ve İstatistiksel Özellik Modellemesi: Çapraz Korelasyon ve Topluluk Öğrenmesi Yaklaşımı</trans-title>
                    </trans-title-group>
                </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                <contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-7632-1973</contrib-id>
                                                                <name>
                                    <surname>İkizler</surname>
                                    <given-names>Nuri</given-names>
                                </name>
                                <aff>Karadeniz Teknik Üniversitesi</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                <pub-date pub-type="pub" iso-8601-date="20260424">
                    <day>24</day>
                    <month>04</month>
                    <year>2026</year>
                </pub-date>
                                        <volume>30</volume>
                                        <issue>1</issue>
                                        <fpage>67</fpage>
                                        <lpage>82</lpage>
                        
                        <history>
                    <date date-type="received" iso-8601-date="20250729">
                        <day>29</day>
                        <month>07</month>
                        <year>2025</year>
                    </date>
                    <date date-type="accepted" iso-8601-date="20251222">
                        <day>22</day>
                        <month>12</month>
                        <year>2025</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 1995, Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi</copyright-statement>
                    <copyright-year>1995</copyright-year>
                    <copyright-holder>Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi</copyright-holder>
                </permissions>
            
                                                                                                <trans-abstract xml:lang="tr">
                            <p>Türkçe sesli komutların doğru bir şekilde sınıflandırılması, sesle kontrol edilen teknolojilerin gelişimi ve ana dil bağlamında insan-bilgisayar etkileşiminin sorunsuz bir şekilde gerçekleşmesi açısından kritik öneme sahiptir. Bu çalışmada, konuşma sinyallerinin zamansal, spektral ve zaman-frekans temelli özelliklerini yakalayarak tanıma doğruluğunu artırmayı amaçlayan çeşitli özellik çıkarım modelleri sistematik olarak değerlendirilmiştir. Altı farklı özellik vektörü modeli geliştirilmiş; son modelde ise Bilgi Kazancı tabanlı özellik seçimi ile Doğrusal Öngörümleme Kodlama kullanılarak elde edilen formant frekansları entegre edilerek kapsamlı ve ayrıştırıcı bir temsil elde edilmiştir. Sınıflandırma süreci, yaygın olarak kullanılan altı algoritma ile gerçekleştirilmiştir: Rastgele Orman, k-En Yakın Komşu, Çok Katmanlı Algılayıcı, Lojistik Model Ağacı, Destek Vektör Makineleri ve Rastgele Orman, Çok Katmanlı Algılayıcı ve Lojistik Model Ağacı yöntemlerini birleştiren bir topluluk oylama yöntemi. Topluluk oylama sınıflandırıcısı, %93,94 doğruluk oranı ile en yüksek performansı sergileyerek bireysel sınıflayıcıları ve temel modelleri anlamlı şekilde geride bırakmıştır. Bu çalışma, Türkçe sesli komut tanıma uygulamalarına yönelik sağlam, açıklanabilir ve yüksek performanslı bir özellik çerçevesi sunarak literatüre önemli bir katkı sağlamaktadır. Spektral, zamansal ve artikülatuvar özelliklerin entegrasyonu, sesli komutların daha başarılı bir şekilde ayrıştırılmasını mümkün kılmakta ve Türkçe dilindeki sesli kontrol sistemlerinin gelecekteki uygulamaları için değerli çıkarımlar sunmaktadır.</p></trans-abstract>
                                                                                                                                    <abstract><p>Accurate classification of Turkish voice commands is essential for advancing voice-controlled technologies and enabling seamless human-computer interaction in native language contexts. This study systematically evaluates multiple feature extraction models capturing temporal, spectral, and time-frequency characteristics of speech signals to enhance recognition accuracy. Six feature vector models were developed, with the final model integrating Information Gain-based feature selection and Linear Predictive Coding-derived formant frequencies to create a comprehensive and discriminative representation. Classification was performed using six widely adopted algorithms: Random Forest, k-Nearest Neighbors, Multilayer Perceptron, Logistic Model Tree, Support Vector Machine, and an Ensemble voting method combining Random Forest, Multilayer Perceptron, and Logistic Model Tree. The Ensemble voting classifier demonstrated superior performance, achieving an accuracy of 93.94%, significantly outperforming individual classifiers and baseline models. This study contributes to the literature by presenting a robust, explainable, and high-performing feature framework tailored for Turkish voice command recognition. The integration of spectral, temporal, and articulatory features enables improved discrimination of speech commands, offering valuable insights for future voice-activated applications in Turkish language environments.</p></abstract>
                                                            
            
                <kwd-group xml:lang="en">
                    <kwd>Turkish voice command recognition</kwd>
                    <kwd>Feature extraction of speech</kwd>
                    <kwd>Ensemble learning</kwd>
                    <kwd>Information gain</kwd>
                    <kwd>Cross-correlation</kwd>
                </kwd-group>
                            
                <kwd-group xml:lang="tr">
                    <kwd>Türkçe sesli komut tanıma</kwd>
                    <kwd>Özellik çıkarımı</kwd>
                    <kwd>Topluluk öğrenmesi</kwd>
                    <kwd>Bilgi kazancı</kwd>
                    <kwd>Çapraz korelasyon</kwd>
                </kwd-group>
                                                                                                                                        </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="confproc">Jakob, D. 2022. Voice controlled devices and older adults – a systematic literature review. In Proc. International Conference on Human-Computer Interaction, Cham, Switzerland: Springer International Publishing, 175–200.</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="book">Saritha, B., Laskar, M. A., Laskar, R. H. 2022. A comprehensive review on speaker recognition. In Advances in Speech and Music Technology: Computational Aspects and Applications, 3–23.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">Gormez, Y. 2024. Customized deep learning based Turkish automatic speech recognition system supported by language model. PeerJ Computer Science, 10, e1981.</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">Çelik, Y. 2024. Application of deep learning for voice command classification in Turkish language. Bitlis Eren University Journal of Science and Technology, 13(3), 701–708.</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="book">Kang, O., Pickering, L. 2024. Acoustic and temporal analysis for assessing speaking. In The Concise Companion to Language Assessment, 383.</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">Abdul, Z. K., Al-Talabani, A. K. 2022. Mel frequency cepstral coefficient and its applications: A review. IEEE Access, 10, 122136–122158.</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">Badhe, S. S., Shirbahadurkar, S. D., Gulhane, S. R. 2022. Renyi entropy and deep learning-based approach for accent classification. Multimedia Tools and Applications, 81(1), 1467–1499.</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">Singh, V. K., Sharma, K., Sur, S. N. 2025. Acoustic scene classification using dynamic time warping technique based on short time Fourier transform and discrete wavelet transforms. Circuits, Systems, and Signal Processing, 44(3), 1887–1913.</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="thesis">İkizler, N. 1995. Doğrusal Öngörümleme Kodlama ve Yapay Sinir Ağı Yöntemlerinin Ses Tanımada Kullanılması. Karadeniz Teknik Üniversitesi, Fen Bilimleri Enstitüsü Yüksek Lisans Tezi, 83s, Trabzon.</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">Malik, M., Malik, M. K., Mehmood, K., Makhdoom, I. 2021. Automatic speech recognition: a survey. Multimedia Tools and Applications, 80(6), 9411–9457.</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">Debnath, S., Roy, P. 2021. Appearance and shape-based hybrid visual feature extraction: toward audio–visual automatic speech recognition. Signal, Image, and Video Processing, 15(1), 25–32.</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">Vashisht, V., Pandey, A. K., Yadav, S. P. 2021. Speech recognition using machine learning. IEIE Transactions on Smart Processing and Computation, 10(3), 233–239.</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="confproc">Madhu, G., Bukka, A. 2023. Ensemble learning model for gender recognition using the human voice. In Proc. 2023 1st International Conference on Advanced Electronics, Electronics, Computer Intelligence (ICAEECI), October 2023, 1–5.</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">Alsobhani, A., ALabboodi, H. M., Mahdi, H. 2021. Speech recognition using convolution deep neural networks. Journal of Physics: Conference Series, 1973(1), 012166, August 2021.</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">Alharbi, S. et al. 2021. Automatic speech recognition: systematic literature review. IEEE Access, 9, 131858–131876.</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">Bakır, H., Çayır, A. N., Navruz, T. S. 2024. A comprehensive experimental study for analyzing the effects of data augmentation techniques on voice classification. Multimedia Tools and Applications, 83(6), 17601–17628.</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="web">Kurtkaya, M. 2020. Turkish speech command dataset. https://www.kaggle.com/murat-kurtkaya/turkish-speech-command-dataset (Accessed: 15.07.2025).</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="thesis">İkizler, N. 2002. Türkçe’de Konuşmacıdan Bağımsız Hece Tanıma Sistemi, Karadeniz Teknik Üniversitesi, Fen Bilimleri Enstitüsü, Doktora Tezi, 143s, Trabzon.</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="confproc">İkizler, N., Çavdar, İ. H., Ekim, G. 2005. Türkçe’de konuşmacıdan bağımsız ayrık hece tanıma sistemi. In Proc. IEEE Signal Processing and Communications Applications Conference (SIU), vol. 1, Kayseri, Türkiye, May, 55–59.</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="confproc">Ekim, G., İkizler, N., Atasoy, A., Çavdar, İ. H. 2008. A speaker recognition system using cross correlation. In Proc. IEEE 16th Signal Processing and Communications Applications Conference (SIU), Aydın, Türkiye, April, 18–21.</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">Liu, D., Xu, J., Zhang, P., Yan, Y. 2021. A unified system for multilingual speech recognition and language identification. Speech Communication, 127, 17–28.</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">Iqbal, Y., Zhang, T., Gunawan, T. S., Pratondo, A., Zhao, X., Geng, Y., et al. 2025. A hybrid speech enhancement technique based on discrete wavelet transform and spectral subtraction. IEEE Access.</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">Andriyanov, N. 2023. The use of correlation features in the problem of speech recognition. Algorithms, 16(2), 90.</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">Ikizler, N., Ekim, G. 2025. Investigating the effects of Gaussian noise on epileptic seizure detection: The role of spectral flatness, bandwidth, and entropy. Engineering Science and Technology, International Journal, 64, 102005.</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">Mohine, S., Gupta, P., Bansod, B. S., Bhalla, R., Basra, A. 2022. Evaluation of acoustic modality features for moving vehicle identification. Multidimensional Systems and Signal Processing, 33(4), 1349–1365.</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="confproc">Nawas, K. K., Barik, M. K., Khan, A. N. 2021. Speaker recognition using random forest. In Proc. ITM Web Conference, vol. 37, 01022.</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">Singh, M. K. 2024. Feature extraction and classification efficiency analysis using machine learning approach for speech signal. Multimedia Tools and Applications, 83(16), 47069–47084.</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">Goh, M., Yann, X. L. 2021. A novel sentiments analysis model using perceptron classifier. International Journal of Electronics Engineering Applications, 9(4), 1–10.</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">Ferrer, L. 2022. Analysis and comparison of classification metrics. arXiv preprint, arXiv:2209.05355.</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">Omuya, E. O., Okeyo, G. O., Kimwele, M. W. 2021. Feature selection for classification using principal component analysis and information gain. Expert Systems with Applications, 174, 114765.</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="confproc">Unluturk, A. 2023. Speech command based intelligent control of multiple home devices for physically handicapped. In Proc. 2023 5th Global Power, Energy and Communication Conference (GPECOM), June, 560–564, IEEE.</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
                        <mixed-citation publication-type="journal">Karakaş, B. 2025. Türkçe Sesli Komut Verilerinin Evrişimsel Sinir Ağı ile Sınıflandırılması. Mühendislik Bilimleri ve Araştırmaları Dergisi, 7(1), 51–59.</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">Uslu, İ. B., Tora, H., Sümer, E., Türker, M. 2020. Yalıtık Sözcüklü bir Türkçe Konuşma Tanıma Sisteminin Yapay Veri Artırımı ile Tasarımı ve Gerçekleştirilimi. Afyon Kocatepe University Journal of Science and Engineering, 20(6), 1147–1155.</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="thesis">Işık, G. 2019. Türkçe Ağızların Tanınmasında Derin Öğrenme Tekniğinin Kullanılması. Hacettepe Üniversitesi Fen Bilimleri Enstitüsü, Doktora Tezi, 100s, Ankara.</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
