Hybrid Spectral and Statistical Feature Modelling with Cross-Correlation and Ensemble Learning for Robust Turkish Voice Command Classification

Nuri İkizler

doi:10.19113/sdufenbed.1753641

Abstract

Accurate classification of Turkish voice commands is essential for advancing voice-controlled technologies and enabling seamless human-computer interaction in the user's native language. This study systematically evaluates multiple feature extraction models that capture temporal, spectral, and time-frequency characteristics of speech signals, with the aim of improving recognition accuracy. Six feature vector models were developed; the final model integrates Information Gain-based feature selection with formant frequencies derived from Linear Predictive Coding to create a comprehensive and discriminative representation. Classification was performed with six classifiers: Random Forest, k-Nearest Neighbors, Multilayer Perceptron, Logistic Model Tree, Support Vector Machine, and an Ensemble voting method that combines Random Forest, Multilayer Perceptron, and Logistic Model Tree. The Ensemble voting classifier performed best, achieving an accuracy of 93.94% and significantly outperforming the individual classifiers and the baseline models. The study contributes a robust, explainable, and high-performing feature framework tailored for Turkish voice command recognition. Integrating spectral, temporal, and articulatory features improves discrimination between speech commands and offers insights for future voice-activated applications in Turkish-language environments.
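The two ideas the abstract leans on most, Information Gain-based feature ranking and ensemble voting, can be illustrated with a minimal sketch. This is not the study's implementation, which is not shown on this page. The single-threshold split used to score a numeric feature, the use of hard (majority) voting rather than probability averaging, and the labels `cmd_a` and `cmd_b` are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one numeric feature at `threshold`.

    Assumption: a single binary split stands in for whatever
    discretisation the study actually used.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


def majority_vote(predictions):
    """Hard voting: return the label predicted by most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical toy data: four utterances, two command classes.
labels = ["cmd_a", "cmd_a", "cmd_b", "cmd_b"]
informative = [0.1, 0.2, 0.8, 0.9]   # separates the two classes at 0.5
uninformative = [0.1, 0.8, 0.2, 0.9]  # mixes the classes on both sides

gain_good = information_gain(informative, labels, 0.5)
gain_bad = information_gain(uninformative, labels, 0.5)

# Three hypothetical base classifiers disagree on one utterance.
decision = majority_vote(["cmd_a", "cmd_b", "cmd_a"])
```

In this toy setting the informative feature recovers the full one bit of class entropy while the uninformative one recovers none, which is the kind of contrast a ranking step uses to keep or discard features. The vote then settles a disagreement among base classifiers in favour of the label two of the three agree on.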

Details

Primary Language

English

Subjects

Speech Recognition

Journal Section

Research Article

Authors

Nuri İkizler*
0000-0002-7632-1973
Türkiye

Publication Date

April 24, 2026

Submission Date

July 29, 2025

Acceptance Date

December 22, 2025

Published in Issue

Year 2026 Volume: 30 Number: 1

DOI

https://doi.org/10.19113/sdufenbed.1753641

IZ

https://izlik.org/JA93NA43GK

Cite

APA

İkizler, N. (2026). Hybrid Spectral and Statistical Feature Modelling with Cross-Correlation and Ensemble Learning for Robust Turkish Voice Command Classification. Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 30(1), 67-82. https://doi.org/10.19113/sdufenbed.1753641
