Research Article

A CNN–NCP Based Hybrid Deep Learning Model for Speech-Driven Gender Classification

Year 2026, Volume: 15, Issue: 1, pp. 312–321, 24.03.2026
https://doi.org/10.17798/bitlisfen.1803512
https://izlik.org/JA64ZZ28JX

Abstract

Speech is one of the most natural and effective forms of human communication, carrying both linguistic and non-linguistic information. It plays a crucial role in many applications, such as gender classification, biometric authentication, and personalized human-computer interaction. This study investigates the contribution of a hybrid deep learning model based on Neural Circuit Policies (NCP), which are inspired by biological neural systems, to gender classification on Turkish speech data, evaluating its accuracy and computational efficiency against conventional recurrent models. Mel-Frequency Cepstral Coefficients (MFCC) and log-Mel spectrogram features are combined to capture the spectral and temporal properties of speech signals simultaneously. These features are learned as low-level acoustic patterns via Conv1D layers, while long-term temporal dependencies are modeled using Liquid Time-Constant (LTC) cells defined within the NCP architecture. To evaluate the generalizability of the model, the experiments were conducted under a speaker-independent setup, and ablation studies were performed by removing different components of the architecture to isolate the contribution of the NCP component. Cross-validation was applied on the Mozilla Common Voice 12.0 Turkish dataset. The Conv1D+NCP model achieved 99.29% accuracy and a 99.28% F1-score, while the LSTM-based model yielded slightly lower results. With fewer parameters, the NCP-based model offers high performance and computational efficiency, making it a strong alternative for real-time applications.
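The abstract's front end combines MFCC and log-Mel spectrogram features. As a rough illustration of what that entails, the following numpy-only sketch frames a signal, applies a mel filterbank to the power spectrum to get a log-Mel spectrogram, and takes a DCT-II over the mel bands to get MFCCs. All parameter values (sample rate, FFT size, hop, 40 mel bands, 13 coefficients) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank over the rfft bins (textbook construction)."""
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    # frame the signal, window each frame, take the power spectrum
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1)) ** 2
    log_mel = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II over the mel bands yields the cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    mfcc = log_mel @ dct.T
    return log_mel, mfcc

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)  # synthetic 440 Hz tone as a stand-in for speech
lm, mf = log_mel_and_mfcc(sig)
print(lm.shape, mf.shape)  # (61, 40) (61, 13)
```

In practice a library such as librosa would be used for this step; per the abstract, the resulting per-frame feature matrix is what the Conv1D layers consume.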
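The temporal model rests on Liquid Time-Constant cells. A minimal sketch of one LTC update, following the fused semi-implicit Euler step described by Hasani and Lechner's LTC formulation, dx/dt = −(1/τ + f(x, u))·x + f(x, u)·A, is given below. The dimensions, random weights, and sigmoid synapse are illustrative assumptions; the actual model in the paper wires such cells through an NCP architecture (e.g. via the authors' `ncps` library) rather than this hand-rolled loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy dimensions: 8 per-frame input features, 16 hidden LTC neurons (assumed)
n_in, n_hid, dt = 8, 16, 1.0
W_in = rng.normal(0, 0.3, (n_hid, n_in))
W_rec = rng.normal(0, 0.3, (n_hid, n_hid))
b = np.zeros(n_hid)
tau = np.ones(n_hid)           # per-neuron base time constants
A = rng.normal(0, 1.0, n_hid)  # per-neuron reversal potentials

def ltc_step(x, u):
    # nonlinear synapse f(x, u) gates both the decay rate and the drive,
    # which is what makes the effective time constant input-dependent ("liquid")
    f = 1 / (1 + np.exp(-(W_in @ u + W_rec @ x + b)))
    # fused semi-implicit Euler step of dx/dt = -(1/tau + f)*x + f*A
    return (x + dt * f * A) / (1 + dt * (1 / tau + f))

x = np.zeros(n_hid)
seq = rng.normal(size=(61, n_in))  # stands in for one sequence of acoustic frames
for u in seq:
    x = ltc_step(x, u)
print(x.shape)  # (16,)
```

The final hidden state (or a readout over it) would then feed the gender classification head; the parameter economy the abstract cites comes from the sparse NCP wiring between such neurons.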

Ethical Statement

The study complied with research and publication ethics.

References

  • F. Altunbey Özbay and E. Özbay, “Ses verilerinden cinsiyet tespiti için yeni bir yaklaşım: Optimizasyon yöntemleri ile özellik seçimi (A new approach for gender detection from voice data: Feature selection with optimization methods),” Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, vol. 38, no. 2, pp. 1179–1192, 2022, doi: 10.17341/gazimmfd.938294.
  • S. Safavi, M. Russell, and P. Jančovič, “Automatic speaker, age-group and gender identification from children’s speech,” Comput Speech Lang, vol. 50, pp. 141–156, 2018, doi: 10.1016/j.csl.2018.01.001.
  • H. A. Sánchez-Hevia, R. Gil-Pita, M. Utrilla-Manso, and M. Rosa-Zurera, “Age group classification and gender recognition from speech with temporal convolutional neural networks,” Multimed Tools Appl, vol. 81, no. 3, pp. 3535–3552, 2022, doi: 10.1007/s11042-021-11614-4.
  • S. Chaudhary and D. K. Sharma, “Gender Identification based on Voice Signal Characteristics,” in 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2018, pp. 869–874. doi: 10.1109/ICACCCN.2018.8748676.
  • S. J. Chaudhari and R. M. Kagalkar, “Methodology for gender identification, classification and recognition of human age,” Int. J. Comput. Appl, vol. 975, p. 8887, 2015.
  • M. Alsulaiman, Z. Ali, and G. Muhammad, “Gender Classification with Voice Intensity,” in 2011 UKSim 5th European Symposium on Computer Modeling and Simulation, 2011, pp. 205–209. doi: 10.1109/EMS.2011.37.
  • S. E. Tranter and D. A. Reynolds, “An overview of automatic speaker diarization systems,” IEEE Trans Audio Speech Lang Process, vol. 14, no. 5, pp. 1557–1565, 2006, doi: 10.1109/TASL.2006.878256.
  • E. H. Alkhammash, M. Hadjouni, and A. M. Elshewey, “A Hybrid Ensemble Stacking Model for Gender Voice Recognition Approach,” Electronics (Basel), vol. 11, no. 11, 2022, doi: 10.3390/electronics11111750.
  • S. Hızlısoy, E. Çolakoğlu, and R. S. Arslan, “Speech-to-Gender Recognition Based on Machine Learning Algorithms,” International Journal of Applied Mathematics Electronics and Computers, vol. 10, no. 4, pp. 84–92, 2022, doi: 10.18100/ijamec.1221455.
  • J. Ahmad, M. Fiaz, S. Kwon, M. Sodanil, B. Vo, and S. W. Baik, “Gender identification using MFCC for telephone applications — a comparative study,” arXiv preprint arXiv:1601.01577, 2016.
  • H. Purwins, B. Li, T. Virtanen, J. Schlüter, S. Chang, and T. Sainath, “Deep Learning for Audio Signal Processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 2, pp. 206–219, 2019, doi: 10.1109/JSTSP.2019.2908700.
  • M. Lechner, R. Hasani, A. Amini, T. A. Henzinger, D. Rus, and R. Grosu, “Neural circuit policies enabling auditable autonomy,” Nat Mach Intell, vol. 2, no. 10, pp. 642–652, 2020, doi: 10.1038/s42256-020-00237-3.
  • N. Olgun, Lazer işaretleri ile yapay zeka temelli hedef analizi (Artificial intelligence based target analysis with laser signals), Fırat University, Turkey, 2022.
  • Mozilla, “Common Voice,” 2022. [Online]. Available: https://commonvoice.mozilla.org/tr/datasets
  • H. A. Younis et al., “Multimodal age and gender estimation for adaptive human-robot interaction: A systematic literature review,” Processes, vol. 11, no. 5, p. 1488, 2023.
  • E. Yücesoy and V. V. Nabiyev, “Gender identification of a speaker using MFCC and GMM,” in 2013 8th International Conference on Electrical and Electronics Engineering (ELECO), 2013, pp. 626–629. doi: 10.1109/ELECO.2013.6713922.
  • Ç. Bakır, “Alman Dili Üzerinde Konuşmacı Cinsiyetinin Otomatik Olarak Belirlenmesi (Automatic determination of speaker gender in the German language),” Academic Platform - Journal of Engineering and Science, vol. 4, no. 2, 2016, doi: 10.21541/apjes.49291.
  • S. R. Zaman, D. Sadekeen, M. A. Alfaz, and R. Shahriyar, “One Source to Detect them All: Gender, Age, and Emotion Detection from Voice,” in 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 2021, pp. 338–343. doi: 10.1109/COMPSAC51774.2021.00055.
  • B. K. Munoli, K. A. K. Jain, P. Kumar, A. R. P. S, and Ashwini, “Human Voice Analysis to Determine Age and Gender,” in 2023 International Conference on Recent Trends in Electronics and Communication (ICRTEC), 2023, pp. 1–4. doi: 10.1109/ICRTEC56977.2023.10111890.
  • A. A. Mohammed and Y. F. Al-Irhayim, “An overview for assessing a number of systems for estimating age and gender of speakers,” Tikrit Journal of Pure Science, vol. 26, no. 1, pp. 94–100, 2021.
  • V. S. Kone, A. Anagal, S. Anegundi, P. Jadhav, U. Kulkarni, and M. S. M, “Voice-based Gender and Age Recognition System,” in 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), 2023, pp. 74–80. doi: 10.1109/InCACCT57535.2023.10141801.
  • Y. Zhao and X. Shu, “Speech emotion analysis using convolutional neural network (CNN) and gamma classifier-based error correcting output codes (ECOC),” Sci Rep, vol. 13, no. 1, p. 20398, 2023, doi: 10.1038/s41598-023-47118-4.
  • K. Donuk and D. Hanbay, “Konuşma Duygu Tanıma için Akustik Özelliklere Dayalı LSTM Tabanlı Bir Yaklaşım (An LSTM-based approach based on acoustic features for speech emotion recognition),” Computer Science, vol. 7, no. 2, pp. 54–67, 2022, doi: 10.53070/bbd.1113379.
  • I. Malashin, V. Tynchenko, A. Gantimurov, V. Nelyub, and A. Borodulin, “Applications of Long Short-Term Memory (LSTM) Networks in Polymeric Sciences: A Review,” Polymers (Basel), vol. 16, no. 18, 2024, doi: 10.3390/polym16182607.

Details

Primary Language English
Subjects Biomechanical Engineering
Journal Section Research Article
Authors

Sevda Olgun 0009-0002-2211-4271

Caner Balım 0000-0002-1010-129X

Nevzat Olgun 0000-0003-2461-4923

Submission Date October 14, 2025
Acceptance Date February 3, 2026
Publication Date March 24, 2026
DOI https://doi.org/10.17798/bitlisfen.1803512
IZ https://izlik.org/JA64ZZ28JX
Published in Issue Year 2026 Volume: 15 Issue: 1

Cite

IEEE [1]S. Olgun, C. Balım, and N. Olgun, “A CNN–NCP Based Hybrid Deep Learning Model for Speech-Driven Gender Classification”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 15, no. 1, pp. 312–321, Mar. 2026, doi: 10.17798/bitlisfen.1803512.

Bitlis Eren University
Journal of Science Editor
Bitlis Eren University Graduate Institute
Bes Minare Mah. Ahmet Eren Bulvari, Merkez Kampus, 13000 BITLIS