Research Article

A CNN–NCP Based Hybrid Deep Learning Model for Speech-Driven Gender Classification

Year 2026, Volume: 15, Issue: 1, pp. 312–321, 24.03.2026
https://doi.org/10.17798/bitlisfen.1803512
https://izlik.org/JA64ZZ28JX

Abstract

Speech is one of the most natural and effective forms of human communication, carrying both linguistic and non-linguistic information. It plays a crucial role in many applications, such as gender classification, biometric authentication, and personalized human-computer interaction. This study investigates the contribution of a hybrid deep learning model based on Neural Circuit Policies (NCP), which are inspired by biological neural systems, to gender classification on Turkish speech data, evaluating its accuracy and computational efficiency against conventional recurrent models. Mel-Frequency Cepstral Coefficients (MFCC) and log-Mel spectrogram features are combined to capture the spectral and temporal properties of speech signals simultaneously. These features are learned as low-level acoustic patterns via Conv1D layers, while long-term temporal dependencies are modeled using Liquid Time-Constant (LTC) cells defined within the NCP architecture. To evaluate the generalizability of the model, the experiments were conducted under a speaker-independent setup, and ablation studies were performed by removing different components of the architecture to isolate the contribution of the NCP component. Cross-validation was applied on the Mozilla Common Voice 12.0 Turkish dataset. The Conv1D+NCP model achieved 99.29% accuracy and a 99.28% F1-score, while the LSTM-based model yielded slightly lower results. With fewer parameters, the NCP-based model offers high performance and computational efficiency, making it a strong alternative for real-time applications.
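The abstract's front end combines MFCC and log-Mel spectrogram features. As a rough illustration of what that entails, the following numpy-only sketch frames a signal, applies a mel filterbank to the power spectrum to get a log-Mel spectrogram, and takes a DCT-II over the mel bands to get MFCCs. All parameter values (sample rate, FFT size, hop, 40 mel bands, 13 coefficients) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank over the rfft bins (textbook construction)."""
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_and_mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    # frame the signal, window each frame, take the power spectrum
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1)) ** 2
    log_mel = np.log(spec @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # DCT-II over the mel bands yields the cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    mfcc = log_mel @ dct.T
    return log_mel, mfcc

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)  # synthetic 440 Hz tone as a stand-in for speech
lm, mf = log_mel_and_mfcc(sig)
print(lm.shape, mf.shape)  # (61, 40) (61, 13)
```

In practice a library such as librosa would be used for this step; per the abstract, the resulting per-frame feature matrix is what the Conv1D layers consume.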
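The temporal model rests on Liquid Time-Constant cells. A minimal sketch of one LTC update, following the fused semi-implicit Euler step described by Hasani and Lechner's LTC formulation, dx/dt = −(1/τ + f(x, u))·x + f(x, u)·A, is given below. The dimensions, random weights, and sigmoid synapse are illustrative assumptions; the actual model in the paper wires such cells through an NCP architecture (e.g. via the authors' `ncps` library) rather than this hand-rolled loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy dimensions: 8 per-frame input features, 16 hidden LTC neurons (assumed)
n_in, n_hid, dt = 8, 16, 1.0
W_in = rng.normal(0, 0.3, (n_hid, n_in))
W_rec = rng.normal(0, 0.3, (n_hid, n_hid))
b = np.zeros(n_hid)
tau = np.ones(n_hid)           # per-neuron base time constants
A = rng.normal(0, 1.0, n_hid)  # per-neuron reversal potentials

def ltc_step(x, u):
    # nonlinear synapse f(x, u) gates both the decay rate and the drive,
    # which is what makes the effective time constant input-dependent ("liquid")
    f = 1 / (1 + np.exp(-(W_in @ u + W_rec @ x + b)))
    # fused semi-implicit Euler step of dx/dt = -(1/tau + f)*x + f*A
    return (x + dt * f * A) / (1 + dt * (1 / tau + f))

x = np.zeros(n_hid)
seq = rng.normal(size=(61, n_in))  # stands in for one sequence of acoustic frames
for u in seq:
    x = ltc_step(x, u)
print(x.shape)  # (16,)
```

The final hidden state (or a readout over it) would then feed the gender classification head; the parameter economy the abstract cites comes from the sparse NCP wiring between such neurons.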

Ethical Statement

The study complied with research and publication ethics.

References

  • F. Altunbey Özbay and E. Özbay, “Ses verilerinden cinsiyet tespiti için yeni bir yaklaşım: Optimizasyon yöntemleri ile özellik seçimi (A new approach for gender detection from voice data: Feature selection with optimization methods),” Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, vol. 38, no. 2, pp. 1179–1192, 2022, doi: 10.17341/gazimmfd.938294.
  • S. Safavi, M. Russell, and P. Jančovič, “Automatic speaker, age-group and gender identification from children’s speech,” Comput Speech Lang, vol. 50, pp. 141–156, 2018, doi: 10.1016/j.csl.2018.01.001.
  • H. A. Sánchez-Hevia, R. Gil-Pita, M. Utrilla-Manso, and M. Rosa-Zurera, “Age group classification and gender recognition from speech with temporal convolutional neural networks,” Multimed Tools Appl, vol. 81, no. 3, pp. 3535–3552, 2022, doi: 10.1007/s11042-021-11614-4.
  • S. Chaudhary and D. K. Sharma, “Gender Identification based on Voice Signal Characteristics,” in 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2018, pp. 869–874. doi: 10.1109/ICACCCN.2018.8748676.
  • S. J. Chaudhari and R. M. Kagalkar, “Methodology for gender identification, classification and recognition of human age,” Int. J. Comput. Appl, vol. 975, p. 8887, 2015.
  • M. Alsulaiman, Z. Ali, and G. Muhammad, “Gender Classification with Voice Intensity,” in 2011 UKSim 5th European Symposium on Computer Modeling and Simulation, 2011, pp. 205–209. doi: 10.1109/EMS.2011.37.
  • S. E. Tranter and D. A. Reynolds, “An overview of automatic speaker diarization systems,” IEEE Trans Audio Speech Lang Process, vol. 14, no. 5, pp. 1557–1565, 2006, doi: 10.1109/TASL.2006.878256.
  • E. H. Alkhammash, M. Hadjouni, and A. M. Elshewey, “A Hybrid Ensemble Stacking Model for Gender Voice Recognition Approach,” Electronics (Basel), vol. 11, no. 11, 2022, doi: 10.3390/electronics11111750.
  • S. Hızlısoy, E. Çolakoğlu, and R. S. Arslan, “Speech-to-Gender Recognition Based on Machine Learning Algorithms,” International Journal of Applied Mathematics Electronics and Computers, vol. 10, no. 4, pp. 84–92, 2022, doi: 10.18100/ijamec.1221455.
  • J. Ahmad, M. Fiaz, S. Kwon, M. Sodanil, B. Vo, and S. W. Baik, “Gender identification using MFCC for telephone applications — a comparative study,” arXiv preprint arXiv:1601.01577, 2016.
  • H. Purwins, B. Li, T. Virtanen, J. Schlüter, S. Chang, and T. Sainath, “Deep Learning for Audio Signal Processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 2, pp. 206–219, 2019, doi: 10.1109/JSTSP.2019.2908700.
  • M. Lechner, R. Hasani, A. Amini, T. A. Henzinger, D. Rus, and R. Grosu, “Neural circuit policies enabling auditable autonomy,” Nat Mach Intell, vol. 2, no. 10, pp. 642–652, 2020, doi: 10.1038/s42256-020-00237-3.
  • N. Olgun, Lazer işaretleri ile yapay zeka temelli hedef analizi (Artificial intelligence based target analysis with laser signals), Fırat University, Turkey, 2022.
  • Mozilla, “Common Voice,” 2022. [Online]. Available: https://commonvoice.mozilla.org/tr/datasets
  • H. A. Younis et al., “Multimodal age and gender estimation for adaptive human-robot interaction: A systematic literature review,” Processes, vol. 11, no. 5, p. 1488, 2023.
  • E. Yücesoy and V. V. Nabiyev, “Gender identification of a speaker using MFCC and GMM,” in 2013 8th International Conference on Electrical and Electronics Engineering (ELECO), 2013, pp. 626–629. doi: 10.1109/ELECO.2013.6713922.
  • Ç. Bakır, “Alman Dili Üzerinde Konuşmacı Cinsiyetinin Otomatik Olarak Belirlenmesi (Automatic determination of speaker gender in the German language),” Academic Platform - Journal of Engineering and Science, vol. 4, no. 2, 2016, doi: 10.21541/apjes.49291.
  • S. R. Zaman, D. Sadekeen, M. A. Alfaz, and R. Shahriyar, “One Source to Detect them All: Gender, Age, and Emotion Detection from Voice,” in 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 2021, pp. 338–343. doi: 10.1109/COMPSAC51774.2021.00055.
  • B. K. Munoli, K. A. K. Jain, P. Kumar, A. R. P. S, and Ashwini, “Human Voice Analysis to Determine Age and Gender,” in 2023 International Conference on Recent Trends in Electronics and Communication (ICRTEC), 2023, pp. 1–4. doi: 10.1109/ICRTEC56977.2023.10111890.
  • A. A. Mohammed and Y. F. Al-Irhayim, “An overview for assessing a number of systems for estimating age and gender of speakers,” Tikrit Journal of Pure Science, vol. 26, no. 1, pp. 94–100, 2021.
  • V. S. Kone, A. Anagal, S. Anegundi, P. Jadhav, U. Kulkarni, and M. S. M, “Voice-based Gender and Age Recognition System,” in 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), 2023, pp. 74–80. doi: 10.1109/InCACCT57535.2023.10141801.
  • Y. Zhao and X. Shu, “Speech emotion analysis using convolutional neural network (CNN) and gamma classifier-based error correcting output codes (ECOC),” Sci Rep, vol. 13, no. 1, p. 20398, 2023, doi: 10.1038/s41598-023-47118-4.
  • K. Donuk and D. Hanbay, “Konuşma Duygu Tanıma için Akustik Özelliklere Dayalı LSTM Tabanlı Bir Yaklaşım (An LSTM-based approach based on acoustic features for speech emotion recognition),” Computer Science, vol. 7, no. 2, pp. 54–67, 2022, doi: 10.53070/bbd.1113379.
  • I. Malashin, V. Tynchenko, A. Gantimurov, V. Nelyub, and A. Borodulin, “Applications of Long Short-Term Memory (LSTM) Networks in Polymeric Sciences: A Review,” Polymers (Basel), vol. 16, no. 18, 2024, doi: 10.3390/polym16182607.

Details

Primary Language English
Subjects Biomechanical Engineering
Journal Section Research Article
Authors

Sevda Olgun 0009-0002-2211-4271

Caner Balım 0000-0002-1010-129X

Nevzat Olgun 0000-0003-2461-4923

Submission Date October 14, 2025
Acceptance Date February 3, 2026
Publication Date March 24, 2026
DOI https://doi.org/10.17798/bitlisfen.1803512
IZ https://izlik.org/JA64ZZ28JX
Published in Issue Year 2026 Volume: 15 Issue: 1

Cite

IEEE [1]S. Olgun, C. Balım, and N. Olgun, “A CNN–NCP Based Hybrid Deep Learning Model for Speech-Driven Gender Classification”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 15, no. 1, pp. 312–321, Mar. 2026, doi: 10.17798/bitlisfen.1803512.

Bitlis Eren University
Journal of Science Editor
Bitlis Eren University Graduate Institute
Bes Minare Mah. Ahmet Eren Bulvari, Merkez Kampus, 13000 BITLIS