Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers
Abstract
Predicting a speaker's personal traits from voice data has attracted attention in many fields, such as forensics, automatic voice response systems, and biomedical applications. In this study, gender and age group were predicted from voice data recorded from 24 volunteers. Mel-frequency cepstral coefficients (MFCC) were extracted from the audio data as hybrid time/frequency-domain features, and fundamental frequencies and formants were extracted as frequency-domain features. These features were fused into a single feature pool, and age group and gender estimation was carried out with four different machine learning algorithms. With the Support Vector Machines algorithm, the participants' age groups were classified with 93% accuracy and their genders with 99% accuracy. The speaker recognition task was also completed with 93% accuracy using Support Vector Machines.
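To make the feature-extraction and fusion idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the fundamental frequency is estimated with a plain autocorrelation peak search (the abstract does not state which estimator was used), and it uses a synthetic 125 Hz tone in place of a real recording. The function names `estimate_f0` and `fuse_features`, the search range of 60 to 400 Hz, and the zero-filled MFCC and formant placeholders are all hypothetical, introduced only for illustration.

```python
import math


def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Assumption: a simple, unnormalized autocorrelation search; the study
    may have used a different pitch estimator.
    """
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    n = len(signal)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Similarity between the signal and a copy shifted by `lag` samples.
        score = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def fuse_features(*feature_groups):
    """Concatenate several feature groups into one flat feature vector."""
    fused = []
    for group in feature_groups:
        fused.extend(group)
    return fused


# Synthetic stand-in for a voiced frame: a pure 125 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 125.0 * t / sample_rate) for t in range(1024)]
f0 = estimate_f0(tone, sample_rate)

# Hypothetical placeholders: real MFCC and formant values would come from
# a signal-processing library, not from zeros.
placeholder_mfcc = [0.0] * 13
placeholder_formants = [0.0] * 3

# Feature pool for one frame: MFCCs, then F0, then formants.
feature_vector = fuse_features(placeholder_mfcc, [f0], placeholder_formants)
```

In a full pipeline, one such fused vector per recording (or per frame) would be passed to a classifier such as a Support Vector Machine, with a separate label set for each task (speaker, age group, gender).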
Details
Primary Language
English
Subjects
Speech Recognition
Journal Section
Research Article
Publication Date
June 30, 2024
Submission Date
January 19, 2024
Acceptance Date
February 12, 2024
Published in Issue
Year: 2024, Number: 057
APA
Akgün, K., & Sadık, Ş. A. (2024). Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers. Journal of Scientific Reports-A, 057, 12-26. https://doi.org/10.59313/jsr-a.1422792
AMA
1. Akgün K, Sadık ŞA. Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers. JSR-A. 2024;(057):12-26. doi:10.59313/jsr-a.1422792
Chicago
Akgün, Kaya, and Şerif Ali Sadık. 2024. “Unified Voice Analysis: Speaker Recognition, Age Group and Gender Estimation Using Spectral Features and Machine Learning Classifiers”. Journal of Scientific Reports-A, no. 057: 12-26. https://doi.org/10.59313/jsr-a.1422792.
EndNote
Akgün K, Sadık ŞA (June 1, 2024) Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers. Journal of Scientific Reports-A 057 12–26.
IEEE
[1] K. Akgün and Ş. A. Sadık, “Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers”, JSR-A, no. 057, pp. 12–26, June 2024, doi: 10.59313/jsr-a.1422792.
ISNAD
Akgün, Kaya - Sadık, Şerif Ali. “Unified Voice Analysis: Speaker Recognition, Age Group and Gender Estimation Using Spectral Features and Machine Learning Classifiers”. Journal of Scientific Reports-A. 057 (June 1, 2024): 12-26. https://doi.org/10.59313/jsr-a.1422792.
JAMA
1. Akgün K, Sadık ŞA. Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers. JSR-A. 2024;(057):12–26.
MLA
Akgün, Kaya, and Şerif Ali Sadık. “Unified Voice Analysis: Speaker Recognition, Age Group and Gender Estimation Using Spectral Features and Machine Learning Classifiers”. Journal of Scientific Reports-A, no. 057, June 2024, pp. 12-26, doi:10.59313/jsr-a.1422792.
Vancouver
1. Kaya Akgün, Şerif Ali Sadık. Unified voice analysis: speaker recognition, age group and gender estimation using spectral features and machine learning classifiers. JSR-A. 2024 Jun. 1;(057):12-26. doi:10.59313/jsr-a.1422792