The Effect of the SVM Kernel on Classifying Speakers by Age and Gender

Ergün Yücesoy

doi:10.31202/ecjse.707179

Research Article

Year 2020, Volume 7, Issue 3, 970 - 982, 30.09.2020

Cited By: 3

Abstract

This study addresses the automatic determination of a speaker's age and gender group from short telephone utterances. Mel-frequency cepstral coefficients (MFCCs) and the delta parameters derived from them are used as features, while the age and gender classes are modeled with Gaussian mixture models (GMMs) adapted from a universal background model (UBM). The GMM built for each utterance is converted into a supervector and fed to a support vector machine (SVM), which classifies the speaker by age and gender group. Four SVM kernels are compared: linear, polynomial, radial basis function (RBF), and GMM-KL, with the number of GMM components varied between 32 and 512. In tests on the aGender database, the best classification rate, 60.95%, was obtained by classifying 256-component GMMs with the GMM-KL kernel.
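The supervector step can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs, mean-only MAP adaptation from the UBM, and the commonly used KL-divergence-based supervector normalization (each adapted mean scaled by the square root of its component weight and divided by the component standard deviation). The abstract does not give the exact formulation used in the paper, so the function names, the relevance factor of 16, and the toy UBM values below are all hypothetical choices for illustration only.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def responsibilities(x, weights, means, variances):
    """Posterior probability of each UBM component given frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to the frames of one utterance.

    The relevance factor is an assumed value, not taken from the paper.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        for c, gamma in enumerate(responsibilities(x, weights, means, variances)):
            counts[c] += gamma
            for d in range(dim):
                sums[c][d] += gamma * x[d]
    # Same as alpha * E[x] + (1 - alpha) * mu with alpha = n / (n + relevance).
    return [[(sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
             for d in range(dim)] for c in range(n_comp)]

def kl_supervector(adapted_means, weights, variances):
    """Stack the adapted means, each scaled by sqrt(weight) / sqrt(variance)."""
    return [math.sqrt(w) * m / math.sqrt(v)
            for w, mean, var in zip(weights, adapted_means, variances)
            for m, v in zip(mean, var)]

def gmm_kl_kernel(sv_a, sv_b):
    """Inner product of two normalized supervectors."""
    return sum(a * b for a, b in zip(sv_a, sv_b))

# Toy two-component, two-dimensional UBM (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [2.0, 2.0]]

# Two toy "utterances", each a short list of feature frames.
utt_a = [[0.2, -0.1], [0.5, 0.3], [3.8, 4.1]]
utt_b = [[4.2, 3.9], [3.7, 4.4], [-0.3, 0.1]]

sv_a = kl_supervector(
    map_adapt_means(utt_a, ubm_weights, ubm_means, ubm_vars), ubm_weights, ubm_vars)
sv_b = kl_supervector(
    map_adapt_means(utt_b, ubm_weights, ubm_means, ubm_vars), ubm_weights, ubm_vars)
k_ab = gmm_kl_kernel(sv_a, sv_b)
```

In a real system the frames would be MFCC and delta vectors, the UBM would have between 32 and 512 components as in the study, and the kernel values for all pairs of utterances would form a Gram matrix. Such a matrix could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.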

Keywords

Age and gender recognition, speech processing, Gaussian mixture model, support vector machines

References

F. Metze et al., “Comparison of four approaches to age and gender recognition for telephone applications,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2007, vol. 4, pp. IV-1089-IV-1092.
D. C. Tanner and M. E. Tanner, Forensic aspects of speech patterns: voice prints, speaker profiling, lie and intoxication detection. Lawyers & Judges Publishing Company, 2004.
S. Bhukya, “Effect of Gender on Improving Speech Recognition System,” in International Journal of Computer Applications, 2018, vol. 179, no. 14, pp. 22–30.
M. Li, C.-S. Jung, and K. Han, “Combining five acoustic level modeling methods for automatic speaker age and gender recognition,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, pp. 2826–2829.
Z. Qawaqneh, A. A. Mallouh, and B. D. Barkana, “Deep neural network framework and transformed MFCCs for speaker’s age and gender classification,” in Knowledge-Based Systems, 2017, vol. 115, pp. 5–14.
S. Safavi, M. Russell, and P. Jančovič, “Automatic speaker, age-group and gender identification from children’s speech,” in Computer Speech and Language, 2018, vol. 50, pp. 141–156.
C. BAKIR, “Automatic Speaker Gender Identification for the German Language,” in Balkan Journal of Electrical and Computer Engineering, 2016, vol. 4, no. 2, pp. 79–83.
O. Büyük and L. M. Arslan, “An investigation of multi-language age classification from voice,” in BIOSIGNALS 2019 - 12th International Conference on Bio-Inspired Systems and Signal Processing, Proceedings; Part of 12th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2019, 2019, pp. 85–92.
E. Fokoue, Z. Ma, and E. Fokoué, “Speaker Gender Recognition via MFCCs and SVMs,” (2013), Accessed from https://scholarworks.rit.edu/article/1749
J. Přibil, A. Přibilová, and J. Matoušek, “GMM-based speaker age and gender classification in Czech and Slovak,” in Journal of Electrical Engineering, 2017, vol. 68, no. 1, pp. 3–12.
F. Faek, “Objective Gender and Age Recognition from Speech Sentences,” in Aro, The Scientific Journal of Koya University, 2015, vol. 3, no. 2, pp. 24–29.
J. Równicka and S. Kacprzak, “Speaker Age Classification and Regression Using i-Vectors,” 2016, pp. 1402–1406.
B. Schuller et al., “The INTERSPEECH 2010 paralinguistic challenge,” in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, 2010, pp. 2794–2797.
S. B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, vol. 28, no. 4, pp. 357–366.
S. S. Stevens, J. Volkmann, and E. B. Newman, “A Scale for the Measurement of the Psychological Magnitude Pitch,” in Journal of the Acoustical Society of America, 1937, vol. 8, no. 3, pp. 185–190.
J. W. Picone, “Signal Modeling Techniques in Speech Recognition,” in Proceedings of the IEEE, 1993, vol. 81, no. 9, pp. 1215–1247.
L. Rabiner, “Fundamentals of speech recognition,” Fundam. speech Recognit., 1993.
S. Furui, “Comparison of Speaker Recognition Methods Using Statistical Features and Dynamic Features,” in IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, vol. 29, no. 3, pp. 342–350.
J. S. Mason and X. Zhang, “Velocity and acceleration features in speaker recognition,” in [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991, pp. 3673–3676.
C. Cortes and V. Vapnik, “Support-vector networks,” in Machine learning, 1995, vol. 20, no. 3, pp. 273–297.
R. Collobert and S. Bengio, “SVMTorch: Support Vector Machines for large-scale regression problems,” in Journal of Machine Learning Research, 2001, vol. 1, no. 2, pp. 143–160.
J. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models,” Tech. Rep. ICSI-TR-97-021, Univ. Berkeley, vol. 4, 2000.
M. Azam and N. Bouguila, “Speaker verification using adapted bounded Gaussian mixture model,” in Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018, 2018, vol. 10, no. 1–3, pp. 300–307.
W. M. Campbell, D. E. Sturim, D. A. Reynolds, and A. Solomonoff, “SVM based speaker verification using a GMM supervector kernel and NAP variability compensation,” in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2006, vol. 1, pp. 97–100.

There are 24 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Ergün Yücesoy 0000-0003-1707-384X
Publication Date	September 30, 2020
Submission Date	March 21, 2020
Acceptance Date	May 3, 2020
Published in Issue	Year 2020

Cite

IEEE	E. Yücesoy, “Konuşmacının Yaş ve Cinsiyetine Göre Sınıflandırılmasında DVM Çekirdeğinin Etkisi”, ECJSE, vol. 7, no. 3, pp. 970–982, 2020, doi: 10.31202/ecjse.707179.

Cited By

Yapay Zekâ Çağında Duygu Analizi: Büyük Dil Modellerinin Yükselişi ve Klasik Yaklaşımlarla Karşılaştırılması

Afyon Kocatepe University Journal of Sciences and Engineering

https://doi.org/10.35414/akufemubid.1484569

Speech-to-Gender Recognition Based on Machine Learning Algorithms

International Journal of Applied Mathematics Electronics and Computers

https://doi.org/10.18100/ijamec.1221455

Effect of Inclusion of Delta Derivatives and Log Energy to MFCC Features on Age and Gender Classification

Journal of the Institute of Science and Technology

Ergün YÜCESOY

https://doi.org/10.21597/jist.772804

Open Journal Access (BOAI)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.