MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri

Ergün Yücesoy

doi:10.21597/jist.772804

Abstract

Konuşmacıların yaş ve cinsiyet gruplarının otomatik olarak belirlenmesi önemli bir araştırma konusudur ve başta çağrı merkezleri olmak üzere birçok alanda farklı amaçlarla kullanılmaktadır. Bu çalışmada Mel Frekansı Kepstrum Katsayılarına (MFKK) eklenen logaritmik enerji ve delta parametrelerinin otomatik yaş ve cinsiyet tanıma üzerindeki etkileri araştırılmıştır. Konuşma sinyallerinden çıkarılan MFKK öznitelikleri, Gauss Karışım Modeli (GKM) süpervektörlerine dönüştürüldükten sonra Destek Vektör Makinesine (DVM) uygulanmış ve gerçekleştirilen optimizasyon süreci sonunda konuşmacıların yaş ve cinsiyet gruplarına karar verilmiştir. Çalışmada MFKK’ya eklenen parametrelerin yanı sıra MFKK sayısının ve GKM bileşen sayısının başarı üzerindeki etkileri de araştırılmıştır. MFKK sayısı 8 ile 20, GKM bileşen sayısı ise 32 ile 256 arasında değiştirilerek sistem üzerinde testler yapılmıştır. aGender veritabanının geliştirme bölümündeki 299 konuşmacının 1388 konuşması ile yapılan testlerde en yüksek sınıflandırma oranı, 12 kepstral katsayıya logaritmik enerji, delta ve delta-delta parametrelerinin eklenmesi sonucunda %60.23 olarak hesaplanmıştır. Çalışmada optimum GKM bileşen sayısı 128 olarak belirlenirken, logaritmik enerji, delta ve delta-delta parametrelerinin başarı üzerindeki etkileri sırasıyla %1.17, %3.24 ve %4.61 olarak saptanmıştır.

Effect of Inclusion of Delta Derivatives and Log Energy to MFCC Features on Age and Gender Classification

Abstract

Automatic recognition of speakers' age and gender groups is an important research topic, with applications in many fields, call centers in particular. This study investigates how adding logarithmic energy and delta parameters to Mel Frequency Cepstral Coefficients (MFCC) affects automatic age and gender recognition. The MFCC features extracted from the speech signals were converted into Gaussian Mixture Model (GMM) supervectors and fed to a Support Vector Machine (SVM); the age and gender group of each speaker was decided at the end of the optimization process. In addition to the parameters added to the MFCCs, the effects of the number of MFCCs and the number of GMM components on performance were examined. The system was tested with the number of MFCCs varied between 8 and 20 and the number of GMM components varied between 32 and 256. In tests on 1388 utterances from 299 speakers in the development set of the aGender database, the highest classification rate, 60.23%, was obtained by adding logarithmic energy, delta and delta-delta parameters to 12 cepstral coefficients. The optimum number of GMM components was found to be 128, and the contributions of the logarithmic energy, delta and delta-delta parameters to the classification rate were 1.17%, 3.24% and 4.61%, respectively.
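As a rough illustration of the best-performing feature configuration described in the abstract, the sketch below appends log energy, delta and delta-delta parameters to 12 cepstral coefficients and reports the resulting dimensions. It is not the author's implementation. The regression-based delta formula, the window width of 2, the random placeholder data, and the assumption that a supervector stacks only the GMM component means are all choices made for this example, and the function names are hypothetical.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis of a (frames, dims) array.

    The window width of 2 is an assumption; the abstract does not state it.
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_features(mfcc: np.ndarray, log_energy: np.ndarray) -> np.ndarray:
    """Append log energy, then delta and delta-delta, to the static MFCCs."""
    static = np.hstack([mfcc, log_energy[:, None]])
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])


def supervector_dim(n_components: int, feat_dim: int) -> int:
    """Dimension of a supervector, assuming it stacks component means only."""
    return n_components * feat_dim


# Placeholder data standing in for real MFCC and log-energy frames.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 12))
log_energy = rng.normal(size=200)

feats = stack_features(mfcc, log_energy)
print(feats.shape)                           # (200, 39)
print(supervector_dim(128, feats.shape[1]))  # 4992
```

Under these assumptions, 12 coefficients plus log energy give 13 static features per frame, 39 once delta and delta-delta are appended, and a 128-component GMM then yields a 4992-dimensional supervector as input to the SVM.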

Details

Primary Language

Turkish

Subjects

Computer Software

Journal Section

Research Article

Authors

Ergün Yücesoy*
0000-0003-1707-384X
Türkiye

Publication Date

March 1, 2021

Submission Date

July 23, 2020

Acceptance Date

November 4, 2020

Published in Issue

Year 2021 Volume: 11 Number: 1

DOI

https://doi.org/10.21597/jist.772804

IZ

https://izlik.org/JA48GN33XD

Cite

APA

Yücesoy, E. (2021). MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri. Journal of the Institute of Science and Technology, 11(1), 32-43. https://doi.org/10.21597/jist.772804

AMA

1. Yücesoy E. MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri. J. Inst. Sci. and Tech. 2021;11(1):32-43. doi:10.21597/jist.772804

Chicago

Yücesoy, Ergün. 2021. “MFKK Özniteliklerine Eklenen Logaritmik Enerji Ve Delta Parametrelerinin Yaş Ve Cinsiyet Sınıflandırma Üzerindeki Etkileri”. Journal of the Institute of Science and Technology 11 (1): 32-43. https://doi.org/10.21597/jist.772804.

EndNote

Yücesoy E (March 1, 2021) MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri. Journal of the Institute of Science and Technology 11 1 32–43.

IEEE

[1] E. Yücesoy, “MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri”, J. Inst. Sci. and Tech., vol. 11, no. 1, pp. 32–43, Mar. 2021, doi: 10.21597/jist.772804.

ISNAD

Yücesoy, Ergün. “MFKK Özniteliklerine Eklenen Logaritmik Enerji Ve Delta Parametrelerinin Yaş Ve Cinsiyet Sınıflandırma Üzerindeki Etkileri”. Journal of the Institute of Science and Technology 11/1 (March 1, 2021): 32-43. https://doi.org/10.21597/jist.772804.

JAMA

1. Yücesoy E. MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri. J. Inst. Sci. and Tech. 2021;11:32–43.

MLA

Yücesoy, Ergün. “MFKK Özniteliklerine Eklenen Logaritmik Enerji Ve Delta Parametrelerinin Yaş Ve Cinsiyet Sınıflandırma Üzerindeki Etkileri”. Journal of the Institute of Science and Technology, vol. 11, no. 1, Mar. 2021, pp. 32-43, doi:10.21597/jist.772804.

Vancouver

1. Ergün Yücesoy. MFKK Özniteliklerine Eklenen Logaritmik Enerji ve Delta Parametrelerinin Yaş ve Cinsiyet Sınıflandırma Üzerindeki Etkileri. J. Inst. Sci. and Tech. 2021 Mar. 1;11(1):32-43. doi:10.21597/jist.772804

Cited By

Detecting audio copy-move forgery with an artificial neural network

Signal, Image and Video Processing

https://doi.org/10.1007/s11760-023-02856-w