Research Article

PARABOLIC FILTER MEL FREQUENCY CEPSTRAL COEFFICIENT AND FUSION OF FEATURES FOR SPEAKER AGE CLASSIFICATION

Volume: 38 Number: 4 October 5, 2021
  • Mohammed Muntaz Osman
  • Osman Büyük

Abstract

Speech is an acoustic signal that originates at the inner end of the human vocal tract and is radiated as an audio wave from its outer end. The structure and length of the vocal tract make features extracted from utterances with the same content differ from speaker to speaker. As a person grows, the vocal tract changes in length, which in turn gradually modifies speech characteristics. The mel frequency cepstral coefficient (MFCC), which uses triangular band-pass filter banks, is widely regarded as the most popular feature in speech processing applications. To improve the accuracy of speaker age classification, a new spectral feature set named parabolic filter mel frequency cepstral coefficient (PFMFCC) is proposed in this study. PFMFCC uses parabolic band-pass filter banks instead of triangular ones. The feature extraction technique uses a bank of 30 parabolic band-pass filters to extract 42 features from each 20 ms speech frame. These features are applied to three classical classifiers: the Gaussian mixture model (GMM), cosine score, and probabilistic linear discriminant analysis (PLDA). The aGender database, which consists of 47 hours of German speech uttered by a total of 852 speakers, is used in this study. On the female dataset, the new PFMFCC feature achieved accuracies of 51.01%, 56.01% and 58.14% with the cosine score, GMM and PLDA classifiers, respectively. On the male dataset, it achieved accuracies of 50.44%, 52.74% and 57.23% with the same three classifiers, respectively. With a fusion of seven feature sets, overall accuracies of 60.18%, 52.17% and 56.35% are obtained with the cosine score, GMM and PLDA classifiers, respectively, across all seven speaker age classes. With the cosine score classifier, feature fusion improved the overall accuracy by 2.55% compared to a previous speaker age classification study carried out on the same database.
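
The idea of replacing triangular filters with parabolic ones can be illustrated with a minimal sketch. The abstract does not give the exact filter formula, so the code below assumes each filter is a downward parabola on the mel axis that peaks at 1 at the band centre and reaches 0 at the band edges. The function names (`parabolic_filterbank`, `pfmfcc_frame`), the 8 kHz sampling rate, the 512-point FFT, the Hamming window and the choice of 13 cepstral coefficients are all illustrative assumptions, not the authors' settings. In particular, the composition of the 42 features per frame is not stated in the abstract and is not reproduced here.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed; the abstract does not say which variant is used).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def parabolic_filterbank(n_filters=30, n_fft=512, sample_rate=8000):
    """Return an (n_filters, n_fft // 2 + 1) matrix of parabolic band-pass filters."""
    # Band edges equally spaced on the mel scale, as in a standard MFCC front end.
    mel_edges = np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2)
    # Mel value of every FFT bin centre frequency.
    bin_mel = hz_to_mel(np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1))
    bank = np.zeros((n_filters, bin_mel.size))
    for i in range(n_filters):
        centre = mel_edges[i + 1]
        half_width = mel_edges[i + 1] - mel_edges[i]
        # Assumed shape: 1 at the centre, 0 at both edges, clipped to 0 outside the band.
        response = 1.0 - ((bin_mel - centre) / half_width) ** 2
        bank[i] = np.clip(response, 0.0, None)
    return bank


def pfmfcc_frame(frame, bank, n_ceps=13, n_fft=512):
    """Cepstral coefficients of one frame: power spectrum, filterbank, log, DCT-II."""
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2
    log_energy = np.log(bank @ power + 1e-10)  # small offset avoids log(0)
    n = log_energy.size
    k = np.arange(n_ceps)[:, None]
    dct_basis = np.cos(np.pi / n * (np.arange(n) + 0.5) * k)  # unnormalised DCT-II
    return dct_basis @ log_energy


bank = parabolic_filterbank()
rng = np.random.default_rng(0)
frame = rng.standard_normal(160)  # 20 ms of synthetic noise at the assumed 8 kHz rate
ceps = pfmfcc_frame(frame, bank)
print(bank.shape, ceps.shape)  # (30, 257) (13,)
```

Compared with a triangular filter over the same band, a parabola of this form gives more weight to frequencies between the centre and the edges. Whether that matches the paper's definition should be checked against the full text.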


Details

Primary Language

English

Journal Section

Research Article

Authors

Mohammed Muntaz Osman
0000-0001-6932-4159
Türkiye

Osman Büyük
0000-0003-1039-3234
Türkiye

Publication Date

October 5, 2021

Submission Date

August 7, 2020

Acceptance Date

October 19, 2020

Published in Issue

Year 2020 Volume: 38 Number: 4

APA
Osman, M. M., & Büyük, O. (2021). PARABOLIC FILTER MEL FREQUENCY CEPSTRAL COEFFICIENT AND FUSION OF FEATURES FOR SPEAKER AGE CLASSIFICATION. Sigma Journal of Engineering and Natural Sciences, 38(4), 2177-2191. https://izlik.org/JA26CD33YP
