Research Article

Effect of number and position of frames in speaker age estimation

Volume: 41 Number: 2 April 30, 2023

Abstract

With the powerful processing devices and capabilities that emerged in the first two decades of the 21st century, machine learning algorithms should soon be able to predict speaker age with higher accuracy, that is, with a much lower error rate. Profiling individuals remotely, including estimating their age, is a long-standing goal of human society. Speaker age estimation has been studied from quite a few perspectives. However, most of these approaches do not show the effect of utterance length, that is, the number of frames, on speaker age estimation. We present a detailed analysis of the effect of the number and position of frames on speaker age estimation using four magnitude-based and one phase-based spectral feature sets. The optimal speech duration for this task is investigated. In addition, the mismatch between training and test utterance durations is explored. The magnitude-based features are mainly derived from filter-bank analysis. After the filter-bank analysis, an i-vector is generated for each utterance. Least squares support vector regression (LSSVR) is employed for speaker age estimation. In the experiments, the aGender database, which consists of utterances from four age groups of German speakers, is used. Increasing the number of frames in the training and test sets increases the age estimation accuracy. This can be associated with the notion that more data helps the estimation process. Concerning position, the frames located at the centre of utterances tend to offer better results for both genders. The backend algorithms offer the best performance when the training and test utterance lengths are equal for longer speech segments; otherwise, training with medium-length utterances and testing with longer ones offers better estimation performance, especially for the female dataset.
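To make the notion of "number and position of frames" concrete, the following is a minimal sketch of how a fixed number of frames might be cut from the start, centre, or end of an utterance's feature matrix before further processing. It is an illustration only: the function name `select_frames`, the three position labels, and the toy feature matrix are assumptions and are not taken from the article, whose exact frame selection procedure is not described in the abstract.

```python
import numpy as np


def select_frames(features: np.ndarray, n_frames: int, position: str = "centre") -> np.ndarray:
    """Return a contiguous block of frames from a (num_frames, dim) feature matrix.

    Hypothetical helper: `position` picks where the block is taken from.
    Utterances shorter than `n_frames` are returned whole.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")

    total = features.shape[0]
    n = min(n_frames, total)

    if position == "start":
        begin = 0
    elif position == "centre":
        begin = (total - n) // 2
    elif position == "end":
        begin = total - n
    else:
        raise ValueError("position must be 'start', 'centre', or 'end'")

    return features[begin:begin + n]


# Toy utterance: 10 frames, 2 coefficients per frame (made-up values).
feats = np.arange(20, dtype=float).reshape(10, 2)

start_block = select_frames(feats, 4, "start")    # frames 0..3
centre_block = select_frames(feats, 4, "centre")  # frames 3..6
end_block = select_frames(feats, 4, "end")        # frames 6..9
```

In an experiment along the lines the abstract describes, blocks like these would be passed to the i-vector extractor in place of the full utterance, so that the number and position of frames can be varied independently.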

Details

Primary Language

English

Subjects

Computer Software

Journal Section

Research Article

Publication Date

April 30, 2023

Submission Date

April 20, 2021

Acceptance Date

October 1, 2021

Published in Issue

Year 2023 Volume: 41 Number: 2

APA
Osman, M. M., Büyük, O., & Tangel, A. (2023). Effect of number and position of frames in speaker age estimation. Sigma Journal of Engineering and Natural Sciences, 41(2), 243-255. https://izlik.org/JA95NA66XP
AMA
1. Osman MM, Büyük O, Tangel A. Effect of number and position of frames in speaker age estimation. SIGMA. 2023;41(2):243-255. https://izlik.org/JA95NA66XP
Chicago
Osman, Mohammed Muntaz, Osman Büyük, and Ali Tangel. 2023. “Effect of Number and Position of Frames in Speaker Age Estimation”. Sigma Journal of Engineering and Natural Sciences 41 (2): 243-55. https://izlik.org/JA95NA66XP.
EndNote
Osman MM, Büyük O, Tangel A (April 1, 2023) Effect of number and position of frames in speaker age estimation. Sigma Journal of Engineering and Natural Sciences 41 2 243–255.
IEEE
[1] M. M. Osman, O. Büyük, and A. Tangel, “Effect of number and position of frames in speaker age estimation”, SIGMA, vol. 41, no. 2, pp. 243–255, Apr. 2023, [Online]. Available: https://izlik.org/JA95NA66XP
ISNAD
Osman, Mohammed Muntaz - Büyük, Osman - Tangel, Ali. “Effect of Number and Position of Frames in Speaker Age Estimation”. Sigma Journal of Engineering and Natural Sciences 41/2 (April 1, 2023): 243-255. https://izlik.org/JA95NA66XP.
JAMA
1. Osman MM, Büyük O, Tangel A. Effect of number and position of frames in speaker age estimation. SIGMA. 2023;41:243–255.
MLA
Osman, Mohammed Muntaz, et al. “Effect of Number and Position of Frames in Speaker Age Estimation”. Sigma Journal of Engineering and Natural Sciences, vol. 41, no. 2, Apr. 2023, pp. 243-55, https://izlik.org/JA95NA66XP.
Vancouver
1. Mohammed Muntaz Osman, Osman Büyük, Ali Tangel. Effect of number and position of frames in speaker age estimation. SIGMA [Internet]. 2023 Apr. 1;41(2):243-55. Available from: https://izlik.org/JA95NA66XP
