Conference Paper

A Comparative Assessment of Text-independent Automatic Speaker Identification Methods Using Limited Data

Number: 26 July 31, 2021

Abstract

Automatic Speaker Identification (ASI) is an active field of research in signal processing, and various machine learning algorithms have been applied to it. With recent developments in hardware and the growing availability of data, Deep Learning (DL) methods have become the state of the art in several classification and identification tasks. In this paper, we evaluate traditional methods such as the Gaussian Mixture Model-Universal Background Model (GMM-UBM) and DL-based techniques such as the Factorized Time-Delay Neural Network (FTDNN) and Convolutional Neural Networks (CNN) for text-independent, closed-set automatic speaker identification on two datasets recorded under different conditions. The first is LibriSpeech, which consists of clean audiobook recordings from a large number of speakers. The second, which we collected and prepared ourselves, contains a limited amount of speech with a low signal-to-noise ratio, taken from real-life conversations between customers and call center agents. The duration of the speech signal in the query phase is an important factor in the performance of ASI methods. We therefore propose a CNN architecture for automatic speaker identification from short speech segments. The design aims to capture the temporal nature of the speech signal with fewer parameters than well-known CNN architectures. We show that the proposed CNN-based algorithm performs better on the large, clean dataset, whereas on the dataset with a limited amount of data the traditional method outperforms all DL approaches. The proposed model achieves a top-1 accuracy of 99.5% on 1-second voice instances from the LibriSpeech dataset.
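To make the evaluation setting concrete, the following is a minimal sketch of closed-set identification on short segments and of the top-1 accuracy metric reported in the abstract. It is not the authors' implementation: the function names, the non-overlapping segmentation, and the per-speaker score dictionaries are all assumptions introduced for illustration. The scores could come from any of the compared models (for example GMM-UBM log-likelihoods or CNN output probabilities).

```python
from typing import Dict, List, Sequence


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        segment_seconds: float = 1.0) -> List[Sequence[float]]:
    """Cut a waveform into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped
    (an assumption; the paper's exact segmentation is not given here).
    """
    seg_len = int(sample_rate * segment_seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def identify(scores: Dict[str, float]) -> str:
    """Closed-set decision: pick the enrolled speaker with the highest score."""
    if not scores:
        raise ValueError("no enrolled speakers to choose from")
    return max(scores, key=scores.get)


def top1_accuracy(score_rows: List[Dict[str, float]],
                  true_speakers: List[str]) -> float:
    """Fraction of query segments whose best-scoring speaker is the true one."""
    if len(score_rows) != len(true_speakers):
        raise ValueError("scores and labels must have the same length")
    if not score_rows:
        raise ValueError("at least one query segment is required")
    hits = sum(identify(row) == truth
               for row, truth in zip(score_rows, true_speakers))
    return hits / len(score_rows)


# Hypothetical usage: 2.5 s of audio at 16 kHz yields two 1-second segments.
segments = split_into_segments([0.0] * 40000, sample_rate=16000)

# Hypothetical per-segment scores for three enrolled speakers.
example_scores = [
    {"spk_a": 0.9, "spk_b": 0.05, "spk_c": 0.05},
    {"spk_a": 0.2, "spk_b": 0.7, "spk_c": 0.1},
    {"spk_a": 0.5, "spk_b": 0.3, "spk_c": 0.2},
]
example_truth = ["spk_a", "spk_b", "spk_c"]
accuracy = top1_accuracy(example_scores, example_truth)
```

Because the task is closed-set, every query segment is assigned to one of the enrolled speakers; there is no rejection threshold for unknown speakers in this sketch.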

Supporting Institution

Arcelik, Scientific Project Unit (BAP) of Istanbul Technical University

Project Number

MOA-2019-42321

Details

Primary Language

English

Subjects

Engineering

Journal Section

Conference Paper

Publication Date

July 31, 2021

Submission Date

June 21, 2021

Acceptance Date

June 26, 2021

Published in Issue

Year 2021 Number: 26

APA
Fasounaki, M., Yüce, E. B., Öncül, S., & İnce, G. (2021). A Comparative Assessment of Text-independent Automatic Speaker Identification Methods Using Limited Data. Avrupa Bilim Ve Teknoloji Dergisi, 26, 217-222. https://doi.org/10.31590/ejosat.950218
