Research Article

Markov Model Based Real Time Speaker Recognition using K-Means, Fast Fourier Transform and Mel Frequency Cepstral Coefficients

Volume: 15 Number: 3 September 30, 2019

Abstract

This study combines machine learning and audio signal processing to develop a real-time speaker recognition system and application based on Mel Frequency Cepstral Coefficient (MFCC) features and a Markov chain model classifier. To train the system, a sound sample was taken from each speaker and processed with the Fast Fourier Transform (FFT) and an MFCC feature extraction algorithm. The MFCC features were clustered using the k-means clustering algorithm, and a Markov chain model was created for each speaker from the outputs of the clustering step. By extracting the characteristic features of each speaker's voice, the system determined in real time, and with high accuracy, who was speaking in a group conversation, for how long, and during which time intervals.
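The pipeline summarized in the abstract (FFT, MFCC extraction, k-means clustering, one Markov chain per speaker) can be illustrated with a minimal NumPy sketch. It is not the author's implementation. Everything specific in it is an assumption made for illustration: the frame sizes, the 26-filter mel bank, the 13 coefficients, the codebook shared by all speakers, the deterministic farthest-point initialization of k-means, the add-one smoothing of transition counts, and the hypothetical `synth` helper, which replaces recorded speech with alternating tones.

```python
import numpy as np

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    """Toy MFCC: frames -> Hamming window -> FFT power -> mel filterbank -> log -> DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel2hz(np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):  # triangular filters spaced evenly on the mel scale
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T  # shape: (n_frames, n_ceps)

def quantize(x, centers):
    """Map each feature vector to the index of its nearest cluster center."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, n_iter=20):
    """Lloyd's k-means; farthest-point initialization is an assumption of this sketch."""
    centers = [x[0]]
    for _ in range(k - 1):
        d = ((x[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = quantize(x, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def transition_matrix(states, k, alpha=1.0):
    """Row-normalized transition counts with add-alpha smoothing (assumed)."""
    counts = np.full((k, k), alpha)
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(states, trans):
    """Log-probability of the observed state transitions under one chain."""
    return np.log(trans[states[:-1], states[1:]]).sum()

def train(train_signals, k=4):
    """Fit one shared codebook, then one Markov chain per speaker."""
    feats = {name: mfcc(sig) for name, sig in train_signals.items()}
    centers = kmeans(np.vstack(list(feats.values())), k)
    chains = {name: transition_matrix(quantize(f, centers), k) for name, f in feats.items()}
    return centers, chains

def identify(signal, centers, chains):
    """Return the speaker whose chain gives the highest log-likelihood."""
    states = quantize(mfcc(signal), centers)
    return max(chains, key=lambda name: log_likelihood(states, chains[name]))

def synth(freqs, seconds=2.0, sr=16000, seed=0):
    """Hypothetical stand-in for a voice: tones alternating every 50 ms plus noise."""
    t = np.arange(int(seconds * sr)) / sr
    seg = (t // 0.05).astype(int) % len(freqs)
    f = np.asarray(freqs, dtype=float)[seg]
    rng = np.random.default_rng(seed)
    return np.sin(2 * np.pi * f * t) + 0.01 * rng.standard_normal(t.size)

# Speaker names and tone frequencies are made up for the demonstration.
train_signals = {"alice": synth([300, 600], seed=1), "bob": synth([1200, 2400], seed=2)}
centers, chains = train(train_signals)
print(identify(synth([300, 600], seed=3), centers, chains))
```

Applying `identify` to successive short windows of a live stream, rather than to a whole recording, is one way such a classifier could report who spoke and when; the window length would then trade response time against reliability.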

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Publication Date

September 30, 2019

Submission Date

April 22, 2019

Acceptance Date

September 16, 2019

Published in Issue

Year 2019 Volume: 15 Number: 3

APA
Borandağ, E. (2019). Markov Model Based Real Time Speaker Recognition using K-Means, Fast Fourier Transform and Mel Frequency Cepstral Coefficients. Celal Bayar University Journal of Science, 15(3), 287-292. https://doi.org/10.18466/cbayarfbe.556936