Research Article
Year 2019, Volume: 15 Issue: 3, 287 - 292, 30.09.2019
https://doi.org/10.18466/cbayarfbe.556936

Markov Model Based Real Time Speaker Recognition using K-Means, Fast Fourier Transform and Mel Frequency Cepstral Coefficients

Abstract

This study combines machine learning and audio processing methods to develop
a real-time speaker recognition system and application based on Mel Frequency
Cepstral Coefficient (MFCC) features and a Markov chain model classifier. To
train the system, a voice sample was taken from each speaker and processed
with the Fast Fourier Transform and an MFCC feature extraction algorithm. The
MFCC features were then clustered using the k-means clustering algorithm, and
a Markov chain model was built for each speaker from the clustering output. By
extracting the characteristic features of each speaker's voice, the system
determined in real time, with high accuracy, who was speaking in a group, for
how long, and at which time intervals during the conversation.
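
The clustering and Markov chain steps described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the shared k-means codebook, the add-alpha smoothing, the average log-likelihood score, and all function names are assumptions made for illustration, and synthetic Gaussian vectors stand in for real MFCC frames (the FFT and MFCC extraction stages are omitted).

```python
import numpy as np

def quantize(features, centroids):
    """Map each feature vector to the index of its nearest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) centroid array."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so the input frames are never modified.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = quantize(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def transition_matrix(states, k, alpha=1.0):
    """First-order Markov transition matrix with add-alpha smoothing (assumed)."""
    counts = np.full((k, k), alpha)
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(states, trans):
    """Average log-probability of the observed state-to-state transitions."""
    return float(np.log(trans[states[:-1], states[1:]]).mean())

def identify(frames, centroids, models):
    """Return the name of the speaker model that best explains the frames."""
    states = quantize(frames, centroids)
    return max(models, key=lambda name: log_likelihood(states, models[name]))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    k = 4  # hypothetical codebook size
    # Synthetic 13-dimensional frames standing in for per-speaker MFCC vectors.
    train = {
        "speaker_a": rng.normal(0.0, 1.0, size=(300, 13)),
        "speaker_b": rng.normal(4.0, 1.0, size=(300, 13)),
    }
    codebook = kmeans(np.vstack(list(train.values())), k)
    models = {name: transition_matrix(quantize(frames, codebook), k)
              for name, frames in train.items()}
    unknown = rng.normal(4.0, 1.0, size=(100, 13))
    print(identify(unknown, codebook, models))
```

In a real-time setting, `identify` could be applied to a short sliding window of incoming frames, so that the sequence of winning speakers over time also indicates who spoke when and for how long; the window length would be a tuning choice not specified in the abstract.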

Details

Primary Language: English
Subjects: Engineering
Section: Articles
Authors

Emin Borandağ 0000-0001-5553-2707

Publication Date: September 30, 2019
Published in Issue: Year 2019, Volume: 15 Issue: 3

Cite

APA Borandağ, E. (2019). Markov Model Based Real Time Speaker Recognition using K-Means, Fast Fourier Transform and Mel Frequency Cepstral Coefficients. Celal Bayar Üniversitesi Fen Bilimleri Dergisi, 15(3), 287-292. https://doi.org/10.18466/cbayarfbe.556936