Research Article

Speaker Recognition and Speaker Verification by Comparison of MFCC and LBP Methods

Year 2022, Volume: 15 Issue: 2, 104 - 109, 15.12.2022
https://doi.org/10.54525/tbbmd.1083707

Abstract

Speaker recognition, or speaker identification, is the automatic identification of a speaker by analyzing parameters of the audio signal. A human voice is highly characteristic of its owner. For this reason, in this study a dataset of 46 different people reciting Surah Yasin was collected from YouTube in order to determine which person is speaking. Features were extracted from the audio files using MFCC and LBP. The feature vectors were tested with various classification algorithms, yielding 35.10% accuracy for MFCC and 90.74% for LBP. For speaker verification, LBP achieved 100% classification accuracy.
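The article does not publish its implementation, but the LBP half of the pipeline can be illustrated compactly. The sketch below is a minimal 1D local binary pattern feature extractor in NumPy: each audio sample is compared with its neighbours to form a binary code, and the normalised histogram of codes serves as a fixed-length feature vector for a classifier. The radius, code length, and histogram normalisation here are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def lbp_1d(signal, radius=4):
    """Compute a 1D local binary pattern code for each sample.

    Each sample is compared with its `radius` left and `radius` right
    neighbours; a neighbour >= centre contributes a 1-bit, yielding a
    (2*radius)-bit code per interior sample.
    """
    n = len(signal)
    centres = signal[radius:n - radius]
    codes = np.zeros(n - 2 * radius, dtype=np.int32)
    bit = 0
    for offset in list(range(-radius, 0)) + list(range(1, radius + 1)):
        neighbours = signal[radius + offset:n - radius + offset]
        codes |= (neighbours >= centres).astype(np.int32) << bit
        bit += 1
    return codes

def lbp_histogram(signal, radius=4):
    """Normalised histogram of 1D-LBP codes: a fixed-length feature vector."""
    codes = lbp_1d(signal, radius)
    hist = np.bincount(codes, minlength=2 ** (2 * radius)).astype(float)
    return hist / hist.sum()

# Example on a synthetic "audio" frame (stand-in for a real recording)
rng = np.random.default_rng(0)
frame = rng.standard_normal(16000)          # 1 s of noise at 16 kHz
features = lbp_histogram(frame, radius=4)   # 256-dimensional feature vector
```

With radius 4 the codes are 8-bit, so every utterance maps to a 256-dimensional histogram regardless of its duration, which is what makes the representation directly usable by standard classifiers.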




Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles (Research)
Authors

Emrah Aydemir 0000-0002-8380-7891

Early Pub Date December 3, 2022
Publication Date December 15, 2022
Published in Issue Year 2022 Volume: 15 Issue: 2

Cite

APA Aydemir, E. (2022). MFCC ve LBP Yöntemlerinin Karşılaştırılması ile Konuşmacı Tanıma ve Konuşmacı Doğrulama. Türkiye Bilişim Vakfı Bilgisayar Bilimleri Ve Mühendisliği Dergisi, 15(2), 104-109. https://doi.org/10.54525/tbbmd.1083707

Article Acceptance

Articles are submitted online after user registration/login.

The acceptance process of the articles sent to the journal consists of the following stages:

1. Each submitted article is first sent to at least two referees.

2. Referee appointments are made by the journal editors. The journal's referee pool contains approximately 200 referees, classified by their areas of interest. Each referee is sent an article on a subject within their expertise, and referees are selected so as to avoid any conflict of interest.

3. The authors' names are withheld from the articles sent to the referees.

4. Referees are given guidance on how to evaluate an article and are asked to fill in the evaluation form shown below.

5. Articles that receive positive opinions from both referees undergo a similarity check by the editors; the similarity score is expected to be below 25%.

6. A paper that has passed all stages is reviewed by the editor for language and presentation, and necessary corrections and improvements are made. The authors are notified if necessary.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.