PERFORMING ACCURATE SPEAKER RECOGNITION BY USE OF SVM AND CEPSTRAL FEATURES

Zülfikar Aslan; Mehmet Akın

Research Article

Year 2018, Volume: 3 Issue: 2, 16 - 25, 01.01.2019

Abstract

Speaker recognition from voice recordings is an active research area for which many applications have been proposed. In this study, speaker recognition is performed on cepstral features extracted from raw voice recordings. Several of the most prominent cepstral feature selection methods, namely LPC, LPCC, MFCC, PLP, and RASTA-PLP, are applied, and their contribution to recognition performance is investigated. The resulting features are then classified with an SVM to complete the speaker recognition task. The results show that cepstral feature selection methods such as LPCC and MFCC, combined with SVM classification, yield around 97% accuracy.
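
The abstract does not describe how the cepstral features are computed. As an illustration of one of the named feature types (LPCC), the following is a minimal sketch that assumes the autocorrelation method with the Levinson-Durbin recursion, followed by the standard recursion from LPC coefficients to cepstral coefficients. The function names, the Hamming window, and the default order and coefficient count are assumptions made for this example, not the authors' settings.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, error) where s[n] is predicted as sum_k a[k-1] * s[n-k]."""
    a = [0.0] * (order + 1)          # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:             # degenerate frame (e.g. silence): stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p to cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)         # c[0] (gain term) is not computed here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc_features(frame, order=10, n_ceps=12):
    """Hamming-window a frame and return its LPCC vector (illustrative defaults)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    r = autocorrelation(windowed, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)


if __name__ == "__main__":
    # Synthetic 240-sample frame standing in for a short stretch of speech.
    frame = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(240)]
    print(lpcc_features(frame))
```

In a pipeline of the kind the abstract outlines, one such vector per frame (or a per-recording summary of them) would be passed to the SVM classifier; that stage is not shown here.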

Keywords

Speaker recognition, cepstral feature selection, SVM

References

"Principles of Data Acquisition and Conversion". Texas Instruments. April 2015. http://www.ti.com/lit/an/sbaa051a/sbaa051a.pdf (08.06.2017)
Ambikairajah, E. (2007, December). Emerging features for speaker recognition. In Information, Communications & Signal Processing, 2007 6th International Conference on (pp. 1-7). IEEE.
Kurzekar, P. K., Deshmukh, R. R., Waghmare, V. B., & Shrishrimal, P. P. (2014). A comparative study of feature extraction techniques for speech recognition system. International Journal of Innovative Research in Science, Engineering and Technology, 3(12), 18006-18016.
Makhoul, J. (1975). Linear prediction: A tutorial review. Proceedings of the IEEE, 63(4), 561-580.
Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE transactions on acoustics, speech, and signal processing, 28(4), 357-366.
Yusnita, M. A., Paulraj, M. P., Yaacob, S., Fadzilah, M. N., & Shahriman, A. B. (2013). Acoustic analysis of formants across genders and ethnical accents in Malaysian English using ANOVA. Procedia Engineering, 64, 385-394.
Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87(4), 1738-1752.
Dumitru, C. O., & Gavat, I. (2006, June). A comparative study of feature extraction methods applied to continuous speech recognition in Romanian Language. In Multimedia Signal Processing and Communications, 48th International Symposium ELMAR-2006 focused on (pp. 115-118). IEEE.
O'Shaughnessy, D. (2003). Interacting with computers by voice: automatic speech recognition and synthesis. Proceedings of the IEEE, 91(9), 1272-1305.
Maheswari, N. U., Kabilan, A. P., & Venkatesh, R. (2010). A hybrid model of neural network approach for speaker independent word recognition. International Journal of Computer Theory and Engineering, 2(6), 912.
Dhonde, S. B., & Jagade, S. M. (2015). Feature extraction techniques in speaker recognition: A review. International Journal on Recent Technologies in Mechanical and Electrical Engineering (IJRMEE), 2(5), 104-106.
Chowdhury, M. H. (2014). Speech based gender identification using empirical mode decomposition (EMD) (Doctoral dissertation, BRAC University).
Kumar, J., Prabhakar, O. P., & Sahu, N. K. (2014). Comparative Analysis of Different Feature Extraction and Classifier Techniques for Speaker Identification Systems: A Review. International Journal of Innovative Research in Computer and Communication Engineering, 2(1), 2760-2269.
Hermansky, H., & Morgan, N. (1994). RASTA processing of speech. IEEE transactions on speech and audio processing, 2(4), 578-589.
Kwon, O. W., Chan, K., & Lee, T. W. (2003). Speech feature analysis using variational Bayesian PCA. IEEE Signal Processing Letters, 10(5), 137-140.
Soman, K. P., Loganathan, R., & Ajay, V. (2011). Machine learning with SVM and other kernel methods. PHI Learning Pvt. Ltd., 486 pp.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
Li, S., Li, H., Li, M., Shyr, Y., Xie, L., & Li, Y. (2009). Improved prediction of lysine acetylation by support vector machines. Protein and peptide letters, 16(8), 977-983.
Yildiz, M., Bergil, E., & Oral, C. (2017). Comparison of different classification methods for the preictal stage detection in EEG signals. Biomedical Research, 28(2).
Ayhan, S., & Erdoğmuş, Ş. (2014). Destek vektör makineleriyle sınıflandırma problemlerinin çözümü için çekirdek fonksiyonu seçimi. Eskişehir Osmangazi Üniversitesi İktisadi ve İdari Bilimler Dergisi, 9(1).
Zouhir, Y., & Ouni, K. (2014). A bio-inspired feature extraction for robust speech recognition. SpringerPlus, 3(1), 651.
Saksamudre, S. K., & Deshmukh, R. R. (2015). Comparative study of isolated word recognition system for Hindi language. International Journal of Engineering Research and Technology, 4(07).
Salomons, E. L., & Havinga, P. J. (2015). A survey on the feasibility of sound classification on wireless sensor nodes. Sensors, 15(4), 7462-7498.

There are 23 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Zülfikar Aslan (ORCID 0000-0002-2706-5715), Mehmet Akın (ORCID 0000-0003-0776-7653)
Publication Date	January 1, 2019
Acceptance Date	July 18, 2018
Published in Issue	Year 2018 Volume: 3 Issue: 2

Cite

APA	Aslan, Z., & Akın, M. (2019). PERFORMING ACCURATE SPEAKER RECOGNITION BY USE OF SVM AND CEPSTRAL FEATURES. The International Journal of Energy and Engineering Sciences, 3(2), 16-25.
