Research Article

The Impact of Pre-processing and Feature Selection Methods for Speech Emotion Recognition

Volume: 10 Number: 1 March 15, 2019

Abstract

Speech emotion recognition relies on features obtained by applying digital signal processing to the digitized speech signal. The features extracted from speech can be treated as a single set or grouped by dimension or structure. In this study, the effects of preprocessing and feature selection methods on emotion recognition were investigated. For this purpose, the EMO-DB data set and three different classifiers were used. According to the results, the highest accuracy, 90.3%, was achieved with a multi-layer perceptron and high-pass filtering. Spectral features yield higher accuracy than prosodic features. In addition, females reflect their emotions in their voices more than males, and individuals aged 20-29 more than those aged 30-35. Among the filtering methods examined, high-pass filtering increased classifier accuracy, whereas low-pass filtering, band-pass filtering, and noise reduction reduced it.
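The high-pass preprocessing step mentioned above can be illustrated with a minimal sketch. The first-order difference-equation filter and the coefficient below are illustrative assumptions for demonstration only, not the actual filter design used in the study; in practice the filtered signal would then go to feature extraction and a classifier such as a multi-layer perceptron.

```python
def highpass(x, alpha=0.95):
    """Simple first-order high-pass filter (illustrative, not the paper's design).

    Difference equation: y[n] = alpha * (y[n-1] + x[n] - x[n-1]),
    with x[-1] = y[-1] = 0. Low-frequency content (e.g. DC offset or
    recording rumble) is attenuated; fast variations pass through.
    """
    y = []
    prev_x, prev_y = 0.0, 0.0
    for sample in x:
        out = alpha * (prev_y + sample - prev_x)
        y.append(out)
        prev_x, prev_y = sample, out
    return y

# A constant (DC) input decays toward zero after filtering,
# while a rapidly alternating input is largely preserved.
dc = highpass([1.0] * 200)
alt = highpass([1.0, -1.0] * 100)
```

A real pipeline would typically use a higher-order IIR design (e.g. a Butterworth filter) and frame-based spectral features; this sketch only shows why high-pass filtering suppresses slowly varying components before feature extraction.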


Details

Primary Language

Turkish

Journal Section

Research Article

Publication Date

March 15, 2019

Submission Date

December 18, 2018

Acceptance Date

January 14, 2019

Published in Issue

Year 2019 Volume: 10 Number: 1

IEEE
[1]T. Özseven, “Konuşma Tabanlı Duygu Tanımada Ön İşleme ve Öznitelik Seçim Yöntemlerinin Etkisi”, DUJE, vol. 10, no. 1, pp. 99–112, Mar. 2019, doi: 10.24012/dumf.498727.
