The Impact of Pre-processing and Feature Selection Methods for Speech Emotion Recognition
Abstract
Speech emotion recognition relies on features extracted from the digitized speech signal by digital signal processing. These features can be treated as a single set or grouped by dimension or structure. This study investigated the effects of feature selection and pre-processing methods on emotion detection, using the EMO-DB data set and three different classifiers. The highest success rate, 90.3%, was achieved with a multi-layer perceptron combined with a high-pass filter. Spectral features yielded higher success than prosodic features. In addition, emotions were reflected more strongly in the voices of female speakers than of male speakers, and in the voices of speakers aged 20-29 than of speakers aged 30-35. Among the filtering methods examined in the study, high-pass filtering increased classifier success, whereas low-pass filtering, band-pass filtering and noise reduction reduced it.
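As an illustration of the high-pass pre-processing step mentioned in the abstract, the following is a minimal sketch of a first-order high-pass filter applied to a sampled signal. The abstract does not state the filter design, order or cutoff used in the study, so the function names (`high_pass`, `rms`, `sine`), the first-order RC design, the 100 Hz cutoff and the 16 kHz sample rate are all assumptions chosen only for demonstration.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC high-pass filter (illustrative; not the study's filter)."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate                  # sampling interval
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Pass changes between neighbouring samples, let slow trends decay.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate, n):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    fs = 16000       # assumed sample rate in Hz
    cutoff = 100.0   # hypothetical cutoff in Hz
    low = sine(20.0, fs, fs)     # component well below the cutoff
    high = sine(2000.0, fs, fs)  # component well above the cutoff
    print("low-frequency gain:", rms(high_pass(low, fs, cutoff)) / rms(low))
    print("high-frequency gain:", rms(high_pass(high, fs, cutoff)) / rms(high))
```

In this sketch the component below the cutoff is strongly attenuated while the component above it passes almost unchanged. A real pipeline would more likely use a higher-order filter from a signal processing library before feature extraction.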
Details
Primary Language
Turkish
Subjects
-
Journal Section
Research Article
Authors
Turgut Özseven
*
0000-0002-6325-461X
Türkiye
Publication Date
March 15, 2019
Submission Date
December 18, 2018
Acceptance Date
January 14, 2019
Published in Issue
Year 2019 Volume: 10 Number: 1
Cited By
Konuşmadan Duygu Tanıma Üzerine Detaylı bir İnceleme: Özellikler ve Sınıflandırma Metotları (A Detailed Review of Emotion Recognition from Speech: Features and Classification Methods)
European Journal of Science and Technology
https://doi.org/10.31590/ejosat.1039403