Speech, one of the most effective methods of communication, varies according to the emotions experienced by people and carries not only vocabulary but also information about emotion. As technology develops, human-machine interaction is also improving, and the emotional information that can be extracted from voice signals is valuable for this interaction. For these reasons, studies on emotion recognition systems are increasing. In this study, emotion classification is performed using the Toronto Emotional Speech Set (TESS) created by the University of Toronto. The voice data in the dataset are first preprocessed, and a new CNN-based deep learning method is then applied and compared with existing approaches. Feature maps are first extracted from the voice files in the TESS dataset using the MFCC method, and classification is then performed with the proposed neural network model. Separate CNN and LSTM models are created for the classification process. The experiments show that the CNN model applied to MFCC features achieves an accuracy of 99.5%, outperforming existing methods for the classification of voice signals. This accuracy indicates that the proposed CNN model can be used for emotion classification from human voice data.
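To give a concrete picture of the pipeline summarized above, the sketch below shows one possible way to extract MFCC feature maps from speech files and feed them to a small 2-D CNN classifier. It is a minimal illustration using librosa and Keras, not the authors' exact architecture; the folder layout, number of MFCC coefficients, frame count, and layer sizes are assumptions made for the example.

```python
# Illustrative sketch only: MFCC feature extraction followed by a small CNN
# classifier. The file layout and hyperparameters are hypothetical and do not
# reproduce the proposed model exactly.
import numpy as np
import librosa
import tensorflow as tf


def extract_mfcc(path, n_mfcc=40, max_frames=174):
    """Load a wav file and return a fixed-size MFCC feature map."""
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every sample has the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc


def build_cnn(input_shape=(40, 174, 1), num_classes=7):
    """A minimal 2-D CNN over MFCC maps; TESS contains 7 emotion classes."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


model = build_cnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, ...) once the MFCC maps and labels are prepared.
```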
Keywords: Emotion recognition system, Mel frequency cepstral coefficients (MFCCs), Deep learning, Signal processing
| Primary Language | English |
| --- | --- |
| Subjects | Computer Software, Electrical Engineering (Other) |
| Journal Section | Research Articles |
| Early Pub Date | June 5, 2024 |
| Publication Date | April 20, 2024 |
| Submission Date | October 9, 2023 |
| Acceptance Date | March 14, 2024 |
| Published in Issue | Year 2024 |