A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech

Fatih Şengül; Sıtkı Akkaya

doi:10.35860/iarej.1373333

Research Article

Year 2024, Volume: 8 Issue: 1, 33 - 42, 20.04.2024

Cited By: 3

Abstract

Speech is one of the most effective means of communication. It varies with the emotions a speaker experiences, so it carries emotional information in addition to vocabulary. As technology develops, human-machine interaction is also improving, and the emotional information that can be extracted from voice signals is valuable for this interaction. For these reasons, studies on emotion recognition systems are increasing. In this study, sentiment analysis is performed using the Toronto Emotional Speech Set (TESS), created at the University of Toronto. The voice data in the dataset are first preprocessed, and a new CNN-based deep learning method applied to them is then compared with existing methods. Feature maps are first obtained from the voice files in the TESS dataset using the MFCC method, and classification is then performed with the proposed neural network model. Separate CNN and LSTM models have been created for the classification process. The experiments show that the MFCC-applied CNN model, with an accuracy of 99.5%, achieves a better result than the existing methods for the classification of voice signals. This accuracy indicates that the proposed CNN model can be used for emotion classification from human voice data.
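The feature-extraction step named in the abstract can be illustrated with a minimal sketch of a textbook MFCC computation. This is not the authors' modified MFCC method, whose details are not given on this page. The function names, the 16 kHz sample rate, the frame and hop sizes, the filter count, and the synthetic test tone are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumed convention, others exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(signal, sr, n_mfcc=13, n_mels=26, frame_len=400, hop=160, n_fft=512):
    # 1) Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2) Power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # 3) Mel filterbank energies on a log scale (small offset avoids log(0)).
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # 4) DCT-II (unnormalized) across the mel axis, keeping the first n_mfcc terms.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T  # shape: (n_frames, n_mfcc)

# Usage on a synthetic 1-second, 440 Hz tone (a stand-in for a speech clip).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sr)
print(features.shape)
```

The resulting frames-by-coefficients matrix is the kind of feature map that a CNN or LSTM classifier could take as input. In practice a tested library routine would normally be preferred over a hand-written version like this one.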

Keywords

Emotion recognition system, Mel frequency cepstral coefficients (MFCCs), Deep learning, Signal processing

There are 25 citations in total.

Details

Primary Language	English
Subjects	Computer Software, Electrical Engineering (Other)
Journal Section	Research Article
Authors	Fatih Şengül 0000-0001-5865-7476 Sıtkı Akkaya 0000-0002-3257-7838
Submission Date	October 9, 2023
Acceptance Date	March 14, 2024
Early Pub Date	June 5, 2024
Publication Date	April 20, 2024
Published in Issue	Year 2024 Volume: 8 Issue: 1

Cite

APA	Şengül, F., & Akkaya, S. (2024). A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech. International Advanced Researches and Engineering Journal, 8(1), 33-42. https://doi.org/10.35860/iarej.1373333
AMA	1. Şengül F, Akkaya S. A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech. Int. Adv. Res. Eng. J. 2024;8(1):33-42. doi:10.35860/iarej.1373333
Chicago	Şengül, Fatih, and Sıtkı Akkaya. 2024. “A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech”. International Advanced Researches and Engineering Journal 8 (1): 33-42. https://doi.org/10.35860/iarej.1373333.
EndNote	Şengül F, Akkaya S (April 1, 2024) A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech. International Advanced Researches and Engineering Journal 8 1 33–42.
IEEE	[1] F. Şengül and S. Akkaya, “A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech”, Int. Adv. Res. Eng. J., vol. 8, no. 1, pp. 33–42, Apr. 2024, doi: 10.35860/iarej.1373333.
ISNAD	Şengül, Fatih - Akkaya, Sıtkı. “A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech”. International Advanced Researches and Engineering Journal 8/1 (April 1, 2024): 33-42. https://doi.org/10.35860/iarej.1373333.
JAMA	1. Şengül F, Akkaya S. A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech. Int. Adv. Res. Eng. J. 2024;8:33–42.
MLA	Şengül, Fatih, and Sıtkı Akkaya. “A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech”. International Advanced Researches and Engineering Journal, vol. 8, no. 1, Apr. 2024, pp. 33-42, doi:10.35860/iarej.1373333.
Vancouver	1. Şengül F, Akkaya S. A Modified MFCC-Based Deep Learning Method for Emotion Classification from Speech. Int. Adv. Res. Eng. J. [Internet]. 2024 Apr. 1;8(1):33-42. Available from: https://izlik.org/JA35NG79AU

Cited By

Merger of internet of things and machine learning: The internet of everything sector projects, benefits, and future roles

International Advanced Researches and Engineering Journal

https://doi.org/10.35860/iarej.1467110

Enhancing water quality prediction with artificial intelligence: A hybrid convlstm model for spatio-temporal analysis

International Advanced Researches and Engineering Journal

https://doi.org/10.35860/iarej.1679575

TSFNet: A Temporal–Spectral Fusion Network for advanced speech emotion recognition in medical applications

Artificial Intelligence in Medicine

https://doi.org/10.1016/j.artmed.2025.103279

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.