Deep Learning-Based Speech Emotion Recognition for IoT Edge Devices: A Comparative Study
Abstract
With advances in artificial intelligence (AI), particularly in pattern recognition, significant progress has been made in recognising human emotions from speech characteristics, facial activity, and physiological responses. However, the expansion of Internet of Things (IoT) infrastructures has increased pressure on conventional cloud systems because of the high volume of transmitted data and the need for real-time responsiveness. As a remedy, edge computing has emerged as a distributed alternative, enabling localised data processing and reducing dependency on remote servers. In this context, the present study evaluates the classification performance of three hybrid deep learning (DL) models, Convolutional Neural Network–Dense Neural Network (CNN-Dense), Long Short-Term Memory–Convolutional Neural Network (LSTM-CNN), and Dense–Long Short-Term Memory (Dense-LSTM), within a simulated edge-based environment. The Toronto Emotional Speech Set (TESS) dataset was employed, and experimental workflows were implemented on Amazon Web Services (AWS) to simulate edge resource limitations. Classification performance was assessed using macro-averaged precision, recall, and F1-score. Among the models, CNN-Dense performed best, achieving an F1-score of 96%, followed by LSTM-CNN (95%) and Dense-LSTM (93%). The findings suggest that CNN-Dense may offer feature-extraction advantages and that hybrid models could be promising for emotion classification in decentralised systems.
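The macro-averaged metrics used for evaluation (per-class precision, recall, and F1, averaged with equal weight over all classes) can be sketched in plain Python. The label lists below are hypothetical illustrations over the seven TESS emotion classes, not data from the study:

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1: compute each metric
    per class, then take the unweighted mean over all classes."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical predictions over the seven TESS emotion classes
labels = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
y_true = ["angry", "happy", "sad", "happy", "neutral", "fear", "disgust", "surprise"]
y_pred = ["angry", "happy", "sad", "sad", "neutral", "fear", "disgust", "surprise"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred, labels)
```

Because every class contributes equally to the average, macro scores are not inflated by majority classes, which is why they are a common choice for balanced emotion corpora such as TESS.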
Details
Primary Language
English
Subjects
Deep Learning, Speech Recognition
Section
Research Article
Authors
Buket İşler
*
0000-0002-9393-9564
Türkiye
Early View Date
28 September 2025
Publication Date
21 April 2026
Submission Date
30 June 2025
Acceptance Date
12 September 2025
Published Issue
Year 2026, Volume: 29, Issue: 4