Deep Learning-Based Speech Emotion Recognition for IoT Edge Devices: A Comparative Study
Abstract
With advances in artificial intelligence (AI), particularly in pattern recognition, significant progress has been made in recognising human emotions from speech characteristics, facial activity, and physiological responses. However, the expansion of Internet of Things (IoT) infrastructures has increased pressure on conventional cloud systems because of the high volume of transmitted data and the need for real-time responsiveness. As a remedy, edge computing has emerged as a distributed alternative, enabling localised data processing and reducing dependency on remote servers. In this context, the present study evaluates the classification performance of three hybrid deep learning (DL) models, Convolutional Neural Network–Dense Neural Network (CNN-Dense), Long Short-Term Memory–Convolutional Neural Network (LSTM-CNN), and Dense–Long Short-Term Memory (Dense-LSTM), within a simulated edge-based environment. The Toronto Emotional Speech Set (TESS) dataset was employed, and experimental workflows were implemented on Amazon Web Services (AWS) to simulate edge resource limitations. Classification performance was assessed using macro-averaged precision, recall, and F1-score. Among the models, CNN-Dense performed best, achieving an F1-score of 96%, followed by LSTM-CNN (95%) and Dense-LSTM (93%). The findings suggest that CNN-Dense may offer feature-extraction advantages and that hybrid models could be promising for emotion classification in decentralised systems.
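The macro-averaged metrics used for evaluation (per-class precision, recall, and F1, averaged with equal weight over all classes) can be sketched in plain Python. The label lists below are hypothetical illustrations over the seven TESS emotion classes, not data from the study:

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1: compute each metric
    per class, then take the unweighted mean over all classes."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical predictions over the seven TESS emotion classes
labels = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
y_true = ["angry", "happy", "sad", "happy", "neutral", "fear", "disgust", "surprise"]
y_pred = ["angry", "happy", "sad", "sad", "neutral", "fear", "disgust", "surprise"]
macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred, labels)
```

Because every class contributes equally to the average, macro scores are not inflated by majority classes, which is why they are a common choice for balanced emotion corpora such as TESS.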
Details
Primary Language
English
Subjects
Deep Learning, Speech Recognition
Section
Research Article
Authors
Buket İşler
*
0000-0002-9393-9564
Türkiye
Early View Date
28 September 2025
Publication Date
21 April 2026
Submission Date
30 June 2025
Acceptance Date
12 September 2025
Published Issue
Year 2026, Volume: 29, Issue: 4