Research Article

Duygu tanımada akustik verilerle derin öğrenme modellerinin karşılaştırılması: LSTM ve DenseNet üzerine bir inceleme

Year 2025, Volume: 16 Issue: 1, 59 - 67

Abstract

Speech recognition technologies play an important role in human-machine interaction, and emotion recognition systems are among the most critical applications in this field. They are used to analyze human behavior more accurately and to build more responsive systems across a variety of domains. This study compares two deep learning models for speech emotion recognition: Long Short-Term Memory (LSTM) and DenseNet. To determine which approach yields more effective emotion recognition performance on audio data, the results produced by the two deep learning methods were examined. The Emotion Speech Dataset (ESD) was used, and the overall accuracy of both models was compared. According to the F1-score criterion, the LSTM model reached 92% overall accuracy, while the DenseNet model achieved 88%. These findings indicate that the LSTM model, which is well suited to temporal data, delivers superior performance for emotion recognition.
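As a minimal illustration of the evaluation criterion named above, the sketch below computes a macro-averaged F1 score, i.e. per-class F1 averaged with equal weight over the emotion categories. The labels and predictions are hypothetical examples over ESD-style emotion classes, not the study's actual data.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal class weight."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        # Per-class confusion counts
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical labels over the five ESD emotion categories (illustration only)
true_labels = ["neutral", "happy", "angry", "sad", "surprise", "happy"]
pred_labels = ["neutral", "happy", "angry", "sad", "happy", "happy"]
score = macro_f1(true_labels, pred_labels)  # one misclassified "surprise"
```

In practice a library routine such as scikit-learn's `f1_score(..., average='macro')` would typically be used; the hand-rolled version here only makes the metric's definition explicit.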

References

  • [1] Feinberg TE, Rifkin A, Schaffer C, Walker E. “Facial discrimination and emotional recognition in schizophrenia and affective disorders”. Archives of General Psychiatry, 43(3), 276-279, 1986.
  • [2] Kamble K, Sengupta J. “A comprehensive survey on emotion recognition based on electroencephalograph (EEG) signals”. Multimedia Tools and Applications, 82(18), 27269-27304, 2023.
  • [3] Cevik F, Kilimci ZH. “Derin öğrenme yöntemleri ve kelime yerleştirme modelleri kullanılarak Parkinson hastalığının duygu analiziyle değerlendirilmesi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 27(2), 151-161, 2020.
  • [4] Zhao J, Mao X, Chen L. “Speech emotion recognition using deep 1D & 2D CNN LSTM networks”. Biomedical Signal Processing and Control, 47, 312-323, 2019.
  • [5] Saxena A, Khanna A, Gupta D. “Emotion recognition and detection methods: A comprehensive survey”. Journal of Artificial Intelligence and Systems, 2(1), 53-79, 2020.
  • [6] Durahim AO, Setirek ÇA, Özel BB, Kebapçı H. “Türkçe şarkılar için şarkı sözleri üzerinden müzik duygu sınıflandırması”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24(2), 292-301, 2018.
  • [7] Nonis F, Dagnes N, Marcolin F, Vezzetti E. “3D approaches and challenges in facial expression recognition algorithms-a literature review”. Applied Sciences, 9(18), 3904, 2019.
  • [8] Vasdev D, Gupta V, Shubham S, Chaudhary A, Jain N, Salimi M, Ahmadian A. “Periapical dental X-ray image classification using deep neural networks”. Annals of Operations Research, 2022.
  • [9] Ng HW, Nguyen VD, Vonikakis V, Winkler S. “Deep learning for emotion recognition on small datasets using transfer learning”. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 443-449, 2015.
  • [10] Al-Turjman F. “Intelligence and security in big 5G-oriented IoNT: An overview”. Future Generation Computer Systems, 102, 357-368, 2020.
  • [11] Abdelwahab M, Busso C. “Study of DenseNet network approaches for speech emotion recognition”. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5084-5088, IEEE, 2018.
  • [12] Gao Z, Wang X, Yang Y, Li Y, Ma K, Chen G. “A channel-fused DenseNet convolutional network for EEG-based emotion recognition”. IEEE Transactions on Cognitive and Developmental Systems, 13(4), 945-954, 2020.
  • [13] Öztürk ÖF, Pashaei E. “Konuşmalardaki duygunun evrişimsel LSTM modeli ile tespiti”. Dicle Üniversitesi Mühendislik Dergisi, 12(4), 581-589, 2021.
  • [14] Chamishka S, Madhavi I, Nawaratne R, Alahakoon D, De Silva D, Chilamkurti N, Nanayakkara V. “A voice-based real-time emotion detection technique using recurrent neural network empowered feature modelling”. Multimedia Tools and Applications, 81(24), 35173-35194, 2022.
  • [15] Oruh J, Viriri S, Adegun A. “Long short-term memory recurrent neural network for automatic speech recognition”. IEEE Access, 10, 30069-30079, 2022.
  • [16] Latif S, Shahid A, Qadir J. “Generative emotional AI for speech emotion recognition: The case for synthetic emotional speech augmentation”. Applied Acoustics, 210, 109425, 2023.
  • [17] Yin T, Dong F, Chen C, Ouyang C, Wang Z, Yang Y. “A Spiking LSTM Accelerator for Automatic Speech Recognition Application Based on FPGA”. Electronics, 13(5), 827, 2024.
  • [18] Yang Z, Li Z, Zhou S, Zhang L, Serikawa S. “Speech emotion recognition based on multi-feature speed rate and LSTM”. Neurocomputing, 601, 128177, 2024.
  • [19] Zhou K, Sisman B, Liu R, Li H. “Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset”. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 920-924, IEEE, June 2021.
  • [20] Hochreiter S, Schmidhuber J. “Long short-term memory”. Neural Computation, 9(8), 1735-1780, 1997.
  • [21] Pascanu R, Mikolov T, Bengio Y. “On the difficulty of training recurrent neural networks”. In International Conference on Machine Learning, PMLR 28(3), 1310-1318, 2013.
  • [22] Gers FA, Schmidhuber J, Cummins F. “Learning to forget: Continual prediction with LSTM”. Neural Computation, 12(10), 2451-2471, 2000.
  • [23] Krishnamoorthy P, Sathiyanarayanan M, Proença HP. “A novel and secured email classification and emotion detection using hybrid deep neural network”. International Journal of Cognitive Computing in Engineering, 5, 44-57, 2024.
  • [24] Hinton G, Deng L, Yu D, Dahl GE, Mohamed AR, Jaitly N, ... Kingsbury B. “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups”. IEEE Signal Processing Magazine, 29(6), 82-97, 2012.
  • [25] Dahl GE, Sainath TN, Hinton GE. “Improving deep neural networks for LVCSR using rectified linear units and dropout”. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8609-8613, 2013.
  • [26] Graves A, Mohamed AR, Hinton G. “Speech recognition with deep recurrent neural networks”. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 6645-6649, 2013.
  • [27] Yu D, Deng L. Automatic Speech Recognition (Vol. 1). Berlin: Springer, 2016.
  • [28] Mai S, Xing S, Hu H. “Analyzing multimodal sentiment via acoustic- and visual-LSTM with channel-aware temporal convolution network”. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 1424-1437, 2021.
  • [29] Çam NB, Dönmez İ, Bitikçioğlu ÖF, Yediparmak FB, Bektaş E, Haklıdır M. “Multimodal Speech Emotion and Text Sentiment Analysis”. In 2023 8th International Conference on Computer Science and Engineering (UBMK), 157-162, September 2023.
  • [30] Öztürk ÖF, Pashaei E. “Konuşmalardaki duygunun evrişimsel LSTM modeli ile tespiti”. Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi, 12(4), 581-589, 2021.
  • [31] Olthof AW, van Ooijen PM, Cornelissen LJ. “Deep learning-based natural language processing in radiology: the impact of report complexity, disease prevalence, dataset size, and algorithm type on model performance”. Journal of Medical Systems, 45(10), 91, 2021.
  • [32] LeCun Y, Bengio Y, Hinton G. “Deep learning”. Nature, 521, 436-444, 2015.
  • [33] Zhou K, Sisman B, Liu R, Li H. “Emotional voice conversion: Theory, databases and ESD”. Speech Communication, 137, 1-18, 2022.
  • [34] Uthiraa S, Vora A, Bonde P, Pusuluri A, Patil HA. “Spectral and Pitch Components of CQT Spectrum for Emotion Recognition”. In 2024 International Conference on Signal Processing and Communications (SPCOM), 1-5, IEEE, July 2024.
  • [35] Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L. “Temporal Segment Networks for Action Recognition in Videos”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41, 2740-2755, 2019.
There are 35 citations in total.

Details

Primary Language Turkish
Subjects Audio Processing
Journal Section Articles
Authors

Buket İşler 0000-0002-9393-9564

Fahhreddin Raşit Kiliç 0009-0001-2099-3279

Early Pub Date March 26, 2025
Publication Date
Submission Date October 31, 2024
Acceptance Date February 2, 2025
Published in Issue Year 2025 Volume: 16 Issue: 1

Cite

IEEE B. İşler and F. R. Kiliç, “Duygu tanımada akustik verilerle derin öğrenme modellerinin karşılaştırılması: LSTM ve DenseNet üzerine bir inceleme”, DUJE, vol. 16, no. 1, pp. 59–67, 2025, doi: 10.24012/dumf.1576811.
All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit and adapt the work, provided the original work and its source are appropriately credited.