Deep Learning Based Emotional State Analysis and Classification Using Albanian Speech Data
Year 2024, Volume: 7, Issue: 2, 30-40, 26.12.2024
Bahadir Karasulu, Elif Avcı, Tesnim Strazimiri, Betül Cengiz
Abstract
Nowadays, interactive voice call response systems can be built using deep learning-based software that analyzes the speaker's emotional state from speech or audio data. In our study, an Albanian dataset was created. Spectral and emotional analysis of the audio data was performed using various deep learning models. The dataset contains Albanian speech data with four emotion classes (angry, happy, sad, surprised). A convolutional neural network (CNN) model was used for classification. The developed classification system was also tested with other datasets to verify its accuracy. According to the experimental results, Albanian emotion classification performance, measured as the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), was 0.76 for the angry class, 1.00 for the happy class, 1.00 for the sad class, and 0.93 for the surprised class. Scientific findings and discussions are also presented.
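As a rough illustration of the spectral analysis step mentioned in the abstract, the sketch below turns a waveform into a magnitude spectrogram, the kind of time-frequency representation a CNN classifier can consume. The frame length, hop size, and synthetic 440 Hz tone are illustrative assumptions, not the study's actual preprocessing, which is not detailed on this page.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=128):
    """Magnitude spectrogram via a short-time FFT over Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-redundant half of the spectrum: frame_len//2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440 * t)  # 1 s of a 440 Hz tone as stand-in speech
    spec = spectrogram(tone)
    print(spec.shape)
```

In practice a log-mel spectrogram (e.g., via an audio library such as Librosa) would typically replace this raw magnitude spectrogram before feeding a CNN.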
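The per-class AUC-ROC figures reported in the abstract correspond to a one-vs-rest ROC analysis over the four emotion classes. A minimal sketch of how such per-class scores can be computed, using scikit-learn with synthetic labels and scores (not the study's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

CLASSES = ["angry", "happy", "sad", "surprised"]  # the study's four emotion classes

def per_class_auc(y_true, y_score):
    """One-vs-rest AUC per class from integer labels and an (N, 4) score matrix."""
    onehot = (y_true[:, None] == np.arange(len(CLASSES))).astype(int)
    return {name: roc_auc_score(onehot[:, k], y_score[:, k])
            for k, name in enumerate(CLASSES)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 4, size=200)
    # Synthetic "model" scores: a noisy one-hot of the true label.
    y_score = np.eye(4)[y_true] + 0.5 * rng.random((200, 4))
    for name, auc in per_class_auc(y_true, y_score).items():
        print(f"{name}: AUC = {auc:.2f}")
```

An AUC of 1.00, as reported for the happy and sad classes, means the model's scores perfectly separate that class from the rest on the evaluated set.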
Ethics Statement
The participants who contributed to the creation of our dataset took part in the experiments on a voluntary basis. The data in this study have not been published anywhere before, nor have they been submitted elsewhere for publication.
Acknowledgments
We thank the valued participants who voluntarily contributed to the creation of our dataset.