<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.4 20241031//EN"
        "https://jats.nlm.nih.gov/publishing/1.4/JATS-journalpublishing1-4.dtd">
<article article-type="research-article" dtd-version="1.4">
            <front>

                <journal-meta>
                                    <journal-id></journal-id>
            <journal-title-group>
                                                                                    <journal-title>Balkan Journal of Electrical and Computer Engineering</journal-title>
            </journal-title-group>
                            <issn pub-type="ppub">2147-284X</issn>
                                        <issn pub-type="epub">2147-284X</issn>
                                                                                            <publisher>
                    <publisher-name>MUSA YILMAZ</publisher-name>
                </publisher>
                    </journal-meta>
                <article-meta>
                                        <article-id pub-id-type="doi">10.17694/bajece.1372107</article-id>
                                                                <article-categories>
                                            <subj-group  xml:lang="en">
                                                            <subject>Computer Software</subject>
                                                            <subject>Software Testing, Verification and Validation</subject>
                                                    </subj-group>
                                            <subj-group  xml:lang="tr">
                                                            <subject>Bilgisayar Yazılımı</subject>
                                                            <subject>Yazılım Testi, Doğrulama ve Validasyon</subject>
                                                    </subj-group>
                                    </article-categories>
                                                                                                                                                        <title-group>
                                                                                                                                                            <article-title>Multimodal Emotion Recognition Using Bi-LG-GCN for MELD Dataset</article-title>
                                                                                                    </title-group>
            
                                                    <contrib-group content-type="authors">
                                                                        <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">
                                        https://orcid.org/0009-0005-2559-8816</contrib-id>
                                                                <name>
                                    <surname>Alsaadawi</surname>
                                    <given-names>Hussein Farooq Tayeb</given-names>
                                </name>
                                                                    <aff>Firat University</aff>
                                                            </contrib>
                                                    <contrib contrib-type="author">
                                                                    <contrib-id contrib-id-type="orcid">
                                        https://orcid.org/0000-0002-6113-4649</contrib-id>
                                                                <name>
                                    <surname>Daş</surname>
                                    <given-names>Resul</given-names>
                                </name>
                                                                    <aff>Firat University, Technology Faculty, Department of Software Engineering</aff>
                                                            </contrib>
                                                                                </contrib-group>
                        
                                        <pub-date pub-type="pub" iso-8601-date="2024-03-01">
                    <day>01</day>
                    <month>03</month>
                    <year>2024</year>
                </pub-date>
                                        <volume>12</volume>
                                        <issue>1</issue>
                                        <fpage>36</fpage>
                                        <lpage>46</lpage>
                        
                        <history>
                                    <date date-type="received" iso-8601-date="20231006">
                        <day>10</day>
                        <month>06</month>
                        <year>2023</year>
                    </date>
                                                    <date date-type="accepted" iso-8601-date="20231016">
                        <day>10</day>
                        <month>16</month>
                        <year>2023</year>
                    </date>
                            </history>
                                        <permissions>
                    <copyright-statement>Copyright © 2013, Balkan Journal of Electrical and Computer Engineering</copyright-statement>
                    <copyright-year>2013</copyright-year>
                    <copyright-holder>Balkan Journal of Electrical and Computer Engineering</copyright-holder>
                </permissions>
            
                                                                                                                        <abstract><p>Emotion recognition from multimodal data is a widely adopted approach because of its potential to improve the quality of human interactions and to support a broad range of applications. We present a novel method for multimodal emotion recognition on the Multimodal Emotion Lines Dataset (MELD) that combines a bi-lateral gradient graph convolutional network (Bi-LG-GCN) with dedicated pre-processing and feature extraction. The dataset provides fine-grained emotion labels for the textual, audio, and visual modalities. This work aims to identify affective states concealed in the textual and audio data for emotion recognition and sentiment analysis. Pre-processing improves the quality and consistency of the data and thereby the dataset’s usefulness; it includes noise removal, normalization, and linguistic processing to handle linguistic variation and background noise in the discourse. Kernel Principal Component Analysis (K-PCA) is employed for feature extraction, deriving informative attributes from each modality, and labels are encoded for the resulting feature arrays. We propose a Bi-LG-GCN-based architecture explicitly tailored to multimodal emotion recognition that effectively fuses the modalities: the feature-extracted, pre-processed representation of each modality is fed to a generator network, which produces realistic synthetic data samples capturing multimodal relationships, and these synthetic samples serve as inputs to a discriminator network trained to distinguish genuine from synthetic data. With this approach, the model learns discriminative features for emotion recognition and makes accurate predictions of subsequent emotional states. Evaluated on the MELD dataset, our method achieves an accuracy of 80%, an F1-score of 81%, a precision of 81%, and a recall of 81%. The pre-processing and feature extraction steps improve the quality and discriminability of the input representations, and the Bi-LG-GCN-based approach with multimodal data synthesis outperforms contemporary techniques, demonstrating its practical utility.</p></abstract>
                                                            
            
                                                                                        <kwd-group>
                                                    <kwd>Bimodal emotion recognition</kwd>
                                                    <kwd>text and speech recognition</kwd>
                                                    <kwd>Multimodal Emotion Lines Dataset (MELD)</kwd>
                                                    <kwd>bi-lateral gradient graph convolutional network (Bi-LG-GCN)</kwd>
                                                    <kwd>Affective computing identification</kwd>
                                            </kwd-group>
                            
                                                                                                                                                    </article-meta>
    </front>
    <back>
                            <ref-list>
                                    <ref id="ref1">
                        <label>1</label>
                        <mixed-citation publication-type="journal">[1] P. Savci and B. Das, “Comparison of pre-trained language models in terms of carbon emissions, time and accuracy in multi-label text classification using AutoML,” Heliyon, vol. 9, no. 5, p. e15670, 2023-05-01. [Online]. Available: https://www.sciencedirect.com/science/ article/pii/S2405844023028773</mixed-citation>
                    </ref>
                                    <ref id="ref2">
                        <label>2</label>
                        <mixed-citation publication-type="journal">[2] M. Aydogan, “A hybrid deep neural network-based automated diagnosis system using x-ray images and clinical findings,” International Journalof Imaging Systems and Technology, vol. 33, no. 4, pp. 1368–1382, 2023, eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/ima.22856. [On-line]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/ima.</mixed-citation>
                    </ref>
                                    <ref id="ref3">
                        <label>3</label>
                        <mixed-citation publication-type="journal">[3] D. Dupr´e, E. G. Krumhuber, D. K¨uster, and G. J. McKeown, “A
performance comparison of eight commercially available automatic
classifiers for facial affect recognition,” PLOS ONE, vol. 15,
no. 4, p. e0231968, 2020, publisher: Public Library of Science.
[Online]. Available: https://journals.plos.org/plosone/article?id=10.1371/
journal.pone.0231968</mixed-citation>
                    </ref>
                                    <ref id="ref4">
                        <label>4</label>
                        <mixed-citation publication-type="journal">[4] E. Cameron and M. Green, Making Sense of Change Management:
A Complete Guide to the Models, Tools and Techniques of
Organizational Change. Kogan Page Publishers, 2019. [Online].
Available: https://www.example.com/your-book-url</mixed-citation>
                    </ref>
                                    <ref id="ref5">
                        <label>5</label>
                        <mixed-citation publication-type="journal">[5] W. Zehra, A. R. Javed, Z. Jalil, H. U. Khan, and T. R. Gadekallu,
“Cross corpus multi-lingual speech emotion recognition using ensemble
learning,” Complex &amp; Intelligent Systems, vol. 7, no. 4, pp. 1845–1854,
2021. [Online]. Available: https://doi.org/10.1007/s40747-020-00250-4</mixed-citation>
                    </ref>
                                    <ref id="ref6">
                        <label>6</label>
                        <mixed-citation publication-type="journal">[6] A survey of emotion recognition methods with emphasis on e-learning
environments | journal of network and computer applications. [Online].
Available: https://dl.acm.org/doi/10.1016/j.jnca.2019.102423</mixed-citation>
                    </ref>
                                    <ref id="ref7">
                        <label>7</label>
                        <mixed-citation publication-type="journal">[7] S. K. Yadav, K. Tiwari, H. M. Pandey, and S. A. Akbar, “A review of
multimodal human activity recognition with special emphasis on classification,
applications, challenges and future directions,” Knowledge-
Based Systems, vol. 223, p. 106970, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0950705121002331</mixed-citation>
                    </ref>
                                    <ref id="ref8">
                        <label>8</label>
                        <mixed-citation publication-type="journal">[8] R. Das and M. Soylu, “A key review on graph data science: The power of
graphs in scientific studies,” Chemometrics and Intelligent Laboratory
Systems, vol. 240, p. 104896, 2023-09-15. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0169743923001466</mixed-citation>
                    </ref>
                                    <ref id="ref9">
                        <label>9</label>
                        <mixed-citation publication-type="journal">[9] P. Savci and B. Das, “Prediction of the customers’ interests using
sentiment analysis in e-commerce data for comparison of arabic,
english, and turkish languages,” Journal of King Saud University -
Computer and Information Sciences, vol. 35, no. 3, pp. 227–237,
2023-03-01. [Online]. Available: https://www.sciencedirect.com/science/
article/pii/S131915782300054X</mixed-citation>
                    </ref>
                                    <ref id="ref10">
                        <label>10</label>
                        <mixed-citation publication-type="journal">[10] I. Pulatov, R. Oteniyazov, F. Makhmudov, and Y.-I. Cho, “Enhancing
speech emotion recognition using dual feature extraction encoders,”
Sensors, vol. 23, no. 14, p. 6640, 2023-01, number: 14 Publisher:
Multidisciplinary Digital Publishing Institute. [Online]. Available:
https://www.mdpi.com/1424-8220/23/14/6640</mixed-citation>
                    </ref>
                                    <ref id="ref11">
                        <label>11</label>
                        <mixed-citation publication-type="journal">[11] M. Egger, M. Ley, and S. Hanke, “Emotion recognition from
physiological signal analysis: A review,” Electronic Notes in Theoretical
Computer Science, vol. 343, pp. 35–55, 2019. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S157106611930009X</mixed-citation>
                    </ref>
                                    <ref id="ref12">
                        <label>12</label>
                        <mixed-citation publication-type="journal">[12] E. S. Salama, R. A. El-Khoribi, M. E. Shoman, and M. A. W. Shalaby,
“A 3d-convolutional neural network framework with ensemble learning
techniques for multi-modal emotion recognition,” Egyptian Informatics
Journal, vol. 22, no. 2, pp. 167–176, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S1110866520301389</mixed-citation>
                    </ref>
                                    <ref id="ref13">
                        <label>13</label>
                        <mixed-citation publication-type="journal">[13] C.-H. Wu and W.-B. Liang, “Emotion recognition of affective speech
based on multiple classifiers using acoustic-prosodic information and
semantic labels,” T. Affective Computing, vol. 2, pp. 10–21, 2011.
[Online]. Available: https://ieeexplore.ieee.org/document/5674019</mixed-citation>
                    </ref>
                                    <ref id="ref14">
                        <label>14</label>
                        <mixed-citation publication-type="journal">[14] M. Soylu, A. Soylu, and R. Das, “A new approach to recognizing
the use of attitude markers by authors of academic journal
articles,” Expert Systems with Applications, vol. 230, p. 120538,
2023-11. [Online]. Available: https://linkinghub.elsevier.com/retrieve/
pii/S0957417423010400</mixed-citation>
                    </ref>
                                    <ref id="ref15">
                        <label>15</label>
                        <mixed-citation publication-type="journal">[15] Speech emotion recognition with acoustic and lexical features. [Online].
Available: https://ieeexplore.ieee.org/abstract/document/7178872/</mixed-citation>
                    </ref>
                                    <ref id="ref16">
                        <label>16</label>
                        <mixed-citation publication-type="journal">[16] K. D. N. and A. Patil, “Multimodal emotion recognition using crossmodal
attention and 1d convolutional neural networks,” in Interspeech
2020. ISCA, 2020, pp. 4243–4247. [Online]. Available: https:
//www.isca-speech.org/archive/interspeech 2020/n20 interspeech.html</mixed-citation>
                    </ref>
                                    <ref id="ref17">
                        <label>17</label>
                        <mixed-citation publication-type="journal">[17] Y. Cimtay, E. Ekmekcioglu, and S. Caglar-Ozhan, “Cross-subject
multimodal emotion recognition based on hybrid fusion,” IEEE Access,
vol. 8, pp. 168 865–168 878, 2020, conference Name: IEEE Access.
[Online]. Available: https://ieeexplore.ieee.org/document/9195813</mixed-citation>
                    </ref>
                                    <ref id="ref18">
                        <label>18</label>
                        <mixed-citation publication-type="journal">[18] T. Dalgleish and M. Power, Handbook of Cognition and
Emotion. John Wiley &amp; Sons, 2000-11-21, google-Books-ID:
vsLvrhohXhAC. [Online]. Available: https://www.google.com.tr/books/
edition/Handbook of Cognition and Emotion/vsLvrhohXhAC?hl=en&amp;
gbpv=1&amp;dq=isbn:9780470842218&amp;printsec=frontcover&amp;pli=1</mixed-citation>
                    </ref>
                                    <ref id="ref19">
                        <label>19</label>
                        <mixed-citation publication-type="journal">[19] C. Guanghui and Z. Xiaoping, “Multi-modal emotion recognition by
fusing correlation features of speech-visual,” IEEE Signal Processing
Letters, vol. 28, pp. 533–537, 2021, conference Name: IEEE
Signal Processing Letters. [Online]. Available: https://ieeexplore.ieee.
org/document/9340264</mixed-citation>
                    </ref>
                                    <ref id="ref20">
                        <label>20</label>
                        <mixed-citation publication-type="journal">[20] S. K. Bharti, S. Varadhaganapathy, R. K. Gupta, P. K. Shukla, M. Bouye,
S. K. Hingaa, and A. Mahmoud, “Text-based emotion recognition usingdeep learning approach,” Computational Intelligence and Neuroscience,
vol. 2022, p. e2645381, 2022, publisher: Hindawi. [Online]. Available:
https://www.hindawi.com/journals/cin/2022/2645381/</mixed-citation>
                    </ref>
                                    <ref id="ref21">
                        <label>21</label>
                        <mixed-citation publication-type="journal">[21] Z. Lian, J. Tao, B. Liu, J. Huang, Z. Yang, and R. Li, “Context-dependent
domain adversarial neural network for multimodal emotion recognition.”
in Interspeech, 2020, pp. 394–398. [Online]. Available: https://www.
iscaspeech.org/archive/interspeech 2020/lian20b interspeech.html</mixed-citation>
                    </ref>
                                    <ref id="ref22">
                        <label>22</label>
                        <mixed-citation publication-type="journal">[22] D. Priyasad, T. Fernando, S. Denman, C. Fookes, and S. Sridharan,
“Attention driven fusion for multi-modal emotion recognition.” [Online].
Available: http://arxiv.org/abs/2009.10991</mixed-citation>
                    </ref>
                                    <ref id="ref23">
                        <label>23</label>
                        <mixed-citation publication-type="journal">[23] T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha,
“M3er: Multiplicative multimodal emotion recognition using facial,
textual, and speech cues,” Proceedings of the AAAI Conference
on Artificial Intelligence, vol. 34, pp. 1359–1367, 2020. [Online].
Available: https://doi.org/10.48550/arXiv.1911.05659</mixed-citation>
                    </ref>
                                    <ref id="ref24">
                        <label>24</label>
                        <mixed-citation publication-type="journal">[24] W. Liu, J.-L. Qiu, W.-L. Zheng, and B.-L. Lu, “Multimodal emotion
recognition using deep canonical correlation analysis.” [Online].
Available: http://arxiv.org/abs/1908.05349</mixed-citation>
                    </ref>
                                    <ref id="ref25">
                        <label>25</label>
                        <mixed-citation publication-type="journal">[25] T. Mittal, P. Guhan, U. Bhattacharya, R. Chandra, A. Bera,
and D. Manocha, “Emoticon: Context-aware multimodal emotion
recognition using frege’s principle,” in Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), June
2020. [Online]. Available: https://ieeexplore.ieee.org/document/9156904</mixed-citation>
                    </ref>
                                    <ref id="ref26">
                        <label>26</label>
                        <mixed-citation publication-type="journal">[26] M. R. Makiuchi, K. Uto, and K. Shinoda, “Multimodal emotion
recognition with high-level speech and text features.” [Online].
Available: http://arxiv.org/abs/2111.10202</mixed-citation>
                    </ref>
                                    <ref id="ref27">
                        <label>27</label>
                        <mixed-citation publication-type="journal">[27] Y.-T. Lan, W. Liu, and B.-L. Lu, “Multimodal emotion recognition
using deep generalized canonical correlation analysis with an attention
mechanism,” in 2020 International Joint Conference on Neural
Networks (IJCNN). IEEE, 2020-07, pp. 1–6. [Online]. Available:
https://ieeexplore.ieee.org/document/9207625/</mixed-citation>
                    </ref>
                                    <ref id="ref28">
                        <label>28</label>
                        <mixed-citation publication-type="journal">[28] H. Zhang, “Expression-EEG based collaborative multimodal emotion
recognition using deep AutoEncoder,” IEEE Access, vol. 8, pp.
164 130–164 143, 2020, conference Name: IEEE Access. [Online].
Available: https://ieeexplore.ieee.org/document/9187342</mixed-citation>
                    </ref>
                                    <ref id="ref29">
                        <label>29</label>
                        <mixed-citation publication-type="journal">[29] S. R. Zaman, D. Sadekeen, M. A. Alfaz, and R. Shahriyar, “One source
to detect them all: Gender, age, and emotion detection from voice,”
in 2021 IEEE 45th Annual Computers, Software, and Applications
Conference (COMPSAC), 2021, pp. 338–343, ISSN: 0730-3157.
[Online]. Available: https://ieeexplore.ieee.org/document/9529731</mixed-citation>
                    </ref>
                                    <ref id="ref30">
                        <label>30</label>
                        <mixed-citation publication-type="journal">[30] X. Wu, W.-L. Zheng, and B.-L. Lu, “Investigating EEG-based functional
connectivity patterns for multimodal emotion recognition.” [Online].
Available: http://arxiv.org/abs/2004.01973</mixed-citation>
                    </ref>
                                    <ref id="ref31">
                        <label>31</label>
                        <mixed-citation publication-type="journal">[31] M. S. Akhtar, D. Chauhan, D. Ghosal, S. Poria, A. Ekbal,
and P. Bhattacharyya, “Multi-task learning for multi-modal emotion
recognition and sentiment analysis,” in Proceedings of the 2019
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1
(Long and Short Papers). Association for Computational Linguistics,
2019, pp. 370–379. [Online]. Available: https://aclanthology.org/
N19-1034</mixed-citation>
                    </ref>
                                    <ref id="ref32">
                        <label>32</label>
                        <mixed-citation publication-type="journal">[32] S. Nemati, R. Rohani, M. E. Basiri, M. Abdar, N. Y. Yen, and
V. Makarenkov, “A hybrid latent space data fusion method for
multimodal emotion recognition,” IEEE Access, vol. 7, pp. 172 948–
172 964, 2019, conference Name: IEEE Access. [Online]. Available:
https://ieeexplore.ieee.org/document/8911364</mixed-citation>
                    </ref>
                                    <ref id="ref33">
                        <label>33</label>
                        <mixed-citation publication-type="journal">[33] Z. Fang, A. He, Q. Yu, B. Gao, W. Ding, T. Zhang, and L. Ma, “FAF:
A novel multimodal emotion recognition approach integrating face,
body and text.” [Online]. Available: http://arxiv.org/abs/2211.15425</mixed-citation>
                    </ref>
                                    <ref id="ref34">
                        <label>34</label>
                        <mixed-citation publication-type="journal">[34] L. Sun, Z. Lian, J. Tao, B. Liu, and M. Niu, “Multi-modal
continuous dimensional emotion recognition using recurrent neural
network and self-attention mechanism,” in Proceedings of the
1st International on Multimodal Sentiment Analysis in Real-life
Media Challenge and Workshop, ser. MuSe’20. Association for
Computing Machinery, 2020-10-15, pp. 27–34. [Online]. Available:
https://doi.org/10.1145/3423327.3423672</mixed-citation>
                    </ref>
                                    <ref id="ref35">
                        <label>35</label>
                        <mixed-citation publication-type="journal">[35] L. Cai, Y. Hu, J. Dong, and S. Zhou, “Audio-textual emotion
recognition based on improved neural networks,” Mathematical
Problems in Engineering, vol. 2019, pp. 1–9, 2019. [Online]. Available:
https://www.hindawi.com/journals/mpe/2019/2593036/</mixed-citation>
                    </ref>
                                    <ref id="ref36">
                        <label>36</label>
                        <mixed-citation publication-type="journal">[36] M. Aydo˘gan and A. Karci, “Improving the accuracy using pretrained
word embeddings on deep neural networks for turkish text
classification,” Physica A: Statistical Mechanics and its Applications,
vol. 541, p. 123288, 2020-03. [Online]. Available: https://linkinghub.
elsevier.com/retrieve/pii/S0378437119318436</mixed-citation>
                    </ref>
                                    <ref id="ref37">
                        <label>37</label>
                        <mixed-citation publication-type="journal">[37] Q.-T. Truong and H. Lauw, “VistaNet: Visual aspect attention
network for multimodal sentiment analysis,” Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 33, pp. 305–312, 2019-07-17.
[Online]. Available: https://doi.org/10.1609/aaai.v33i01.3301305</mixed-citation>
                    </ref>
                                    <ref id="ref38">
                        <label>38</label>
                        <mixed-citation publication-type="journal">[38] N. Ahmed, Z. A. Aghbari, and S. Girija, “A systematic survey on
multimodal emotion recognition using learning algorithms,” Intelligent
Systems with Applications, vol. 17, p. 200171, 2023. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S2667305322001089</mixed-citation>
                    </ref>
                                    <ref id="ref39">
                        <label>39</label>
                        <mixed-citation publication-type="journal">[39] A. Gandhi, K. Adhvaryu, S. Poria, E. Cambria, and A. Hussain,
“Multimodal sentiment analysis: A systematic review of history,
datasets, multimodal fusion methods, applications, challenges and
future directions,” Information Fusion, vol. 91, pp. 424–444, 2023-03-
01. [Online]. Available: https://www.sciencedirect.com/science/article/
pii/S1566253522001634</mixed-citation>
                    </ref>
                                    <ref id="ref40">
                        <label>40</label>
                        <mixed-citation publication-type="journal">[40] A. Solgi, A. Pourhaghi, R. Bahmani, and H. Zarei, “Improving SVR
and ANFIS performance using wavelet transform and PCA algorithm
for modeling and predicting biochemical oxygen demand (BOD),”
Ecohydrology &amp; Hydrobiology, vol. 17, no. 2, pp. 164–175, 2017-04-
01. [Online]. Available: https://www.sciencedirect.com/science/article/
pii/S1642359316300672</mixed-citation>
                    </ref>
                                    <ref id="ref41">
                        <label>41</label>
                        <mixed-citation publication-type="journal">[41] J. Li, X. Wang, G. Lv, and Z. Zeng, “GraphMFT: A graph
network based multimodal fusion technique for emotion recognition in
conversation.” [Online]. Available: http://arxiv.org/abs/2208.00339</mixed-citation>
                    </ref>
                            </ref-list>
                    </back>
    </article>
