TY - JOUR
T1 - EmotionUnet: Konuşma Duygu Tanıma için U-Net Tabanlı Özgün Derin Öğrenme Modeli
TT - EmotionUnet: A Novel Deep Learning Model Based on U-Net for Speech Emotion Recognition
AU - Görmez, Yasin
PY - 2025
DA - August
Y2 - 2025
DO - 10.5824/ajite.2025.03.003.x
JF - AJIT-e: Academic Journal of Information Technology
JO - AJIT-e
PB - Akademik Bilişim Araştırmaları Derneği
WT - DergiPark
SN - 1309-1581
SP - 232
EP - 250
VL - 16
IS - 3
LA - tr
AB - Konuşma, insanlar arasındaki iletişimin en temel ve etkili yolu olarak değerlendirilmektedir. İnsanlar konuşma yolu ile duygu, düşünce ve bilgilerini paylaşmakta, ilişkilerini güçlendirmekte ve toplumsal bağlarını pekiştirmektedir. Konuşma sırasında karşıdaki kişinin duygu durumunun anlaşılması, empati kurarak daha etkili ve anlamlı bir iletişim sağlamak için önemlidir. Günümüzde telefon gibi araçlarla yapılan uzaktan konuşmalarda ifade edilen duygu tonlarının anlaşılması için konuşma duygu tanıma yöntemlerinden sıklıkla faydalanılmaktadır. Konuşma duygu tanıma müşteri hizmetleri, sağlık, eğitim, eğlence ve akıllı sistemler gibi birçok alanda kullanılmaktadır. Konuşma duygu tanımada sinyal işleme, istatistiksel analiz ve biyometrik teknikler gibi yöntemler kullanılırken, son zamanlarda derin öğrenme yöntemleri de yaygınlaşmıştır. Bu çalışmada konuşma duygu tanıma için evrişimsel sinir ağları kullanılarak U-Net tabanlı özgün derin öğrenme modeli önerilmiştir. Önerilen modelin hiper-parametre optimizasyonları için Bayesian optimizasyon yönteminden faydalanılmıştır. Önerilen model Türkçe, İngilizce, Arapça ve Bangla dillerinden dört farklı veri ile analiz edilmiştir. Önerilen model ile farklı veri setlerinde %56,55 ile %99,71 arasında doğruluk hesaplanmıştır.
KW - Konuşma Duygu Tanıma
KW - Derin Öğrenme
KW - Evrişimsel Sinir Ağları
KW - U-Net
KW - Makine Öğrenmesi
N2 - Speech is considered to be the most basic and effective way of communication between people.
Through speaking, people share their feelings, thoughts, and information, strengthen their relationships, and reinforce their social bonds. Understanding the emotional state of the other person during a conversation is important for empathizing and thereby communicating more effectively and meaningfully. Today, speech emotion recognition methods are frequently used to identify the emotional tones expressed in remote conversations over media such as the telephone. Speech emotion recognition is applied in many fields, such as customer service, healthcare, education, entertainment, and intelligent systems. While methods such as signal processing, statistical analysis, and biometric techniques have traditionally been used for speech emotion recognition, deep learning methods have recently become widespread. In this study, a novel U-Net-based deep learning model using convolutional neural networks is proposed for speech emotion recognition. Bayesian optimization is used for hyper-parameter optimization of the proposed model. The proposed model is evaluated on four datasets in the Turkish, English, Arabic, and Bangla languages. Across these datasets, the proposed model achieves accuracies between 56.55% and 99.71%.
CR - Ahmad, J., Muhammad, K., Kwon, S., Baik, S. W., & Rho, S. (2016). Dempster-Shafer Fusion Based Gender Recognition for Speech Analysis Applications. 2016 International Conference on Platform Technology and Service (PlatCon), 1–4. https://doi.org/10.1109/PlatCon.2016.7456788
CR - Allen, J. B., & Rabiner, L. R. (1977). A unified approach to short-time Fourier analysis and synthesis. Proceedings of the IEEE, 65(11), 1558–1564. https://doi.org/10.1109/PROC.1977.10770
CR - Alsabhan, W. (2023). Human–Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention. Sensors, 23(3), Article 3. https://doi.org/10.3390/s23031386
CR - Altamimi, M., & Alayba, A. M. (2023). ANAD: Arabic news article dataset. Data in Brief, 50, 109460. https://doi.org/10.1016/j.dib.2023.109460
CR - Anvarjon, T., Mustaqeem, & Kwon, S. (2020). Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features. Sensors, 20(18), Article 18. https://doi.org/10.3390/s20185212
CR - Aziz, S., Arif, N. H., Ahbab, S., Ahmed, S., Ahmed, T., & Kabir, Md. H. (2023). Improved Speech Emotion Recognition in Bengali Language using Deep Learning. 2023 26th International Conference on Computer and Information Technology (ICCIT), 1–6. https://doi.org/10.1109/ICCIT60459.2023.10441053
CR - Canpolat, S. F., Ormanoğlu, Z., & Zeyrek, D. (2020). Turkish Emotion Voice Database (TurEV-DB). In D. Beermann, L. Besacier, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) (pp. 368–375). European Language Resources Association. https://aclanthology.org/2020.sltu-1.52
CR - Das, R. K., Islam, N., Ahmed, Md. R., Islam, S., Shatabda, S., & Islam, A. K. M. M. (2022). BanglaSER: A speech emotion recognition dataset for the Bangla language. Data in Brief, 42, 108091. https://doi.org/10.1016/j.dib.2022.108091
CR - Ghai, M., Lal, S., Duggal, S., & Manik, S. (2017). Emotion recognition on speech signals using machine learning. 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), 34–39. https://doi.org/10.1109/ICBDACI.2017.8070805
CR - Harár, P., Burget, R., & Dutta, M. K. (2017). Speech emotion recognition with deep learning. 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), 137–140. https://doi.org/10.1109/SPIN.2017.8049931
CR - Ismaiel, W., Alhalangy, A., Mohamed, A. O. Y., & Musa, A. I. A. (2024). Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition. Engineering, Technology & Applied Science Research, 14(2), Article 2. https://doi.org/10.48084/etasr.7134
CR - Issa, D., Fatih Demirci, M., & Yazici, A. (2020). Speech emotion recognition with deep convolutional neural networks. Biomedical Signal Processing and Control, 59, 101894. https://doi.org/10.1016/j.bspc.2020.101894
CR - Jha, T., Kavya, R., Christopher, J., & Arunachalam, V. (2022). Machine learning techniques for speech emotion recognition using paralinguistic acoustic features. International Journal of Speech Technology, 25(3), 707–725. https://doi.org/10.1007/s10772-022-09985-6
CR - Keras: Deep Learning for humans. (2024, July 20). https://keras.io/
CR - Kerkeni, L., Serrestou, Y., Mbarki, M., Raoof, K., Ali Mahjoub, M., & Cleder, C. (2020). Automatic Speech Emotion Recognition Using Machine Learning. In A. Cano (Ed.), Social Media and Machine Learning. IntechOpen. https://doi.org/10.5772/intechopen.84856
CR - Khan, M., Gueaieb, W., El Saddik, A., & Kwon, S. (2024). MSER: Multimodal speech emotion recognition using cross-attention with deep fusion. Expert Systems with Applications, 245, 122946. https://doi.org/10.1016/j.eswa.2023.122946
CR - Kotowski, K., Smolarczyk, T., Roterman-Konieczna, I., & Stapor, K. (2021). ProteinUnet—An efficient alternative to SPIDER3-single for sequence-based prediction of protein secondary structures. Journal of Computational Chemistry, 42(1), 50–59. https://doi.org/10.1002/jcc.26432
CR - Krishna, K. V., Sainath, N., & Posonia, A. M. (2022). Speech Emotion Recognition using Machine Learning. 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), 1014–1018. https://doi.org/10.1109/ICCMC53470.2022.9753976
CR - Lin, Y.-L., & Wei, G. (2005). Speech emotion recognition based on HMM and SVM. 2005 International Conference on Machine Learning and Cybernetics, 8, 4898–4901. https://doi.org/10.1109/ICMLC.2005.1527805
CR - Liu, Z.-T., Han, M.-T., Wu, B.-H., & Rehman, A. (2023). Speech emotion recognition based on convolutional neural network with attention-based bidirectional long short-term memory network and multi-task learning. Applied Acoustics, 202, 109178. https://doi.org/10.1016/j.apacoust.2022.109178
CR - Liu, Z.-T., Wu, M., Cao, W.-H., Mao, J.-W., Xu, J.-P., & Tan, G.-Z. (2018). Speech emotion recognition based on feature selection and extreme learning machine decision tree. Neurocomputing, 273, 271–280. https://doi.org/10.1016/j.neucom.2017.07.050
CR - Madanian, S., Chen, T., Adeleye, O., Templeton, J. M., Poellabauer, C., Parry, D., & Schneider, S. L. (2023). Speech emotion recognition using machine learning—A systematic review. Intelligent Systems with Applications, 20, 200266. https://doi.org/10.1016/j.iswa.2023.200266
CR - Mary Little Flower, T., Jaya, T., & Christopher Ezhil Singh, S. (2024). Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition. Automatika, 65(4), 1325–1338. https://doi.org/10.1080/00051144.2024.2371249
CR - Mishra, D., & Rawat, A. (2015). Emotion Recognition through Speech Using Neural Network. International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), 5.
CR - Mishra, S. P., Warule, P., & Deb, S. (2024). Speech emotion recognition using MFCC-based entropy feature. Signal, Image and Video Processing, 18(1), 153–161. https://doi.org/10.1007/s11760-023-02716-7
CR - Mustaqeem, & Kwon, S. (2021). MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Systems with Applications, 167, 114177. https://doi.org/10.1016/j.eswa.2020.114177
CR - Mustaqeem, Sajjad, M., & Kwon, S. (2020). Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM. IEEE Access, 8, 79861–79875. https://doi.org/10.1109/ACCESS.2020.2990405
CR - Nediyanchath, A., Paramasivam, P., & Yenigalla, P. (2020). Multi-Head Attention for Speech Emotion Recognition with Auxiliary Learning of Gender Recognition. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7179–7183. https://doi.org/10.1109/ICASSP40776.2020.9054073
CR - Pichora-Fuller, M. K., & Dupuis, K. (2020). Toronto emotional speech set (TESS) [Dataset]. Borealis. https://doi.org/10.5683/SP2/E8H2MF
CR - Sankara Pandiammal, K., Karishma, S., Harine Sakthe, K., Manimaran, V., Kalaiselvi, S., & Anitha, V. (2024). Emotion Recognition from Speech – an LSTM approach with the Tess Dataset. 2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT), 1–6. https://doi.org/10.1109/ICITIIT61487.2024.10580351
CR - Satt, A., Rozenberg, S., & Hoory, R. (2017). Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. Interspeech 2017, 1089–1093. https://doi.org/10.21437/Interspeech.2017-200
CR - scikit-optimize: Sequential model-based optimization toolbox (Version 0.10.2) [Computer software]. (2025). https://scikit-optimize.readthedocs.io/en/latest/contents.html
CR - Singh, P., Sahidullah, M., & Saha, G. (2023). Modulation spectral features for speech emotion recognition using deep neural networks. Speech Communication, 146, 53–69. https://doi.org/10.1016/j.specom.2022.11.005
CR - Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 25. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html
CR - Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181. https://doi.org/10.1016/j.specom.2006.04.003
CR - Wagner, J., Kim, J., & Andre, E. (2005). From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification. 2005 IEEE International Conference on Multimedia and Expo, 940–943. https://doi.org/10.1109/ICME.2005.1521579
CR - Zhou, X., Guo, J., & Bie, R. (2016). Deep Learning Based Affective Model for Speech Emotion Recognition. 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 841–846. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0133
CR - Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN. Sensors, 17(7), Article 7. https://doi.org/10.3390/s17071694
UR - https://doi.org/10.5824/ajite.2025.03.003.x
L1 - https://dergipark.org.tr/tr/download/article-file/4575531
ER -