TY - JOUR
T1 - EmotionUnet: Konuşma Duygu Tanıma için U-Net Tabanlı Özgün Derin Öğrenme Modeli
TT - EmotionUnet: A Novel Deep Learning Model Based on U-Net for Speech Emotion Recognition
AU - Görmez, Yasin
PY - 2025
DA - August
Y2 - 2025
DO - 10.5824/ajite.2025.03.003.x
JF - AJIT-e: Academic Journal of Information Technology
JO - AJIT-e
PB - Akademik Bilişim Araştırmaları Derneği
WT - DergiPark
SN - 1309-1581
SP - 232
EP - 250
VL - 16
IS - 3
LA - tr
AB - Konuşma, insanlar arasındaki iletişimin en temel ve etkili yolu olarak değerlendirilmektedir. İnsanlar konuşma yolu ile duygu, düşünce ve bilgilerini paylaşmakta, ilişkilerini güçlendirmekte ve toplumsal bağlarını pekiştirmektedir. Konuşma sırasında karşıdaki kişinin duygu durumunun anlaşılması, empati kurarak daha etkili ve anlamlı bir iletişim sağlamak için önemlidir. Günümüzde telefon gibi araçlarla yapılan uzaktan konuşmalarda ifade edilen duygu tonlarının anlaşılması için konuşma duygu tanıma yöntemlerinden sıklıkla faydalanılmaktadır. Konuşma duygu tanıma müşteri hizmetleri, sağlık, eğitim, eğlence ve akıllı sistemler gibi birçok alanda kullanılmaktadır. Konuşma duygu tanımada sinyal işleme, istatistiksel analiz ve biyometrik teknikler gibi yöntemler kullanılırken, son zamanlarda derin öğrenme yöntemleri de yaygınlaşmıştır. Bu çalışmada konuşma duygu tanıma için evrişimsel sinir ağları kullanılarak U-Net tabanlı özgün derin öğrenme modeli önerilmiştir. Önerilen modelin hiper-parametre optimizasyonları için Bayesian optimizasyon yönteminden faydalanılmıştır. Önerilen model Türkçe, İngilizce, Arapça ve Bangla dillerinden dört farklı veri ile analiz edilmiştir. Önerilen model ile farklı veri setlerinde %56,55 ile %99,71 arasında doğruluk hesaplanmıştır.
KW - Konuşma Duygu Tanıma
KW - Derin Öğrenme
KW - Evrişimsel Sinir Ağları
KW - U-Net
KW - Makine Öğrenmesi
N2 - Speech is considered to be the most basic and effective way of communication between people.
Through speaking, people share their feelings, thoughts, and information, strengthen their relationships, and reinforce their social bonds. Understanding the emotional state of the other person during a conversation is important for empathizing and thereby communicating more effectively and meaningfully. Today, speech emotion recognition methods are frequently used to identify the emotional tones expressed in remote conversations over media such as the telephone. Speech emotion recognition is applied in many fields, such as customer service, healthcare, education, entertainment, and intelligent systems. While methods such as signal processing, statistical analysis, and biometric techniques have traditionally been used for speech emotion recognition, deep learning methods have recently become widespread. In this study, a novel U-Net-based deep learning model using convolutional neural networks is proposed for speech emotion recognition. Bayesian optimization is used for hyper-parameter optimization of the proposed model. The proposed model is evaluated on four datasets in the Turkish, English, Arabic, and Bangla languages. Across these datasets, the proposed model achieves accuracies between 56.55% and 99.71%.
CR - Ahmad, J., Muhammad, K., Kwon, S., Baik, S. W., & Rho, S. (2016). Dempster-Shafer Fusion Based Gender Recognition for Speech Analysis Applications. 2016 International Conference on Platform Technology and Service (PlatCon), 1–4. https://doi.org/10.1109/PlatCon.2016.7456788
CR - Allen, J. B., & Rabiner, L. R. (1977). A unified approach to short-time Fourier analysis and synthesis. Proceedings of the IEEE, 65(11), 1558–1564. https://doi.org/10.1109/PROC.1977.10770
CR - Alsabhan, W. (2023). Human–Computer Interaction with a Real-Time Speech Emotion Recognition with Ensembling Techniques 1D Convolution Neural Network and Attention. Sensors, 23(3), Article 3. https://doi.org/10.3390/s23031386
CR - Altamimi, M., & Alayba, A. M. (2023). ANAD: Arabic news article dataset. Data in Brief, 50, 109460. https://doi.org/10.1016/j.dib.2023.109460
CR - Anvarjon, T., Mustaqeem, & Kwon, S. (2020). Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features. Sensors, 20(18), Article 18. https://doi.org/10.3390/s20185212
CR - Aziz, S., Arif, N. H., Ahbab, S., Ahmed, S., Ahmed, T., & Kabir, Md. H. (2023). Improved Speech Emotion Recognition in Bengali Language using Deep Learning. 2023 26th International Conference on Computer and Information Technology (ICCIT), 1–6. https://doi.org/10.1109/ICCIT60459.2023.10441053
CR - Canpolat, S. F., Ormanoğlu, Z., & Zeyrek, D. (2020). Turkish Emotion Voice Database (TurEV-DB). In D. Beermann, L. Besacier, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL) (pp. 368–375). European Language Resources Association. https://aclanthology.org/2020.sltu-1.52
CR - Das, R. K., Islam, N., Ahmed, Md. R., Islam, S., Shatabda, S., & Islam, A. K. M. M. (2022). BanglaSER: A speech emotion recognition dataset for the Bangla language. Data in Brief, 42, 108091. https://doi.org/10.1016/j.dib.2022.108091
CR - Ghai, M., Lal, S., Duggal, S., & Manik, S. (2017). Emotion recognition on speech signals using machine learning. 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), 34–39. https://doi.org/10.1109/ICBDACI.2017.8070805
CR - Harár, P., Burget, R., & Dutta, M. K. (2017). Speech emotion recognition with deep learning. 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), 137–140. https://doi.org/10.1109/SPIN.2017.8049931
CR - Ismaiel, W., Alhalangy, A., Mohamed, A. O. Y., & Musa, A. I. A. (2024). Deep Learning, Ensemble and Supervised Machine Learning for Arabic Speech Emotion Recognition. Engineering, Technology & Applied Science Research, 14(2), Article 2. https://doi.org/10.48084/etasr.7134
CR - Issa, D., Fatih Demirci, M., & Yazici, A. (2020). Speech emotion recognition with deep convolutional neural networks. Biomedical Signal Processing and Control, 59, 101894. https://doi.org/10.1016/j.bspc.2020.101894
CR - Jha, T., Kavya, R., Christopher, J., & Arunachalam, V. (2022). Machine learning techniques for speech emotion recognition using paralinguistic acoustic features. International Journal of Speech Technology, 25(3), 707–725. https://doi.org/10.1007/s10772-022-09985-6
CR - Keras: Deep Learning for humans. (2024, July 20). https://keras.io/
CR - Kerkeni, L., Serrestou, Y., Mbarki, M., Raoof, K., Ali Mahjoub, M., & Cleder, C. (2020). Automatic Speech Emotion Recognition Using Machine Learning. In A. Cano (Ed.), Social Media and Machine Learning. IntechOpen. https://doi.org/10.5772/intechopen.84856
CR - Khan, M., Gueaieb, W., El Saddik, A., & Kwon, S. (2024). MSER: Multimodal speech emotion recognition using cross-attention with deep fusion. Expert Systems with Applications, 245, 122946. https://doi.org/10.1016/j.eswa.2023.122946
CR - Kotowski, K., Smolarczyk, T., Roterman-Konieczna, I., & Stapor, K. (2021). ProteinUnet—An efficient alternative to SPIDER3-single for sequence-based prediction of protein secondary structures. Journal of Computational Chemistry, 42(1), 50–59. https://doi.org/10.1002/jcc.26432
CR - Krishna, K. V., Sainath, N., & Posonia, A. M. (2022). Speech Emotion Recognition using Machine Learning. 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), 1014–1018. https://doi.org/10.1109/ICCMC53470.2022.9753976
CR - Lin, Y.-L., & Wei, G. (2005). Speech emotion recognition based on HMM and SVM. 2005 International Conference on Machine Learning and Cybernetics, 8, 4898–4901. https://doi.org/10.1109/ICMLC.2005.1527805
CR - Liu, Z.-T., Han, M.-T., Wu, B.-H., & Rehman, A. (2023). Speech emotion recognition based on convolutional neural network with attention-based bidirectional long short-term memory network and multi-task learning. Applied Acoustics, 202, 109178. https://doi.org/10.1016/j.apacoust.2022.109178
CR - Liu, Z.-T., Wu, M., Cao, W.-H., Mao, J.-W., Xu, J.-P., & Tan, G.-Z. (2018). Speech emotion recognition based on feature selection and extreme learning machine decision tree. Neurocomputing, 273, 271–280. https://doi.org/10.1016/j.neucom.2017.07.050
CR - Madanian, S., Chen, T., Adeleye, O., Templeton, J. M., Poellabauer, C., Parry, D., & Schneider, S. L. (2023). Speech emotion recognition using machine learning—A systematic review. Intelligent Systems with Applications, 20, 200266. https://doi.org/10.1016/j.iswa.2023.200266
CR - Mary Little Flower, T., Jaya, T., & Christopher Ezhil Singh, S. (2024). Data augmentation using a 1D-CNN model with MFCC/MFMC features for speech emotion recognition. Automatika, 65(4), 1325–1338. https://doi.org/10.1080/00051144.2024.2371249
CR - Mishra, D., & Rawat, A. (2015). Emotion Recognition through Speech Using Neural Network. International Journal of Advanced Research in Computer Science and Software Engineering (IJARCSSE), 5.
CR - Mishra, S. P., Warule, P., & Deb, S. (2024). Speech emotion recognition using MFCC-based entropy feature. Signal, Image and Video Processing, 18(1), 153–161. https://doi.org/10.1007/s11760-023-02716-7
CR - Mustaqeem, & Kwon, S. (2021). MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach. Expert Systems with Applications, 167, 114177. https://doi.org/10.1016/j.eswa.2020.114177
CR - Mustaqeem, Sajjad, M., & Kwon, S. (2020). Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM. IEEE Access, 8, 79861–79875. https://doi.org/10.1109/ACCESS.2020.2990405
CR - Nediyanchath, A., Paramasivam, P., & Yenigalla, P. (2020). Multi-Head Attention for Speech Emotion Recognition with Auxiliary Learning of Gender Recognition. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 7179–7183. https://doi.org/10.1109/ICASSP40776.2020.9054073
CR - Pichora-Fuller, M. K., & Dupuis, K. (2020). Toronto emotional speech set (TESS) [Dataset]. Borealis. https://doi.org/10.5683/SP2/E8H2MF
CR - Sankara Pandiammal, K., Karishma, S., Harine Sakthe, K., Manimaran, V., Kalaiselvi, S., & Anitha, V. (2024). Emotion Recognition from Speech – an LSTM approach with the Tess Dataset. 2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT), 1–6. https://doi.org/10.1109/ICITIIT61487.2024.10580351
CR - Satt, A., Rozenberg, S., & Hoory, R. (2017). Efficient Emotion Recognition from Speech Using Deep Learning on Spectrograms. Interspeech 2017, 1089–1093. https://doi.org/10.21437/Interspeech.2017-200
CR - scikit-optimize: Sequential model-based optimization toolbox (Version 0.10.2) [Computer software]. (2025). https://scikit-optimize.readthedocs.io/en/latest/contents.html
CR - Singh, P., Sahidullah, M., & Saha, G. (2023). Modulation spectral features for speech emotion recognition using deep neural networks. Speech Communication, 146, 53–69. https://doi.org/10.1016/j.specom.2022.11.005
CR - Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems, 25. https://proceedings.neurips.cc/paper/2012/hash/05311655a15b75fab86956663e1819cd-Abstract.html
CR - Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162–1181. https://doi.org/10.1016/j.specom.2006.04.003
CR - Wagner, J., Kim, J., & Andre, E. (2005). From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification. 2005 IEEE International Conference on Multimedia and Expo, 940–943. https://doi.org/10.1109/ICME.2005.1521579
CR - Zhou, X., Guo, J., & Bie, R. (2016). Deep Learning Based Affective Model for Speech Emotion Recognition. 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 841–846. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0133
CR - Zhu, L., Chen, L., Zhao, D., Zhou, J., & Zhang, W. (2017). Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN. Sensors, 17(7), Article 7. https://doi.org/10.3390/s17071694
UR - https://doi.org/10.5824/ajite.2025.03.003.x
L1 - https://dergipark.org.tr/tr/download/article-file/4575531
ER -