Research Article

Artificial Data Usage in Recognizing Emotional Speech

Year 2025, Volume 27, Issue 81, pp. 359-375, 29.09.2025
https://doi.org/10.21205/deufmd.2025278104

Abstract

This study investigates the effect of data augmentation techniques on emotion recognition in Turkish speech, using the BUEMODB and ITUDB datasets. After a preprocessing phase that removed silent segments and normalized the audio signals, a baseline was established by converting the audio into mel spectrograms, extracting six feature sets, and training seven machine learning classifiers, yielding baseline F1 scores of 56.3% on BUEMODB and 65.2% on ITUDB. In subsequent experiments, data augmentation expanded the training data fivefold through audio transformations such as Noise Injection and Pitch Shift, alongside image transformations including Zoom Range and Height Shift Range. Audio-based augmentation improved classification, with the combination of Air Absorption and Time Stretch raising the F1 scores to 57.6% for BUEMODB and 71.3% for ITUDB. Image-based augmentation performed better still, reaching 60.0% for BUEMODB and 73.2% for ITUDB. Finally, a hybrid approach combining the best-performing audio and image transformations was explored, yielding F1 scores of 59.7% for BUEMODB and 75.1% for ITUDB, roughly a 10-point improvement over the baselines. These findings show that carefully selected data augmentation techniques, particularly image-based and hybrid ones, can significantly improve emotion recognition accuracy while avoiding the drawbacks of excessive transformation.
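The fivefold expansion described above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the SNR levels, stretch rates, and function names are assumptions, and the crude resampling-based Time Stretch below also shifts pitch, unlike the phase-vocoder stretch a dedicated audio-augmentation library would apply.

```python
import numpy as np

def add_noise(audio, snr_db=20.0, rng=None):
    # Noise Injection: add white noise at a target signal-to-noise ratio.
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + rng.normal(0.0, np.sqrt(noise_power), audio.shape)

def time_stretch(audio, rate=1.1):
    # Crude Time Stretch by linear resampling (also changes pitch;
    # real implementations use a phase vocoder to preserve it).
    n_out = int(len(audio) / rate)
    idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(idx, np.arange(len(audio)), audio)

def augment_fivefold(audio):
    # One original clip -> five additional training variants,
    # mirroring the fivefold expansion of the training data.
    rng = np.random.default_rng(42)
    variants = [add_noise(audio, snr_db=s, rng=rng) for s in (15, 20, 25)]
    variants += [time_stretch(audio, rate=r) for r in (0.9, 1.1)]
    return variants

clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
aug = augment_fivefold(clip)
print(len(aug))  # → 5
```

Each variant would then be converted to a mel spectrogram before feature extraction, so image-space transformations (zoom, height shift) operate on the spectrogram rather than the waveform.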


Duygusal Konuşma Tanımada Yapay Veri Kullanımı

Yıl 2025, Cilt: 27 Sayı: 81, 359 - 375, 29.09.2025
https://doi.org/10.21205/deufmd.2025278104

Öz

Bu çalışma, Türkçe konuşmalarda duygu tanıma performansını geliştirmek üzere veri artırma tekniklerinin rolünü incelemekte ve BUEMODB ile ITUDB veri kümelerini temel almaktadır. Konuşmaların sessiz bölümlerin kaldırılması ve ses sinyallerinin normalizasyonu ile gerçekleştirilen ön işleme aşamasının ardından, ses verileri mel spektrogramlara dönüştürülmüş, altı öznitelik seti çıkarılmış ve yedi farklı denetimli öğrenme algoritması kullanılarak temel sınıflandırma yapılmıştır. İlk deneyler sonucunda BUEMODB veri seti için %56,3, ITUDB veri seti için %65,2 F1 skoru elde edilmiştir. Sonraki deneylerde, veri artırma teknikleri kullanılarak eğitim verisi beş kat büyütülmüştür. Bu kapsamda Gürültü Ekleme ve Ses Tonu Değiştirme gibi ses dönüşümlerinin yanı sıra Yakınlaştırma ve Yükseklik Kaydırma gibi görüntü dönüşümleri uygulanmıştır. Ses bazlı tekniklerle veri artırıldığında sınıflandırma başarısı iyileşmiş, Hava Emilimi ve Zaman Ölçekleme kombinasyonu ile F1 skorları BUEMODB için %57,6’ya, ITUDB için %71,3’e çıkmıştır. Görüntü bazlı veri artırma teknikleri daha da yüksek performans göstererek BUEMODB için %60,0’lık, ITUDB için %73,2’lik F1 skorları sağlamıştır. Son olarak, en iyi sonuç veren ses ve görüntü dönüşümlerini birleştiren hibrit bir yaklaşım denenmiştir. Bu yöntemle BUEMODB için %59,7, ITUDB için %75,1 F1 skoruna ulaşılmış ve temel performansa göre yaklaşık %10’luk bir artış kaydedilmiştir. Bulgular, özellikle görüntü ve hibrit tabanlı veri artırma tekniklerinin dikkatlice seçilmesi halinde duygu tanıma doğruluğunun önemli ölçüde yükseltilebileceğini göstermiştir.

Kaynakça

  • Malik, M., Malik, M.K., Mehmood, K. 2021. Automatic Speech Recognition: a Survey, Multimedia Tools Applications, Cilt 80, s. 9411–9457. DOI: 10.1007/s11042-020-10073-7
  • Cai, Z., Yang, Y., Li, M. 2023. Cross-lingual Multi-speaker Speech Synthesis with Limited Bilingual Training Data, Computer Speech and Language, Cilt 77, s. 101427. DOI: 10.1016/j.csl.2022.101427
  • Sharma, R., Govind, D., Mishra, J. 2024. Milestones in Speaker Recognition, Artificial Intelligence Review, Cilt 57, 58. DOI: 10.1007/s10462-023-10688-w
  • Maiti, S., Ueda, Y., Watanabe, S., Zhang, C. 2022. EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. IEEE Spoken Language Technology Workshop (SLT), s. 480-487. DOI: 10.1109/SLT54892.2023.10022924
  • Mehrabian, A. 1968. Communication without Words, Psychology Today, Cilt 2, No. 4, s. 53-56.
  • Plaza, M., Kazala, R., Koruba, Z., Kozlowski, M., Lucińska, M., Sitek, K., Spyrka, J. 2022. Emotion Recognition Method for Call/Contact Centre Systems, Applied Sciences, Cilt 12, No. 21, s. 10951. DOI: 10.3390/app12211095
  • Bahreini, K., Nadolski, R., Westera, W. 2016. Towards Real-time Speech Emotion Recognition for Affective e-learning, Education and Information Technologies, Cilt 21, s. 1367–1386. DOI: 10.1007/s10639-015-9388-2
  • Li, H.-C., Pan, T., Lee, M.-H., Chiu, H.-W. 2021. Make Patient Consultation Warmer: A Clinical Application for Speech Emotion Recognition, Applied Sciences, Cilt 11, s. 4782. DOI: 10.3390/app11114782
  • Hu, J., Huang, Y., Hu, X., Xu, Y. 2023. The Acoustically Emotion-Aware Conversational Agent With Speech Emotion Recognition and Empathetic Responses, IEEE Transactions on Affective Computing, Cilt 14, No. 1, s. 17-30. DOI: 10.1109/TAFFC.2022.3205919
  • Frommel, J., Schrader, C., Weber, M. 2018. Towards Emotion-based Adaptive Games: Emotion Recognition Via Input and Performance Features. Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play, s. 173–185. DOI: 10.1145/3242671.3242672
  • Chien, C., Wang, W-C., Moutinho, L., Cheng, Y-M., Pao, T.L., Chen, Y.T., Yeh, J.H. 2007. Applying Recognition Of Emotions In Speech To Extend The Impact Of Brand Slogan Research, Portuguese Journal of Management Studies, Cilt 0(2), s. 115-132.
  • Jing, S., Mao, X., Chen, L. 2019. Automatic Speech Discrete Labels to Dimensional Emotional Values Conversion Method, IET Biometrics, Cilt 8, s. 168-176. DOI: 10.1049/iet-bmt.2018.5016
  • Fahad, S., Ranjan, A., Yadav, J., Deepak, A. 2021. A Survey of Speech Emotion Recognition in Natural Environment, Digital Signal Processing, Cilt 110, s. 102951. DOI: 10.1016/j.dsp.2020.102951
  • Livingstone, S. R., Russo, F. A. 2018. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English, PLoS ONE, Cilt 13(5). DOI: 10.1371/journal.pone.0196391
  • Alhinti, L., Cunningham, S., Christensen, H. 2023. The Dysarthric Expressed Emotional Database (DEED): An Audio-visual Database in British English, PLoS ONE. Cilt 18(8). DOI: 10.1371/journal.pone.0287971
  • Blumentals, E., Salimbajevs, A. 2022. Emotion Recognition in Real-World Support Call Center Data for Latvian Language. ACM Intelligent User Interfaces Workshops, s. 200-203.
  • Grimm, M., Kroschel, K., Narayanan, S. 2008. The Vera am Mittag German Audio-visual Emotional Speech Database. IEEE International Conference on Multimedia and Expo, s. 865-868. DOI: 10.1109/ICME.2008.4607572
  • Lotfian, R., Busso, C. 2019. Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings, IEEE Transactions on Affective Computing, Cilt 10, No. 4, s. 471-483. DOI: 10.1109/TAFFC.2017.2736999
  • McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M. 2012. The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent, IEEE Transactions on Affective Computing, Cilt 3, No. 1, s. 5-17. DOI: 10.1109/T-AFFC.2011.20
  • Batliner, A., Steidl, S., Nöth, E. 2008. Releasing a Thoroughly Annotated and Processed Spontaneous Emotional Database: The FAU Aibo Emotion Corpus. Proceedings of the Satellite Workshop of LREC, s. 26–27.
  • Swain, M., Routray, A., Kabisatpathy, P. 2018. Databases, Features and Classifiers for Speech Emotion Recognition: a Review, Journal of Speech Technologies, Cilt 21(1), s. 93-120. DOI: 10.1007/s10772-018-9491-z
  • Li, X., Akagi, M. 2019. Improving Multilingual Speech Emotion Recognition by Combining Acoustic Features in a Three-layer Model, Speech Communication, Cilt 110, s. 1-12. DOI: 10.1016/j.specom.2019.04.004
  • Cao, H., Verma, R., Nenkova, A. 2015. Speaker-sensitive Emotion Recognition via Ranking: Studies on Acted and Spontaneous Speech, Computer Speech and Language, Cilt 29(1), s. 186-202. DOI: 10.1016/j.csl.2014.01.003
  • Al-Dujaili, M.J., Ebrahimi-Moghadam, A. 2023. Speech Emotion Recognition: A Comprehensive Survey, Wireless Personal Communication, Cilt 129, s. 2525–2561. DOI: 10.1007/s11277-023-10244-3
  • Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J. G. 2001. Emotion Recognition in Human-computer Interaction, IEEE Signal Processing Magazine, Cilt 18(1), s. 32-80. DOI: 10.1109/79.911197
  • Sakurai, M., Kosaka, T. 2021. Emotion Recognition Combining Acoustic and Linguistic Features Based on Speech Recognition Results. IEEE 10th Global Conference on Consumer Electronics, s. 824-827. DOI: 10.1109/GCCE53005.2021.9621810
  • Sato, K., Kishi, K., Kosaka, T. 2023. Speech Emotion Recognition by Late Fusion of Linguistic and Acoustic Features using Deep Learning Models. Asia Pacific Signal and Information Processing Association Annual Summit and Conference, s. 1013-1018. DOI: 10.1109/APSIPAASC58517.2023.10317325
  • Santoso, J., Ishizuka, K., Hashimoto, T. 2024. Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, s. 11026-11030. DOI: 10.1109/ICASSP48485.2024.10448316
  • Triantafyllopoulos, A., Wagner, J., Wierstorf, H., Schmitt, M., Reichel, U.D., Eyben, F., Burkhardt, F., & Schuller, B. 2022. Probing Speech Emotion Recognition Transformers for Linguistic Knowledge. Interspeech, s. 146—150. DOI: 10.21437/Interspeech.2022-10371
  • Pham, N.T., Tran, A.T., Pham, B.N.H., Dang-Ngoc, H., Nguyen, S.D., Dang, D.N.M. 2024. Speech Emotion Recognition: A Brief Review of Multi-modal Multi-task Learning Approaches, Recent Advances in Electrical Engineering and Related Sciences: Theory and Application. DOI: 10.1007/978-981-99-8703-0_50
  • Lim, Y., Ng, K-W., Naveen, P., Haw, S-C. 2022. Emotion Recognition by Facial Expression and Voice: Review and Analysis, Journal of Informatics and Web Engineering, Cilt 1, s. 45-54. DOI: 10.33093/jiwe.2022.1.2.4
  • Nantasri, P. ve ark. 2020. A Light-Weight Artificial Neural Network for Speech Emotion Recognition using Average Values of MFCCs and Their Derivatives. 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, s. 41-44. DOI: 10.1109/ECTI-CON49241.2020.9158221
  • Huang, S., Dang, H., Jiang, R., Hao, Y., Xue, C., Gu, W. 2021. Multi-Layer Hybrid Fuzzy Classification Based on SVM and Improved PSO for Speech Emotion Recognition, Electronics, Cilt 10(23), s. 2891. DOI: 10.3390/electronics10232891
  • Fahad, M.S., Deepak, A., Pradhan, G., Yadav, J. 2021. DNN-HMM-Based Speaker-Adaptive Emotion Recognition Using MFCC and Epoch-Based Features, Circuits Systems and Signal Processing, Cilt 40, s. 466–489. DOI: 10.1007/s00034-020-01486-8
  • Tashev, I.J., Wang, Z-Q., Godin, K. 2017. Speech Emotion Recognition based on Gaussian Mixture Models and Deep Neural Networks. Information Theory and Applications Workshop (ITA), s. 1-4. DOI: 10.1109/ITA.2017.8023477
  • Dogdu, C., Kessler, T., Schneider, D., Shadaydeh, M., Schweinberger, S.R. 2022. A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech, Sensors, Cilt 22(19):7561. DOI: 10.3390/s22197561
  • Sun, L., Li, Q., Fu, S., Li, P. 2022. Speech Emotion Recognition based on Genetic Algorithm–decision Tree Fusion of Deep and Acoustic Features, ETRI Journal, Cilt 44(3), s. 462-475. DOI: 10.4218/etrij.2020-0458
  • Ramesh, S., Gomathi, S., Sasikala, S., Saravanan, T.R. 2023. Automatic Speech Emotion Detection using Hybrid of Gray Wolf Optimizer and Naïve Bayes, International Journal of Speech Technology, Cilt 26, s. 571–578. DOI: 10.1007/s10772-021-09870-8
  • Peng, Z., Lu, Y., Pan, S., Liu, Y. 2021. Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), s. 3020-3024. DOI: 10.1109/ICASSP39728.2021.9414286
  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M.A., Schuller, B., Zafeiriou, S. 2016. Adieu Features? End-to-end Speech Emotion Recognition using a Deep Convolutional Recurrent Network. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, s. 5200-5204. DOI: 10.1109/ICASSP.2016.7472669
  • Huang, Y., Tian, K., Wu, A., Zhang, G. 2019. Feature Fusion Methods Research based on Deep Belief Networks for Speech Emotion Recognition under Noise Condition, Journal of Ambient Intelligence Humanized Computing, Cilt 10, s. 1787–1798. DOI: 10.1007/s12652-017-0644-8
  • Zhao, J., Mao, X., Chen, L. 2019. Speech Emotion Recognition using Deep 1D & 2D CNN LSTM Networks, Biomedical Signal Processing and Control, Cilt 47, s. 312-323. DOI: 10.1016/J.BSPC.2018.08.035
  • Zhang, C., Xue, L. 2021. Autoencoder With Emotion Embedding for Speech Emotion Recognition, IEEE Access, Cilt 9, s. 51231-51241. DOI: 10.1109/ACCESS.2021.3069818
  • Alzubaidi, L., Zhang, J., Humaidi, A.J., ve ark. 2021. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions, Journal of Big Data, Cilt 8(53), s. 1-74. DOI: 10.1186/s40537-021-00444-8
  • Mujaddidurrahman, A., Ernawan, F., Wibowo, A., Sarwoko, E.A., Sugiharto, A., Wahyudi, M.D.R. 2021. Speech Emotion Recognition Using 2D-CNN with Data Augmentation. International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management, s. 685-689. DOI: 10.1109/ICSECS52883.2021.00130
  • Braunschweiler, N., Doddipatla, R., Keizer, S., Stoyanchev, S. 2021. A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation, IEEE Automatic Speech Recognition and Understanding Workshop, s. 24-30. DOI: 10.1109/ASRU51503.2021.9687987
  • Paraskevopoulou, G., Spyrou, E., Perantonis, S.J. 2022. A Data Augmentation Approach for Improving the Performance of Speech Emotion Recognition. 19th International Conference on Signal Processing and Multimedia Applications, s. 61-69. DOI: 10.5220/0011148000003289
  • Jahangir, R., Teh, Y.W., Mujtaba, G., Alroobaea, R., Shaikh, Z.H., Ali, I. 2022. Convolutional Neural Network-based Cross-corpus Speech Emotion Recognition with Data Augmentation and Features Fusion, Machine Vision and Applications, Cilt 33(41), s. 1-16. DOI: 10.1007/s00138-022-01294-x
  • Tao, H., Shan, S., Hu, Z., Zhu, C., Ge, H. 2023. Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation, Entropy, Cilt 25(68), s. 1-16. DOI: 10.3390/e25010068
  • Chatziagapi, A., Paraskevopoulos, G., Sgouropoulos, D., Pantazopoulos, G., Nikandrou, M., Giannakopoulos, T., Katsamanis, A., Potamianos, A., Narayanan, S. 2019. Data Augmentation Using GANs for Speech Emotion Recognition, Interspeech, s. 171-175. DOI: 10.21437/Interspeech.2019-2561
  • Yi, L., Mak, M.W. 2022. Improving Speech Emotion Recognition With Adversarial Data Augmentation Network, IEEE Transactions on Neural Networks and Learning Systems, Cilt 33(1), s. 172-184. DOI: 10.1109/TNNLS.2020.3027600
  • Baek, J-Y., Lee, S-P. 2023. Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation, Electronics, Cilt 12(18), s. 3966. DOI: 10.3390/electronics12183966
  • Padi, S., Sadjadi, S.O., Sriram, R.D., Manocha, D. 2021. Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation, International Conference on Multimodal Interaction, s. 645–652. DOI: 10.1145/3462244.3481003
  • Pham, N.T., Dang, D.N.M., Nguyen, D.Y., Nguyen, T.T., Nguyen, H., Manavalan, B., Lim, C.P., Nguyen, S.D. 2023. Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-recurrent Neural Networks for Speech Emotion Recognition, Expert Systems with Applications, Cilt 230. DOI: 10.1016/j.eswa.2023.120608
  • Jothimani, S., Premalatha, K. 2022. MFF-SAug: Multi Feature Fusion with Spectrogram Augmentation of Speech Emotion Recognition using Convolution Neural Network, Chaos, Solitons & Fractals. DOI: 10.1016/j.chaos.2022.112512
  • Al-onazi, B.B., Nauman, M.A., Jahangir, R., Malik, M.M., Alkhammash, E.H., Elshewey, A.M. 2022. Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion, Applied Sciences, Cilt 12(18), s. 9188. DOI: 10.3390/app12189188
  • Tiwari, U., Soni, M., Chakraborty, R., Panda, A., Kopparapu, S.K. 2020. Multi-Conditioning and Data Augmentation Using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions, IEEE International Conference on Acoustics, Speech and Signal Processing, Spain, s. 7194-7198. DOI: 10.1109/ICASSP40776.2020.9053581
  • Lakomkin, E., Zamani, M.A., Weber, C., Magg, S., Wermter, S. 2018. On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks, IEEE International Conference on Intelligent Robots and Systems, Spain, s. 854-860. DOI: 10.1109/IROS.2018.8593571
  • Pan, S-T., Wu, H-J. 2023. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation, Electronics, Cilt 12(11), s. 2436. DOI: 10.3390/electronics12112436
  • Chaturvedi, I., Noel, T., Satapathy, R. 2022. Speech Emotion Recognition Using Audio Matching, Electronics, Cilt 11(23), s. 3943. DOI: 10.3390/electronics11233943
  • Ibrahim, K.M., Perzo, A., Leglaive, S. 2024. Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion, IEEE International Conference on Acoustics, Speech and Signal Processing, Korea, s. 10636-10640. DOI: 10.1109/ICASSP48485.2024.10445740
  • Shoumy, N.J., Ang, L.M., Rahaman, D.M.M., Zia, T., Seng, K.P., Khatun, S. 2021. Augmented Audio Data in Improving Speech Emotion Classification Tasks, Advances and Trends in Artificial Intelligence: From Theory to Practice, s. 360-365. DOI: 10.1007/978-3-030-79463-7_30
  • Abdelwahab, M., Busso, C. 2018. Study of Dense Network Approaches for Speech Emotion Recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, Canada, s. 5084-5088. DOI: 10.1109/ICASSP.2018.8461866
  • Ando, A., Mori, T., Kobashikawa, S., Toda, T. 2021. Speech Emotion Recognition based on Listener-dependent Emotion Perception Models, APSIPA Transactions on Signal and Information Processing, Cilt 10, e6. DOI: 10.1017/ATSIP.2021.7
  • Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y. 2019. Speech Emotion Recognition Based on Multi-Label Emotion Existence Model, Interspeech, s. 2818-2822. DOI: 10.21437/Interspeech.2019-2524
  • Rajan, R., Raj, T.V.H. 2023. SENet-based Speech Emotion Recognition using Synthesis-style Transfer Data Augmentation, International Journal of Speech Technology, Cilt 26(4), s. 1017–1030. DOI: 10.1007/s10772-023-10071-8
  • Sahoo, K.K., Dutta, I., Ijaz, M.F., Woźniak, M., Singh, P.K. 2021. TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition From Human Speeches, IEEE Access, Cilt 9, s. 166518-166530. DOI: 10.1109/ACCESS.2021.3135658
  • Mishra, P., Sharma, R. 2020. Gender Differentiated Convolutional Neural Networks for Speech Emotion Recognition, 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, Czech Republic, s. 142-148. DOI: 10.1109/ICUMT51630.2020.9222412
  • Ottoni, L.T.C., Ottoni, A.L.C., Cerqueira, J.d.J.F. 2023. A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning, Electronics, Cilt 12(23), s. 4859. DOI: 10.3390/electronics12234859
  • Huynh, C.M., Balas, B. 2014. Emotion Recognition (sometimes) Depends on Horizontal Orientations, Attention, Perception, & Psychophysics, Cilt 76, s. 1381-1392. DOI: 10.3758/s13414-014-0669-4
  • Wei, C., Sun, X., Tian, F., Ren, F. 2019. Speech Emotion Recognition with Hybrid Neural Network, 5th International Conference on Big Data Computing and Communications (BIGCOM), China, s. 298-302. DOI: 10.1109/BIGCOM.2019.00051
  • Etienne, C., Fidanza, G., Petrovskii, A., Devillers, L., Schmauch, B. 2018. CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation, Workshop on Speech, Music and Mind, s. 21-25. DOI: 10.21437/SMM.2018-5
  • Gupta, N., Priya, R.V., Verma, C.K. 2024. ERFN: Leveraging Context for Enhanced Emotion Detection, International Journal of Advanced Computer Science and Applications, Cilt 15(6). DOI: 10.14569/IJACSA.2024.0150663
  • Zhang, X., Wang, X., Yin, S. 2021. Multi-modal Data Transfer Learning-based LSTM Method for Speech Emotion Recognition, International Journal of Electronics and Information Engineering, Cilt 13(2), s. 54-65. DOI: 10.6636/IJEIE.202106_13(2).03
  • Mocanu, B., Tapu, R., Zaharia, T. 2023. Multimodal Emotion Recognition using Cross Modal Audio-Video Fusion with Attention and Deep Metric Learning, Image and Vision Computing, Cilt 133. DOI: 10.1016/j.imavis.2023.104676
  • Valles, D. ve ark. 2023. Data Collection and Real-Time Facial Emotion Recognition in iOS Apps with CNN-Based Models, IEEE World AI IoT Congress, USA, s. 0669-0677. DOI: 10.1109/AIIoT58121.2023.10174520
  • Halvdansson, S. 2024. On a time-frequency blurring operator with applications in data augmentation. DOI: 10.48550/arXiv.2405.12899
  • Jothimani, S., Sangeethaa, S.N., Premalatha, K., Sathishkannan, R. 2023. A New Spatio-Temporal Neural Architecture with Bi-LSTM for Multimodal Emotion Recognition, 8th International Conference on Communication and Electronics Systems, India, s. 257-262. DOI: 10.1109/ICCES57224.2023.10192713
  • Dossou, B.F., Gbenou, Y.K. 2021. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition, IEEE/CVF International Conference on Computer Vision Workshops, 3526-3531. DOI: 10.1109/ICCVW54120.2021.00393
  • Falahzadeh, M.R., Farokhi, F., Harimi, A., Nadooshan, R.S. 2023. Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition, Circuits, Systems, and Signal Processing, Cilt 42, s. 449–492. DOI: 10.1007/s00034-022-02130-3
  • Bakhshi, A., Harimi, A., Chalup, S.K. 2022. CyTex: Transforming Speech to Textured Images for Speech Emotion Recognition, Speech Communication, Cilt 139, s. 62-75. DOI: 10.1016/j.specom.2022.02.007
  • Jordal, I., Tamazian, A., Theofanis, E. ve ark. Audiomentations. https://zenodo.org/doi/10.5281/zenodo.6046288 (Erişim Tarihi: 24.10.2024).
  • Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B. 2013. Orange: Data Mining Toolbox in Python, Journal of Machine Learning Research, 14(Aug): 2349−2353.
  • Meral, H.M., Ekenel, H.K., Ozsoy, A. 2003. Analysis of Emotion in Turkish, XVII National Conference on Turkish Linguistics.
  • Çalışkan, Y.E., İnce, G. 2015. Emotion Recognition using Auditory Cues, 23rd Signal Processing and Communications Applications Conference (SIU), s. 2042-2045. DOI: 10.1109/SIU.2015.7130269
There are 85 references in total.

Details

Primary Language: Turkish
Subjects: Communications Engineering (Other), Performance Evaluation
Section: Research Article
Authors

Umut Avcı (ORCID: 0000-0002-7433-8704)

Early View Date: September 25, 2025
Publication Date: September 29, 2025
Submission Date: September 24, 2024
Acceptance Date: November 16, 2024
Published Issue: Year 2025, Volume 27, Issue 81

Cite

APA Avcı, U. (2025). Duygusal Konuşma Tanımada Yapay Veri Kullanımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, 27(81), 359-375. https://doi.org/10.21205/deufmd.2025278104

Dokuz Eylül Üniversitesi, Mühendislik Fakültesi Dekanlığı Tınaztepe Yerleşkesi, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.