Research Article

Artificial Data Usage in Recognizing Emotional Speech

Year 2025, Volume 27, Issue 81, pp. 359-375, 29.09.2025
https://doi.org/10.21205/deufmd.2025278104

Abstract

This study investigates the effect of data augmentation techniques on emotion recognition in Turkish speech, using the BUEMODB and ITUDB datasets. After a preprocessing phase that removed silent segments and normalized the audio signals, a baseline was established by converting the audio into mel spectrograms, extracting six feature sets, and training seven machine learning classifiers, yielding baseline F1 scores of 56.3% on BUEMODB and 65.2% on ITUDB. In subsequent experiments, data augmentation expanded the training data fivefold through audio transformations such as Noise Injection and Pitch Shift, alongside image transformations including Zoom Range and Height Shift Range. Audio-based augmentation improved classification, with the combination of Air Absorption and Time Stretch raising the F1 scores to 57.6% for BUEMODB and 71.3% for ITUDB. Image-based augmentation performed better still, reaching 60.0% for BUEMODB and 73.2% for ITUDB. Finally, a hybrid approach combining the best-performing audio and image transformations was explored, yielding F1 scores of 59.7% for BUEMODB and 75.1% for ITUDB, roughly a 10-point improvement over the baselines. These findings show that carefully selected data augmentation techniques, particularly image-based and hybrid ones, can significantly improve emotion recognition accuracy while avoiding the drawbacks of excessive transformation.
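The fivefold expansion described above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the SNR levels, stretch rates, and function names are assumptions, and the crude resampling-based Time Stretch below also shifts pitch, unlike the phase-vocoder stretch a dedicated audio-augmentation library would apply.

```python
import numpy as np

def add_noise(audio, snr_db=20.0, rng=None):
    # Noise Injection: add white noise at a target signal-to-noise ratio.
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + rng.normal(0.0, np.sqrt(noise_power), audio.shape)

def time_stretch(audio, rate=1.1):
    # Crude Time Stretch by linear resampling (also changes pitch;
    # real implementations use a phase vocoder to preserve it).
    n_out = int(len(audio) / rate)
    idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(idx, np.arange(len(audio)), audio)

def augment_fivefold(audio):
    # One original clip -> five additional training variants,
    # mirroring the fivefold expansion of the training data.
    rng = np.random.default_rng(42)
    variants = [add_noise(audio, snr_db=s, rng=rng) for s in (15, 20, 25)]
    variants += [time_stretch(audio, rate=r) for r in (0.9, 1.1)]
    return variants

clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at 16 kHz
aug = augment_fivefold(clip)
print(len(aug))  # → 5
```

Each variant would then be converted to a mel spectrogram before feature extraction, so image-space transformations (zoom, height shift) operate on the spectrogram rather than the waveform.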


Duygusal Konuşma Tanımada Yapay Veri Kullanımı

Yıl 2025, Cilt: 27 Sayı: 81, 359 - 375, 29.09.2025
https://doi.org/10.21205/deufmd.2025278104

Öz

Bu çalışma, Türkçe konuşmalarda duygu tanıma performansını geliştirmek üzere veri artırma tekniklerinin rolünü incelemekte ve BUEMODB ile ITUDB veri kümelerini temel almaktadır. Konuşmaların sessiz bölümlerin kaldırılması ve ses sinyallerinin normalizasyonu ile gerçekleştirilen ön işleme aşamasının ardından, ses verileri mel spektrogramlara dönüştürülmüş, altı öznitelik seti çıkarılmış ve yedi farklı denetimli öğrenme algoritması kullanılarak temel sınıflandırma yapılmıştır. İlk deneyler sonucunda BUEMODB veri seti için %56,3, ITUDB veri seti için %65,2 F1 skoru elde edilmiştir. Sonraki deneylerde, veri artırma teknikleri kullanılarak eğitim verisi beş kat büyütülmüştür. Bu kapsamda Gürültü Ekleme ve Ses Tonu Değiştirme gibi ses dönüşümlerinin yanı sıra Yakınlaştırma ve Yükseklik Kaydırma gibi görüntü dönüşümleri uygulanmıştır. Ses bazlı tekniklerle veri artırıldığında sınıflandırma başarısı iyileşmiş, Hava Emilimi ve Zaman Ölçekleme kombinasyonu ile F1 skorları BUEMODB için %57,6’ya, ITUDB için %71,3’e çıkmıştır. Görüntü bazlı veri artırma teknikleri daha da yüksek performans göstererek BUEMODB için %60,0’lık, ITUDB için %73,2’lik F1 skorları sağlamıştır. Son olarak, en iyi sonuç veren ses ve görüntü dönüşümlerini birleştiren hibrit bir yaklaşım denenmiştir. Bu yöntemle BUEMODB için %59,7, ITUDB için %75,1 F1 skoruna ulaşılmış ve temel performansa göre yaklaşık %10’luk bir artış kaydedilmiştir. Bulgular, özellikle görüntü ve hibrit tabanlı veri artırma tekniklerinin dikkatlice seçilmesi halinde duygu tanıma doğruluğunun önemli ölçüde yükseltilebileceğini göstermiştir.

Kaynakça

  • Malik, M., Malik, M.K., Mehmood, K. 2021. Automatic Speech Recognition: a Survey, Multimedia Tools Applications, Cilt 80, s. 9411–9457. DOI: 10.1007/s11042-020-10073-7
  • Cai, Z., Yang, Y., Li, M. 2023. Cross-lingual Multi-speaker Speech Synthesis with Limited Bilingual Training Data, Computer Speech and Language, Cilt 77, s. 101427. DOI: 10.1016/j.csl.2022.101427
  • Sharma, R., Govind, D., Mishra, J. 2024. Milestones in Speaker Recognition, Artificial Intelligence Review, Cilt 57, 58. DOI: 10.1007/s10462-023-10688-w
  • Maiti, S., Ueda, Y., Watanabe, S., Zhang, C. 2022. EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers. IEEE Spoken Language Technology Workshop (SLT), s. 480-487. DOI: 10.1109/SLT54892.2023.10022924
  • Mehrabian, A. 1968. Communication without Words, Psychology Today, Cilt 2, No. 4, s. 53-56.
  • Plaza, M., Kazala, R., Koruba, Z., Kozlowski, M., Lucińska, M., Sitek, K., Spyrka, J. 2022. Emotion Recognition Method for Call/Contact Centre Systems, Applied Sciences, Cilt 12, No. 21, s. 10951. DOI: 10.3390/app12211095
  • Bahreini, K., Nadolski, R., Westera, W. 2016. Towards Real-time Speech Emotion Recognition for Affective e-learning, Education and Information Technologies, Cilt 21, s. 1367–1386. DOI: 10.1007/s10639-015-9388-2
  • Li, H.-C., Pan, T., Lee, M.-H., Chiu, H.-W. 2021. Make Patient Consultation Warmer: A Clinical Application for Speech Emotion Recognition, Applied Sciences, Cilt 11, s. 4782. DOI: 10.3390/app11114782
  • Hu, J., Huang, Y., Hu, X., Xu, Y. 2023. The Acoustically Emotion-Aware Conversational Agent With Speech Emotion Recognition and Empathetic Responses, IEEE Transactions on Affective Computing, Cilt 14, No. 1, s. 17-30. DOI: 10.1109/TAFFC.2022.3205919
  • Frommel, J., Schrader, C., Weber, M. 2018. Towards Emotion-based Adaptive Games: Emotion Recognition Via Input and Performance Features. Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play, s. 173–185. DOI: 10.1145/3242671.3242672
  • Chien, C., Wang, W-C., Moutinho, L., Cheng, Y-M., Pao, T.L., Chen, Y.T., Yeh, J.H. 2007. Applying Recognition Of Emotions In Speech To Extend The Impact Of Brand Slogan Research, Portuguese Journal of Management Studies, Cilt 0(2), s. 115-132.
  • Jing, S., Mao, X., Chen, L. 2019. Automatic Speech Discrete Labels to Dimensional Emotional Values Conversion Method, IET Biometrics, Cilt 8, s. 168-176. DOI: 10.1049/iet-bmt.2018.5016
  • Fahad, S., Ranjan, A., Yadav, J., Deepak, A. 2021. A Survey of Speech Emotion Recognition in Natural Environment, Digital Signal Processing, Cilt 110, s. 102951. DOI: 10.1016/j.dsp.2020.102951
  • Livingstone, S. R., Russo, F. A. 2018. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A Dynamic, Multimodal Set of Facial and Vocal Expressions in North American English, PLoS ONE, Cilt 13(5). DOI: 10.1371/journal.pone.0196391
  • Alhinti, L., Cunningham, S., Christensen, H. 2023. The Dysarthric Expressed Emotional Database (DEED): An Audio-visual Database in British English, PLoS ONE. Cilt 18(8). DOI: 10.1371/journal.pone.0287971
  • Blumentals, E., Salimbajevs, A. 2022. Emotion Recognition in Real-World Support Call Center Data for Latvian Language. ACM Intelligent User Interfaces Workshops, s. 200-203.
  • Grimm, M., Kroschel, K., Narayanan, S. 2008. The Vera am Mittag German Audio-visual Emotional Speech Database. IEEE International Conference on Multimedia and Expo, s. 865-868. DOI: 10.1109/ICME.2008.4607572
  • Lotfian, R., Busso, C. 2019. Building Naturalistic Emotionally Balanced Speech Corpus by Retrieving Emotional Speech from Existing Podcast Recordings, IEEE Transactions on Affective Computing, Cilt 10, No. 4, s. 471-483. DOI: 10.1109/TAFFC.2017.2736999
  • McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M. 2012. The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent, IEEE Transactions on Affective Computing, Cilt 3, No. 1, s. 5-17. DOI: 10.1109/T-AFFC.2011.20
  • Batliner, A., Steidl, S., Nöth, E. 2008. Releasing a Thoroughly Annotated and Processed Spontaneous Emotional Database: The FAU Aibo Emotion Corpus. Proceedings of the Satellite Workshop of LREC, s. 26–27.
  • Swain, M., Routray, A., Kabisatpathy, P. 2018. Databases, Features and Classifiers for Speech Emotion Recognition: a Review, Journal of Speech Technologies, Cilt 21(1), s. 93-120. DOI: 10.1007/s10772-018-9491-z
  • Li, X., Akagi, M. 2019. Improving Multilingual Speech Emotion Recognition by Combining Acoustic Features in a Three-layer Model, Speech Communication, Cilt 110, s. 1-12. DOI: 10.1016/j.specom.2019.04.004
  • Cao, H., Verma, R., Nenkova, A. 2015. Speaker-sensitive Emotion Recognition via Ranking: Studies on Acted and Spontaneous Speech, Computer Speech and Language, Cilt 29(1), s. 186-202. DOI: 10.1016/j.csl.2014.01.003
  • Al-Dujaili, M.J., Ebrahimi-Moghadam, A. 2023. Speech Emotion Recognition: A Comprehensive Survey, Wireless Personal Communication, Cilt 129, s. 2525–2561. DOI: 10.1007/s11277-023-10244-3
  • Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J. G. 2001. Emotion Recognition in Human-computer Interaction, IEEE Signal Processing Magazine, Cilt 18(1), s. 32-80. DOI: 10.1109/79.911197
  • Sakurai, M., Kosaka, T. 2021. Emotion Recognition Combining Acoustic and Linguistic Features Based on Speech Recognition Results. IEEE 10th Global Conference on Consumer Electronics, s. 824-827. DOI: 10.1109/GCCE53005.2021.9621810
  • Sato, K., Kishi, K., Kosaka, T. 2023. Speech Emotion Recognition by Late Fusion of Linguistic and Acoustic Features using Deep Learning Models. Asia Pacific Signal and Information Processing Association Annual Summit and Conference, s. 1013-1018. DOI: 10.1109/APSIPAASC58517.2023.10317325
  • Santoso, J., Ishizuka, K., Hashimoto, T. 2024. Large Language Model-Based Emotional Speech Annotation Using Context and Acoustic Feature for Speech Emotion Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing, s. 11026-11030. DOI: 10.1109/ICASSP48485.2024.10448316
  • Triantafyllopoulos, A., Wagner, J., Wierstorf, H., Schmitt, M., Reichel, U.D., Eyben, F., Burkhardt, F., & Schuller, B. 2022. Probing Speech Emotion Recognition Transformers for Linguistic Knowledge. Interspeech, s. 146—150. DOI: 10.21437/Interspeech.2022-10371
  • Pham, N.T., Tran, A.T., Pham, B.N.H., Dang-Ngoc, H., Nguyen, S.D., Dang, D.N.M. 2024. Speech Emotion Recognition: A Brief Review of Multi-modal Multi-task Learning Approaches, Recent Advances in Electrical Engineering and Related Sciences: Theory and Application. DOI: 10.1007/978-981-99-8703-0_50
  • Lim, Y., Ng, K-W., Naveen, P., Haw, S-C. 2022. Emotion Recognition by Facial Expression and Voice: Review and Analysis, Journal of Informatics and Web Engineering, Cilt 1, s. 45-54. DOI: 10.33093/jiwe.2022.1.2.4
  • Nantasri, P. ve ark. 2020. A Light-Weight Artificial Neural Network for Speech Emotion Recognition using Average Values of MFCCs and Their Derivatives. 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, s. 41-44. DOI: 10.1109/ECTI-CON49241.2020.9158221
  • Huang, S., Dang, H., Jiang, R., Hao, Y., Xue, C., Gu, W. 2021. Multi-Layer Hybrid Fuzzy Classification Based on SVM and Improved PSO for Speech Emotion Recognition, Electronics, Cilt 10(23), s. 2891. DOI: 10.3390/electronics10232891
  • Fahad, M.S., Deepak, A., Pradhan, G., Yadav, J. 2021. DNN-HMM-Based Speaker-Adaptive Emotion Recognition Using MFCC and Epoch-Based Features, Circuits Systems and Signal Processing, Cilt 40, s. 466–489. DOI: 10.1007/s00034-020-01486-8
  • Tashev, I.J., Wang, Z-Q., Godin, K. 2017. Speech Emotion Recognition based on Gaussian Mixture Models and Deep Neural Networks. Information Theory and Applications Workshop (ITA), s. 1-4. DOI: 10.1109/ITA.2017.8023477
  • Dogdu, C., Kessler, T., Schneider, D., Shadaydeh, M., Schweinberger, S.R. 2022. A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech, Sensors, Cilt 22(19):7561. DOI: 10.3390/s22197561
  • Sun, L., Li, Q., Fu, S., Li, P. 2022. Speech Emotion Recognition based on Genetic Algorithm–decision Tree Fusion of Deep and Acoustic Features, ETRI Journal, Cilt 44(3), s. 462-475. DOI: 10.4218/etrij.2020-0458
  • Ramesh, S., Gomathi, S., Sasikala, S., Saravanan, T.R. 2023. Automatic Speech Emotion Detection using Hybrid of Gray Wolf Optimizer and Naïve Bayes, International Journal of Speech Technology, Cilt 26, s. 571–578. DOI: 10.1007/s10772-021-09870-8
  • Peng, Z., Lu, Y., Pan, S., Liu, Y. 2021. Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention. 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), s. 3020-3024. DOI: 10.1109/ICASSP39728.2021.9414286
  • Trigeorgis, G., Ringeval, F., Brueckner, R., Marchi, E., Nicolaou, M.A., Schuller, B., Zafeiriou, S. 2016. Adieu Features? End-to-end Speech Emotion Recognition using a Deep Convolutional Recurrent Network. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, s. 5200-5204. DOI: 10.1109/ICASSP.2016.7472669
  • Huang, Y., Tian, K., Wu, A., Zhang, G. 2019. Feature Fusion Methods Research based on Deep Belief Networks for Speech Emotion Recognition under Noise Condition, Journal of Ambient Intelligence Humanized Computing, Cilt 10, s. 1787–1798. DOI: 10.1007/s12652-017-0644-8
  • Zhao, J., Mao, X., Chen, L. 2019. Speech Emotion Recognition using Deep 1D & 2D CNN LSTM Networks, Biomedical Signal Processing and Control, Cilt 47, s. 312-323. DOI: 10.1016/J.BSPC.2018.08.035
  • Zhang, C., Xue, L. 2021. Autoencoder With Emotion Embedding for Speech Emotion Recognition, IEEE Access, Cilt 9, s. 51231-51241. DOI: 10.1109/ACCESS.2021.3069818
  • Alzubaidi, L., Zhang, J., Humaidi, A.J., ve ark. 2021. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions, Journal of Big Data, Cilt 8(53), s. 1-74. DOI: 10.1186/s40537-021-00444-8
  • Mujaddidurrahman, A., Ernawan, F., Wibowo, A., Sarwoko, E.A., Sugiharto, A., Wahyudi, M.D.R. 2021. Speech Emotion Recognition Using 2D-CNN with Data Augmentation. International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management, s. 685-689. DOI: 10.1109/ICSECS52883.2021.00130
  • Braunschweiler, N., Doddipatla, R., Keizer, S., Stoyanchev, S. 2021. A Study on Cross-Corpus Speech Emotion Recognition and Data Augmentation, IEEE Automatic Speech Recognition and Understanding Workshop, s. 24-30. DOI: 10.1109/ASRU51503.2021.9687987
  • Paraskevopoulou, G., Spyrou, E., Perantonis, S.J. 2022. A Data Augmentation Approach for Improving the Performance of Speech Emotion Recognition. 19th International Conference on Signal Processing and Multimedia Applications, s. 61-69. DOI: 10.5220/0011148000003289
  • Jahangir, R., Teh, Y.W., Mujtaba, G., Alroobaea, R., Shaikh, Z.H., Ali, I. 2022. Convolutional Neural Network-based Cross-corpus Speech Emotion Recognition with Data Augmentation and Features Fusion, Machine Vision and Applications, Cilt 33(41), s. 1-16. DOI: 10.1007/s00138-022-01294-x
  • Tao, H., Shan, S., Hu, Z., Zhu, C., Ge, H. 2023. Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation, Entropy, Cilt 25(68), s. 1-16. DOI: 10.3390/e25010068
  • Chatziagapi, A., Paraskevopoulos, G., Sgouropoulos, D., Pantazopoulos, G., Nikandrou, M., Giannakopoulos, T., Katsamanis, A., Potamianos, A., Narayanan, S. 2019. Data Augmentation Using GANs for Speech Emotion Recognition, Interspeech, s. 171-175. DOI: 10.21437/Interspeech.2019-2561
  • Yi, L., Mak, M.W. 2022. Improving Speech Emotion Recognition With Adversarial Data Augmentation Network, IEEE Transactions on Neural Networks and Learning Systems, Cilt 33(1), s. 172-184. DOI: 10.1109/TNNLS.2020.3027600
  • Baek, J-Y., Lee, S-P. 2023. Enhanced Speech Emotion Recognition Using DCGAN-Based Data Augmentation, Electronics, Cilt 12(18), s. 3966. DOI: 10.3390/electronics12183966
  • Padi, S., Sadjadi, S.O., Sriram, R.D., Manocha, D. 2021. Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation, International Conference on Multimodal Interaction, s. 645–652. DOI: 10.1145/3462244.3481003
  • Pham, N.T., Dang, D.N.M., Nguyen, D.Y., Nguyen, T.T., Nguyen, H., Manavalan, B., Lim, C.P., Nguyen, S.D. 2023. Hybrid Data Augmentation and Deep Attention-based Dilated Convolutional-recurrent Neural Networks for Speech Emotion Recognition, Expert Systems with Applications, Cilt 230. DOI: 10.1016/j.eswa.2023.120608
  • Jothimani, S., Premalatha, K. 2022. MFF-SAug: Multi Feature Fusion with Spectrogram Augmentation of Speech Emotion Recognition using Convolution Neural Network, Chaos, Solitons & Fractals. DOI: 10.1016/j.chaos.2022.112512
  • Al-onazi, B.B., Nauman, M.A., Jahangir, R., Malik, M.M., Alkhammash, E.H., Elshewey, A.M. 2022. Transformer-Based Multilingual Speech Emotion Recognition Using Data Augmentation and Feature Fusion, Applied Sciences, Cilt 12(18), s. 9188. DOI: 10.3390/app12189188
  • Tiwari, U., Soni, M., Chakraborty, R., Panda, A., Kopparapu, S.K. 2020. Multi-Conditioning and Data Augmentation Using Generative Noise Model for Speech Emotion Recognition in Noisy Conditions, IEEE International Conference on Acoustics, Speech and Signal Processing, Spain, s. 7194-7198. DOI: 10.1109/ICASSP40776.2020.9053581
  • Lakomkin, E., Zamani, M.A., Weber, C., Magg, S., Wermter, S. 2018. On the Robustness of Speech Emotion Recognition for Human-Robot Interaction with Deep Neural Networks, IEEE International Conference on Intelligent Robots and Systems, Spain, s. 854-860. DOI: 10.1109/IROS.2018.8593571
  • Pan, S-T., Wu, H-J. 2023. Performance Improvement of Speech Emotion Recognition Systems by Combining 1D CNN and LSTM with Data Augmentation, Electronics, Cilt 12(11), s. 2436. DOI: 10.3390/electronics12112436
  • Chaturvedi, I., Noel, T., Satapathy, R. 2022. Speech Emotion Recognition Using Audio Matching, Electronics, Cilt 11(23), s. 3943. DOI: 10.3390/electronics11233943
  • Ibrahim, K.M., Perzo, A., Leglaive, S. 2024. Towards Improving Speech Emotion Recognition Using Synthetic Data Augmentation from Emotion Conversion, IEEE International Conference on Acoustics, Speech and Signal Processing, Korea, s. 10636-10640. DOI: 10.1109/ICASSP48485.2024.10445740
  • Shoumy, N.J., Ang, L.M., Rahaman, D.M.M., Zia, T., Seng, K.P., Khatun, S. 2021. Augmented Audio Data in Improving Speech Emotion Classification Tasks, Advances and Trends in Artificial Intelligence: From Theory to Practice, s. 360-365. DOI: 10.1007/978-3-030-79463-7_30
  • Abdelwahab, M., Busso, C. 2018. Study of Dense Network Approaches for Speech Emotion Recognition, IEEE International Conference on Acoustics, Speech and Signal Processing, Canada, s. 5084-5088. DOI: 10.1109/ICASSP.2018.8461866
  • Ando, A., Mori, T., Kobashikawa, S., Toda, T. 2021. Speech Emotion Recognition based on Listener-dependent Emotion Perception Models, APSIPA Transactions on Signal and Information Processing, Cilt 10, e6. DOI: 10.1017/ATSIP.2021.7
  • Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y. 2019. Speech Emotion Recognition Based on Multi-Label Emotion Existence Model, Interspeech, s. 2818-2822. DOI: 10.21437/Interspeech.2019-2524
  • Rajan, R., Raj, T.V.H. 2023. SENet-based Speech Emotion Recognition using Synthesis-style Transfer Data Augmentation, International Journal of Speech Technology, Cilt 26(4), s. 1017–1030. DOI: 10.1007/s10772-023-10071-8
  • Sahoo, K.K., Dutta, I., Ijaz, M.F., Woźniak, M., Singh, P.K. 2021. TLEFuzzyNet: Fuzzy Rank-Based Ensemble of Transfer Learning Models for Emotion Recognition From Human Speeches, IEEE Access, Cilt 9, s. 166518-166530. DOI: 10.1109/ACCESS.2021.3135658
  • Mishra, P., Sharma, R. 2020. Gender Differentiated Convolutional Neural Networks for Speech Emotion Recognition, 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, Czech Republic, s. 142-148. DOI: 10.1109/ICUMT51630.2020.9222412
  • Ottoni, L.T.C., Ottoni, A.L.C., Cerqueira, J.d.J.F. 2023. A Deep Learning Approach for Speech Emotion Recognition Optimization Using Meta-Learning, Electronics, Cilt 12(23), s. 4859. DOI: 10.3390/electronics12234859
  • Huynh, C.M., Balas, B. 2014. Emotion Recognition (sometimes) Depends on Horizontal Orientations, Attention, Perception, & Psychophysics, Cilt 76, s. 1381-1392. DOI: 10.3758/s13414-014-0669-4
  • Wei, C., Sun, X., Tian, F., Ren, F. 2019. Speech Emotion Recognition with Hybrid Neural Network, 5th International Conference on Big Data Computing and Communications (BIGCOM), China, s. 298-302. DOI: 10.1109/BIGCOM.2019.00051
  • Etienne, C., Fidanza, G., Petrovskii, A., Devillers, L., Schmauch, B. 2018. CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation, Workshop on Speech, Music and Mind, s. 21-25. DOI: 10.21437/SMM.2018-5
  • Gupta, N., Priya, R.V., Verma, C.K. 2024. ERFN: Leveraging Context for Enhanced Emotion Detection, International Journal of Advanced Computer Science and Applications, Cilt 15(6). DOI: 10.14569/IJACSA.2024.0150663
  • Zhang, X., Wang, X., Yin, S. 2021. Multi-modal Data Transfer Learning-based LSTM Method for Speech Emotion Recognition, International Journal of Electronics and Information Engineering, Cilt 13(2), s. 54-65. DOI: 10.6636/IJEIE.202106_13(2).03
  • Mocanu, B., Tapu, R., Zaharia, T. 2023. Multimodal Emotion Recognition using Cross Modal Audio-Video Fusion with Attention and Deep Metric Learning, Image and Vision Computing, Cilt 133. DOI: 10.1016/j.imavis.2023.104676
  • Valles, D. ve ark. 2023. Data Collection and Real-Time Facial Emotion Recognition in iOS Apps with CNN-Based Models, IEEE World AI IoT Congress, USA, s. 0669-0677. DOI: 10.1109/AIIoT58121.2023.10174520
  • Halvdansson, S. 2024. On a time-frequency blurring operator with applications in data augmentation. DOI: 10.48550/arXiv.2405.12899
  • Jothimani, S., Sangeethaa, S.N., Premalatha, K., Sathishkannan, R. 2023. A New Spatio-Temporal Neural Architecture with Bi-LSTM for Multimodal Emotion Recognition, 8th International Conference on Communication and Electronics Systems, India, s. 257-262. DOI: 10.1109/ICCES57224.2023.10192713
  • Dossou, B.F., Gbenou, Y.K. 2021. FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition, IEEE/CVF International Conference on Computer Vision Workshops, 3526-3531. DOI: 10.1109/ICCVW54120.2021.00393
  • Falahzadeh, M.R., Farokhi, F., Harimi, A., Nadooshan, R.S. 2023. Deep Convolutional Neural Network and Gray Wolf Optimization Algorithm for Speech Emotion Recognition, Circuits, Systems, and Signal Processing, Cilt 42, s. 449–492. DOI: 10.1007/s00034-022-02130-3
  • Bakhshi, A., Harimi, A., Chalup, S.K. 2022. CyTex: Transforming Speech to Textured Images for Speech Emotion Recognition, Speech Communication, Cilt 139, s. 62-75. DOI: 10.1016/j.specom.2022.02.007
  • Jordal, I., Tamazian, A., Theofanis, E. ve ark. Audiomentations. https://zenodo.org/doi/10.5281/zenodo.6046288 (Erişim Tarihi: 24.10.2024).
  • Demsar, J., Curk, T., Erjavec, A., Gorup, C., Hocevar, T., Milutinovic, M., Mozina, M., Polajnar, M., Toplak, M., Staric, A., Stajdohar, M., Umek, L., Zagar, L., Zbontar, J., Zitnik, M., Zupan, B. 2013. Orange: Data Mining Toolbox in Python, Journal of Machine Learning Research, 14(Aug): 2349−2353.
  • Meral, H.M., Ekenel, H.K., Ozsoy, A. 2003. Analysis of Emotion in Turkish, XVII National Conference on Turkish Linguistics.
  • Çalışkan, Y.E., İnce, G. 2015. Emotion Recognition using Auditory Cues, 23rd Signal Processing and Communications Applications Conference (SIU), s. 2042-2045. DOI: 10.1109/SIU.2015.7130269
There are 85 references in total.

Details

Primary Language: Turkish
Subjects: Communications Engineering (Other), Performance Evaluation
Section: Research Article
Authors

Umut Avcı (ORCID: 0000-0002-7433-8704)

Early View Date: September 25, 2025
Publication Date: September 29, 2025
Submission Date: September 24, 2024
Acceptance Date: November 16, 2024
Published Issue: Year 2025, Volume 27, Issue 81

Cite

APA Avcı, U. (2025). Duygusal Konuşma Tanımada Yapay Veri Kullanımı. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi, 27(81), 359-375. https://doi.org/10.21205/deufmd.2025278104

Dokuz Eylül Üniversitesi, Mühendislik Fakültesi Dekanlığı Tınaztepe Yerleşkesi, Adatepe Mah. Doğuş Cad. No: 207-I / 35390 Buca-İZMİR.