Research Article

TÜRKÇE KONUŞMADA DUYGU TANIMA İÇİN MAKİNE ÖĞRENME YÖNTEMLERİ VE DERİN ÖĞRENME TABANLI MODELLERİN KARŞILAŞTIRILMASI

Year 2024, Volume: 12 Issue: 2, 285 - 297, 30.06.2024
https://doi.org/10.21923/jesd.1350375

Abstract

Son zamanlarda veri miktarına bağlı olarak sağlık, eğitim, pazarlama gibi birçok alanda analizlere ihtiyaç duyulmaktadır. Duygu analizi ise bu alanlarda kişilerin yorumlarını analiz etme, duygularını çıkarma için oldukça popüler bir alandır. Bu çalışmada kızgın, mutlu, sakin ve üzgün duygu etiketleri içeren Türkçe konuşma veri seti üzerinde, ses karakteristik özellikleri ve spektrogramlardan yararlanarak duyguların tespit edilmesi amaçlanmaktadır. Analiz aşamasında Librosa kütüphanesi ile çıkarılan sayısal özellikler ile makine öğrenme yöntemleri ve derin sinir ağları eğitilerek başarıları ölçülmüştür. Ayrıca düşük varyans filtreleme, geri yönlü özellik eleme, ki-kare ve temel bileşen analizi yöntemleri ile özellik azaltım işlemi uygulanarak elde edilen yeni özellikler ile makine öğrenme yöntemlerinin başarısındaki değişiklikler de araştırılmıştır. Görsel veri olan spektrogramlar ise EfficientNet, ResNet, MobileNet ve DenseNet derin öğrenme tabanlı modellerin eğitilmesi için kullanılmıştır. Modellerin eğitim aşamasında veri seti ile beraber modellere ince ayar işlemi uygulanmıştır. Deneysel çalışmaların sonucunda makine öğrenme yöntemlerinden Ekstrem Gradient Artırma %87.03 doğruluk değeri verirken, ResNet modeli ise %79.23 doğruluk değeri vermiştir.

COMPARISON OF MACHINE LEARNING METHODS AND DEEP LEARNING-BASED MODELS FOR EMOTION RECOGNITION IN TURKISH SPEECH

Abstract

The growing volume of data has recently created a need for analysis in many areas, such as health, education, and marketing. Sentiment analysis is a popular approach in these areas for analyzing people's comments and extracting their emotions. This study aims to detect emotions in a Turkish speech dataset labeled angry, happy, calm, and sad, using acoustic features of the voice and spectrograms. In the analysis phase, machine learning methods and deep neural networks were trained on numerical features extracted with the Librosa library, and their performance was measured. In addition, feature reduction was applied using low variance filtering, backward feature elimination, chi-square, and principal component analysis, and the resulting changes in the performance of the machine learning methods were investigated. The spectrograms, which are visual data, were used to train the deep learning-based models EfficientNet, ResNet, MobileNet, and DenseNet. These models were fine-tuned on the dataset during training. In the experiments, Extreme Gradient Boosting, one of the machine learning methods, achieved an accuracy of 87.03%, while the ResNet model achieved an accuracy of 79.23%.
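As an illustration of the first feature reduction step named in the abstract, the following is a minimal sketch of low variance filtering in plain Python. It is not the author's implementation: the function names, the toy feature matrix, and the threshold value are all hypothetical, and the abstract does not state which threshold or tooling was used.

```python
from statistics import pvariance


def low_variance_filter(rows, threshold):
    """Return indices of feature columns whose population variance exceeds threshold."""
    if not rows:
        return []
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def select_columns(rows, keep):
    """Keep only the selected feature columns in every row."""
    return [[row[i] for i in keep] for row in rows]


# Hypothetical toy matrix: 4 audio clips x 3 numerical features (made-up values).
features = [
    [0.10, 5.0, 1.00],
    [0.11, 9.0, 1.00],
    [0.09, 2.0, 1.00],
    [0.10, 7.0, 1.00],
]

# The threshold is an assumption chosen for this toy data only.
kept = low_variance_filter(features, threshold=0.01)
reduced = select_columns(features, kept)
print(kept, reduced)
```

Features that barely change across recordings carry little information for separating the emotion classes, so the filter drops them before a classifier is trained. On real data the features would typically be scaled first, because variance depends on the unit of each feature.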

References

  • Ali, L., Zhu, C., Zhou, M., Liu, Y. 2019. Early diagnosis of Parkinson’s disease from multiple voice recordings by simultaneous sample and feature selection. Expert Systems with Applications, 137, 22-28.
  • Altınel, A. B. 2021. Cluds: Combining Labeled and Unlabeled Data With Logistic Regression for Social Media Analysis. Mühendislik Bilimleri ve Tasarım Dergisi, 9(4), 1048-1061.
  • Alu, D. A. S. C., Zoltan, E., Stoica, I. C. 2017. Voice based emotion recognition with convolutional neural networks for companion robots. Science and Technology, 20(3), 222-240.
  • Anand, S., Patra, S. R. 2022. Voice and Text Based Sentiment Analysis Using Natural Language Processing. In Cognitive Informatics and Soft Computing: Proceeding of CISC 2021, pp. 517-529. Singapore: Springer Nature Singapore.
  • Canpolat, S. F., Ormanoğlu, Z., Zeyrek, D. 2020. Turkish Emotion Voice Database (TurEV-DB). In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pp. 368-375.
  • Çavuş, E., Sancaktar, İ. 2022. Batarya sağlık durumunun makine öğrenmesi ile kestirimi. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 11(3), 601-610.
  • Çelik, A. 2022. Predicting Diagnosis of Covid-19 Disease With Adaboost and Naive Bayes Machine Learning Algorithms. Mühendislik Bilimleri ve Tasarım Dergisi, 10(4), 1212-1221.
  • Çevik, K. K., Kayakuş, M. 2020. Bilişim Teknolojileri Departmaninda Kullanicilarin Taleplerine Cevap Verme Süresinin Makine Öğrenmesi ile Tahmin Edilmesi. Mühendislik Bilimleri ve Tasarım Dergisi, 8(3), 728-739.
  • Elbir, A., Aydin, N. 2020. Music genre classification and music recommendation by using deep learning. Electronics Letters, 56(12), 627-629.
  • Ergenç, İ., Bekar Uzun, İ. P. 2017. Türkçenin Ses Dizgesi (1st ed.). Ankara: Seçkin Yayıncılık.
  • Filter, L. V., Filter, P. 2014. Seven techniques for dimensionality reduction. Technical report
  • Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.
  • Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K. Q. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700-4708.
  • Karcioğlu, A. A., Bulut, H. 2021. Performance Evaluation of Classification Algorithms Using Hyperparameter Optimization. In 2021 6th International Conference on Computer Science and Engineering (UBMK), pp. 354-358.
  • Karcioğlu, A. A., Yaşa, A. C. 2020. Automatic summary extraction in texts using genetic algorithms. In 2020 28th Signal Processing and Communications Applications Conference (SIU), pp. 1-4.
  • Kelle, A. C., Yüce, H. 2022. MQTT Trafiğinde DoS Saldırılarının Makine Öğrenmesi ile Sınıflandırılması ve Modelin SHAP ile Yorumlanması. Journal of Materials and Mechatronics: A, 3(1), 50-62.
  • Koren, L., Stipancic, T. 2021. Multimodal emotion analysis based on acoustic and linguistic features of the voice. In International Conference on Human-Computer Interaction, pp. 301-311. Cham: Springer International Publishing.
  • Marques, G., Agarwal, D., De la Torre Díez, I. 2020. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Applied soft computing, 96, 106691.
  • McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., Nieto, O. 2015. librosa: Audio and music signal analysis in python. In Proceedings of the 14th python in science conference, 8, pp. 18-25.
  • Montavon, G., Samek, W., Müller, K. R. 2018. Methods for interpreting and understanding deep neural networks. Digital signal processing, 73, 1-15.
  • Mulla, G. A., Demir, Y., Hassan, M. 2021. Combination of PCA with SMOTE oversampling for classification of high-dimensional imbalanced data. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, 10(3), 858-869.
  • Murwati, A. S., Aldianto, L. 2022. Exploring Voice of Customers to Chatbot for Customer Service with Sentiment Analysis. The Asian Journal of Technology Management, 15(2), 141-153.
  • Najafian, M., Russell, M. 2020. Automatic accent identification as an analytical tool for accent robust automatic speech recognition. Speech Communication, 122, 44-55.
  • Oflazoglu, C., Yildirim, S. 2013. Recognizing emotion from Turkish speech using acoustic features. EURASIP Journal on Audio, Speech, and Music Processing, 2013, 26. https://doi.org/10.1186/1687-4722-2013-26
  • Özsönmez, D. B., Acarman, T., Parlak, İ. B. 2021. Optimal Classifier Selection in Turkish Speech Emotion Detection. 2021 29th Signal Processing and Communications Applications Conference (SIU), pp. 1-4.
  • Pelchat, N., Gelowitz, C. M. 2020. Neural network music genre classification. Canadian Journal of Electrical and Computer Engineering, 43(3), 170-173.
  • Reddy, A. S. B., Juliet, D. S. 2019. Transfer learning with ResNet-50 for malaria cell-image classification. In 2019 International Conference on Communication and Signal Processing (ICCSP), pp. 0945-0949. IEEE.
  • Ren, Z., Jia, J., Guo, Q., Zhang, K., Cai, L. 2014. Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data. In 2014 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-4.
  • Rhanoui, M., Mikram, M., Yousfi, S., Barzali, S. 2019. A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning and Knowledge Extraction, 1(3), 832-847.
  • Sağbaş, E. A., Korukoğlu, S., Ballı, S. 2022. Mahalanobis uzaklığı tabanlı aykırı değer bulma ve ReliefF öznitelik seçimine dayalı bir makine öğrenmesi yaklaşımı ile akıllı telefon verileri üzerinden stres tespiti. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 28(2), 336-345.
  • Satyanarayana, G., Bhuvana, J., Balamurugan, M. 2020. Sentimental Analysis on voice using AWS Comprehend. In 2020 International Conference on Computer Communication and Informatics (ICCCI), pp. 1-4.
  • Sikri, A., Singh, N. P., Dalal, S. 2023. Chi-Square Method of Feature Selection: Impact of Pre-Processing of Data. International Journal of Intelligent Systems and Applications in Engineering, 11(3s), 241-248.
  • Singh, A. K. 2021. Prediction of Voice Sentiment using Machine Learning Technique. In 2021 10th International Conference on System Modeling & Advancement in Research Trends (SMART), pp. 162-166.
  • Tan, M., Le, Q. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pp. 6105-6114. PMLR.
  • Tracy, J. M., Özkanca, Y., Atkins, D. C., Ghomi, R. H. 2020. Investigating voice as a biomarker: deep phenotyping methods for early detection of Parkinson's disease. Journal of biomedical informatics, 104, 103362.
  • Wu, Y., Li, S., Li, H. 2019. Automatic pitch accent detection using long short-term memory neural networks. In Proceedings of the 2019 International Symposium on Signal Processing Systems, pp. 41-45.
  • Yılmaz, Ü., Kuvat, Ö. Investigating the Effect of Feature Selection Methods on the Success of Overall Equipment Effectiveness Prediction. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 28(2), 437-452.
There are 37 citations in total.

Details

Primary Language: Turkish
Subjects: Computer Software, Software Engineering (Other)
Journal Section: Research Articles
Authors: Zekeriya Anıl Güven (ORCID 0000-0002-7025-2815)
Publication Date: June 30, 2024
Submission Date: August 26, 2023
Acceptance Date: March 18, 2024
Published in Issue: Year 2024, Volume 12, Issue 2

Cite

APA Güven, Z. A. (2024). TÜRKÇE KONUŞMADA DUYGU TANIMA İÇİN MAKİNE ÖĞRENME YÖNTEMLERİ VE DERİN ÖĞRENME TABANLI MODELLERİN KARŞILAŞTIRILMASI. Mühendislik Bilimleri Ve Tasarım Dergisi, 12(2), 285-297. https://doi.org/10.21923/jesd.1350375