Research Article

Sentiment Analysis in Turkish Using Language Models: A Comparative Study

Year 2025, Volume: 15 Issue: 1, 68 - 74

Abstract

Sentiment analysis is a natural language processing (NLP) task that aims to automatically identify positive, negative, and neutral sentiment in texts. Agglutinative languages such as Turkish pose challenges for sentiment analysis due to their complex morphological structure, and traditional methods are often inadequate for detecting sentiment in such texts. Language models (LMs), on the other hand, achieve strong results in sentiment analysis, as in many other NLP tasks, thanks to their ability to learn the context and structural features of a language. In this study, the XLM-RoBERTa, mBERT, BERTurk 32k, BERTurk 128k, ELECTRA Turkish Small, and ELECTRA Turkish Base models were fine-tuned on the Turkish Sentiment Analysis – Version 1 (TRSAv1) dataset and their performances were compared. The dataset consists of 150,000 texts containing user reviews from e-commerce platforms, with a balanced distribution across the positive, negative, and neutral classes. The fine-tuned models were evaluated on the test set using accuracy, precision, recall, and F1 score. The findings show that models trained specifically for Turkish outperform multilingual models in sentiment detection. The BERTurk 32k model achieved strong results with an accuracy of 83.69% and an F1 score of 83.65%, while the BERTurk 128k model followed closely with an accuracy of 83.68% and an F1 score of 83.66%. The multilingual XLM-RoBERTa model delivered competitive performance with an accuracy of 83.27% and an F1 score of 83.22%.
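The abstract reports accuracy and (macro-averaged) precision, recall, and F1 over three classes. As a minimal sketch of how such scores are computed, the snippet below implements the standard macro-averaged metrics in pure Python; the label names and the toy gold/predicted sequences are illustrative only, not taken from the study.

```python
from collections import defaultdict

LABELS = ["negative", "neutral", "positive"]

def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1
    for a multi-class sentiment task (standard definitions)."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    correct = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct += 1
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but gold was t
            fn[t] += 1  # missed the gold label t
    precisions, recalls, f1s = [], [], []
    for label in LABELS:
        prec = tp[label] / (tp[label] + fp[label]) if (tp[label] + fp[label]) else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if (tp[label] + fn[label]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(LABELS)
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy example: six gold labels vs. six hypothetical model predictions.
gold = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
pred = ["positive", "negative", "negative", "positive", "neutral", "negative"]
scores = macro_metrics(gold, pred)
```

Macro averaging weights each class equally, which matches the balanced class distribution of TRSAv1; in practice the same numbers can be obtained from scikit-learn's `precision_recall_fscore_support` with `average="macro"`.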

References

  • [1] I. Yaqoob, I. A. T. Hashem, A. Gani, S. Mokhtar, E. Ahmed, N. B. Anuar, and A. V. Vasilakos, “Big data: From beginning to future,” Int. J. Inf. Manage., vol. 36, no. 6, pp. 1231–1247, 2016.
  • [2] S. Mittal, A. Goel, and R. Jain, “Sentiment analysis of E-commerce and social networking sites,” in Proc. 3rd Int. Conf. Comput. Sustainable Global Develop. (INDIACom), 2016, pp. 2300–2305.
  • [3] M. Rodríguez-Ibáñez, A. Casánez-Ventura, F. Castejón-Mateos, and P.-M. Cuenca-Jiménez, “A review on sentiment analysis from social media platforms,” Expert Syst. Appl., vol. 223, p. 119862, 2023.
  • [4] M. Marong, N. K. Batcha, and R. Mafas, “Sentiment analysis in e-commerce: A review on the techniques and algorithms,” J. Appl. Technol. Innov., vol. 4, no. 1, p. 6, 2020.
  • [5] M. Wankhade, A. C. S. Rao, and C. Kulkarni, “A survey on sentiment analysis methods, applications, and challenges,” Artif. Intell. Rev., vol. 55, no. 7, pp. 5731–5780, 2022.
  • [6] A. P. Jain and P. Dandannavar, “Application of machine learning techniques to sentiment analysis,” in Proc. 2nd Int. Conf. Appl. Theor. Comput. Commun. Technol. (iCATccT), 2016, pp. 628–632.
  • [7] C. S. G. Khoo and S. B. Johnkhan, “Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons,” J. Inf. Sci., vol. 44, no. 4, pp. 491–511, 2018.
  • [8] M. Ahmad, S. Aftab, and I. Ali, “Sentiment analysis of tweets using SVM,” Int. J. Comput. Appl., vol. 177, no. 5, pp. 25–29, 2017.
  • [9] C. Dhaoui, C. M. Webster, and L. P. Tan, “Social media sentiment analysis: Lexicon versus machine learning,” J. Consum. Market., vol. 34, no. 6, pp. 480–488, 2017.
  • [10] A. Onan, “Sentiment analysis on Twitter messages based on machine learning methods,” Yönetim Bilişim Sistemleri Dergisi, vol. 3, no. 2, pp. 1–14, 2017.
  • [11] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., “Transformers: State-of-the-art natural language processing,” in Proc. 2020 Conf. Empirical Methods Natural Lang. Process.: System Demonstrations, 2020, pp. 38–45.
  • [12] R. Qasim, W. H. Bangyal, M. A. Alqarni, and A. A. Almazroi, “A fine-tuned BERT-based transfer learning approach for text classification,” J. Healthcare Eng., vol. 2022, no. 1, p. 3498123, 2022.
  • [13] X. Zhang, N. Rajabi, K. Duh, and P. Koehn, “Machine translation with large language models: Prompting, few-shot learning, and fine-tuning with QLoRA,” in Proc. Eighth Conf. Mach. Transl., 2023, pp. 468–481.
  • [14] H. Chouikhi and M. Alsuhaibani, “Deep transformer language models for Arabic text summarization: A comparison study,” Appl. Sci., vol. 12, no. 23, p. 11944, 2022.
  • [15] S. Butt, N. Ashraf, M. H. F. Siddiqui, G. Sidorov, and A. Gelbukh, “Transformer-based extractive social media question answering on TweetQA,” Computación y Sistemas, vol. 25, no. 1, pp. 23–32, 2021.
  • [16] K. L. Tan, C. P. Lee, K. S. M. Anbananthen, and K. M. Lim, “RoBERTa-LSTM: A hybrid model for sentiment analysis with transformer and recurrent neural network,” IEEE Access, vol. 10, pp. 21517–21525, 2022.
  • [17] S. Arroni, Y. Galán, X. M. Guzmán Guzmán, E. R. Núñez Valdéz, A. Gómez Gómez, et al., “Sentiment analysis and classification of hotel opinions in Twitter with the transformer architecture,” Int. J. Interact. Multimedia Artif. Intell., 2023.
  • [18] L. Khan, A. Amjad, N. Ashraf, and H.-T. Chang, “Multi-class sentiment analysis of Urdu text using multilingual BERT,” Sci. Rep., vol. 12, no. 1, p. 5436, 2022.
  • [19] Ö. Y. Yürütücü and Ş. Demir, “Ön eğitimli dil modelleriyle duygu analizi,” İstanbul Sabahattin Zaim Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 5, no. 1, pp. 46–53, 2023.
  • [20] A. Köksal and A. Özgür, “Twitter dataset and evaluation of transformers for Turkish sentiment analysis,” in Proc. 29th Signal Process. Commun. Appl. Conf. (SIU), 2021, pp. 1–4.
  • [21] S. Joshi, M. S. Khan, A. Dafe, K. Singh, V. Zope, and T. Jhamtani, “Fine tuning LLMs for low resource languages,” in Proc. 5th Int. Conf. Image Process. Capsule Netw. (ICIPCN), 2024, pp. 511–519.
  • [22] M. Aydoğan and V. Kocaman, “TRSAv1: A new benchmark dataset for classifying user reviews on Turkish e-commerce websites,” J. Inf. Sci., vol. 49, no. 6, pp. 1711–1725, 2023.
  • [23] A. Vaswani et al., “Attention is all you need,” Adv. Neural Inf. Process. Syst., 2017.
  • [24] J. Devlin et al., “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [25] A. Conneau et al., “Unsupervised cross-lingual representation learning at scale,” arXiv preprint arXiv:1911.02116, 2019.
  • [26] Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
  • [27] S. Schweter, BERTurk - BERT models for Turkish, version 1.0.0, Zenodo, Apr. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3770924. DOI: 10.5281/zenodo.3770924.
  • [28] K. Clark et al., “ELECTRA: Pre-training text encoders as discriminators rather than generators,” arXiv preprint arXiv:2003.10555, 2020.

Turkish Sentiment Analysis Using Large Language Models: A Comparative Study


Abstract

This study presents a comprehensive comparison of large language models (LLMs) for sentiment analysis in Turkish. Agglutinative languages such as Turkish pose challenges for sentiment analysis due to their complex grammatical structures and context dependencies, which limit the performance of traditional machine learning and lexicon-based methods. LLMs offer significant advantages in overcoming these limitations thanks to their ability to learn the contextual and structural features of a language. In this study, the XLM-RoBERTa, mBERT, mT5, T5 Turkish, BERTurk 32k, BERTurk 128k, ELECTRA Turkish Small, and ELECTRA Turkish Base models were fine-tuned on the Turkish TRSAv1 dataset. The TRSAv1 dataset contains 150,000 user reviews from e-commerce platforms with a balanced class distribution. After fine-tuning, the models were evaluated on performance metrics such as accuracy, precision, recall, and F1 score. The results show that monolingual models trained specifically for Turkish, particularly the BERTurk models, are more successful at learning the linguistic structure and contextual semantic diversity of Turkish. The success of the Turkish models stems from their ability to learn the language's rich morphological structure. Among the multilingual models, XLM-RoBERTa performed effectively on the positive and negative classes but poorly on the neutral class, while mBERT and ELECTRA Turkish Small showed comparatively weak overall performance.

There are 28 citations in total.

Details

Primary Language English
Subjects Software Engineering (Other)
Journal Section Research Article
Authors

Mert İncidelen 0009-0002-1975-8332

Murat Aydoğan 0000-0002-6876-6454

Early Pub Date July 1, 2025
Publication Date
Submission Date November 27, 2024
Acceptance Date June 23, 2025
Published in Issue Year 2025 Volume: 15 Issue: 1

Cite

APA İncidelen, M., & Aydoğan, M. (2025). Sentiment Analysis in Turkish Using Language Models: A Comparative Study. European Journal of Technique (EJT), 15(1), 68-74. https://doi.org/10.36222/ejt.1592448

All articles published by EJT are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.