Optimizing Turkish Opinion Mining: A Comparative Study of AI Algorithms

Ömer Köksal

doi:10.53070/bbd.1545101

Research Article

Türkçe Görüş Madenciliğinin Optimize Edilmesi: Yapay Zekâ Algoritmalarının Karşılaştırmalı Bir Çalışması

Year 2024, Volume: 9, Issue: 2, 186 - 201, 25.12.2024

Abstract

Opinion mining, also known as sentiment analysis, is a branch of Natural Language Processing (NLP) that analyzes the opinions, sentiments, attitudes, and emotions expressed in text data. Its goal is to determine the sentiment polarity of a given piece of text, such as a review, comment, or social media post. However, opinion mining faces language-specific challenges that set studies in less commonly researched languages apart from those conducted in English. This article presents a novel process for Turkish opinion mining and compares various artificial intelligence algorithms. To ensure transparency and reproducibility, we conducted extensive experiments on an open-source Turkish opinion-mining dataset. We evaluated traditional machine learning algorithms, deep learning-based algorithms, and pre-trained transformer models, focusing on optimizing their parameters. We also compared word embeddings with the traditional bag-of-words method. Hyperparameter tuning significantly improved the accuracy and F1 scores of the models. The proposed process outperformed existing methods in the literature, providing valuable insights for future research in opinion mining.
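As a rough illustration of two ideas named in the abstract, the sketch below builds a bag-of-words count vector and computes accuracy and F1 for a binary polarity task. It is not the article's pipeline: the function names, the toy vocabulary, and the labels are all invented for illustration, and the F1 shown is the single positive-class variant, which may differ from the averaging used in the study.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a vector of token counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this sketch (not taken from the article's dataset).
vocabulary = ["harika", "kötü", "ürün"]
vector = bag_of_words("Harika ürün harika", vocabulary)

gold = [1, 0, 1, 1, 0]       # hypothetical gold polarity labels
predicted = [1, 0, 0, 1, 1]  # hypothetical model predictions
acc = accuracy(gold, predicted)
f1 = f1_score(gold, predicted)
```

Reporting both metrics is useful because accuracy alone can look high on an imbalanced dataset even when one polarity class is rarely predicted correctly, whereas F1 penalizes missed and spurious positives.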

Keywords

Opinion mining, natural language processing, machine learning, deep learning, pre-trained language models, transformer models

There are 28 citations in total.

Details

Primary Language	English
Subjects	Natural Language Processing
Journal Section	PAPERS
Authors	Ömer Köksal 0000-0003-1372-7033
Early Pub Date	December 24, 2024
Publication Date	December 25, 2024
Submission Date	September 7, 2024
Acceptance Date	December 21, 2024
Published in Issue	Year 2024, Volume: 9, Issue: 2

Cite

APA	Köksal, Ö. (2024). Optimizing Turkish Opinion Mining: A Comparative Study of AI Algorithms. Computer Science, 9(2), 186-201. https://doi.org/10.53070/bbd.1545101

The Creative Commons Attribution 4.0 International License is applied to all research papers published by JCS, and a Digital Object Identifier (DOI) is assigned to each published paper.