Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War

Serdar Arslan; Eray Fırat

doi:10.35414/akufemubid.1377465

Research Article

Turkish title: Türkçe Kısa Metinlerde Duruş Tespiti: Rusya-Ukrayna Savaşı Örneği

Year 2024, Volume: 24 Issue: 3, 602 - 619, 27.06.2024

Abstract

In recent years, social media has become a crucial source of information for gauging public opinion on a wide range of topics, and the need to extract information from these platforms automatically has grown accordingly. Stance detection, a subtask of natural language processing, plays a pivotal role in this process by automatically determining a user's attitude toward a specific subject, event, or individual. In this study, we built a labeled Turkish dataset for determining social media users' stances on the Russia-Ukraine War. The dataset comprises 8215 tweets collected from Twitter, which were cleaned and then annotated for two targets: Russia and Ukraine. We evaluated several machine learning methods, namely Support Vector Machines, Random Forest, k-Nearest Neighbor, XGBoost, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), using GloVe and FastText word embeddings. We also applied a transformer-based approach to stance detection. Because the dataset is imbalanced between the two targets, we combined these algorithms with undersampling and oversampling techniques. The experimental results indicate that BERT-based models outperformed all other models, while LSTM and GRU produced results very close to those of the BERT-based model. The new Turkish dataset is a valuable resource for this research area and has the potential to be used with transformer-based approaches in future work. In summary, this study advances stance detection research for Turkish text.
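The abstract mentions undersampling and oversampling to offset the imbalance between the two targets, but it does not say which specific techniques were used. As a rough illustration only, the sketch below balances a toy dataset by plain random resampling; the function names, the toy tweets, and the 7-to-3 split are all assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def _group_by_target(texts, targets):
    """Collect the texts that belong to each target (hypothetical helper)."""
    groups = defaultdict(list)
    for text, target in zip(texts, targets):
        groups[target].append(text)
    return groups


def random_oversample(texts, targets, seed=0):
    """Duplicate randomly chosen examples until every target matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_target(texts, targets)
    size = max(len(items) for items in groups.values())
    out_texts, out_targets = [], []
    for target, items in groups.items():
        extra = [rng.choice(items) for _ in range(size - len(items))]
        out_texts.extend(items + extra)
        out_targets.extend([target] * size)
    return out_texts, out_targets


def random_undersample(texts, targets, seed=0):
    """Keep a random subset of each target, sized to match the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_target(texts, targets)
    size = min(len(items) for items in groups.values())
    out_texts, out_targets = [], []
    for target, items in groups.items():
        out_texts.extend(rng.sample(items, size))
        out_targets.extend([target] * size)
    return out_texts, out_targets


# Toy data: the counts are invented and do not reflect the real dataset.
texts = [f"tweet {i}" for i in range(10)]
targets = ["Russia"] * 7 + ["Ukraine"] * 3

over_texts, over_targets = random_oversample(texts, targets)
under_texts, under_targets = random_undersample(texts, targets)

print(Counter(over_targets))
print(Counter(under_targets))
```

Oversampling keeps every original example at the cost of duplicates, while undersampling discards data from the larger target; in practice, resampling is normally applied to the training split only so that evaluation data keeps its original distribution.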

Keywords

BERT, NLP, Deep Learning, Stance Detection

Details

Primary Language	English
Subjects	Computer Software
Section	Articles
Authors	Serdar Arslan 0000-0003-3115-0741 Eray Fırat 0009-0008-8114-2807
Early Pub Date	June 8, 2024
Publication Date	June 27, 2024
Submission Date	October 17, 2023
Acceptance Date	May 8, 2024
Published in Issue	Year 2024 Volume: 24 Issue: 3

Cite

APA	Arslan, S., & Fırat, E. (2024). Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 24(3), 602-619. https://doi.org/10.35414/akufemubid.1377465
AMA	Arslan S, Fırat E. Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. June 2024;24(3):602-619. doi:10.35414/akufemubid.1377465
Chicago	Arslan, Serdar, and Eray Fırat. “Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24, no. 3 (June 2024): 602-19. https://doi.org/10.35414/akufemubid.1377465.
EndNote	Arslan S, Fırat E (01 June 2024) Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24 3 602–619.
IEEE	S. Arslan and E. Fırat, “Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 24, no. 3, pp. 602–619, 2024, doi: 10.35414/akufemubid.1377465.
ISNAD	Arslan, Serdar - Fırat, Eray. “Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 24/3 (June 2024), 602-619. https://doi.org/10.35414/akufemubid.1377465.
JAMA	Arslan S, Fırat E. Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2024;24:602–619.
MLA	Arslan, Serdar and Eray Fırat. “Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 24, no. 3, 2024, pp. 602-19, doi:10.35414/akufemubid.1377465.
Vancouver	Arslan S, Fırat E. Stance Detection on Short Turkish Text: A Case Study of Russia-Ukraine War. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2024;24(3):602-19.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.