Year 2021, Issue 23, Pages 99-107, 2021-04-30

With the widespread use of the internet, people have begun to share their thoughts and current moods through social media platforms and online forums, which has led to a large increase in the amount of text data. Social media data, especially data obtained from Twitter, are used in many tasks such as sentiment analysis, text classification, topic modelling, irony detection and opinion mining. Stance detection is one of these tasks. For a given target-comment pair, stance detection is the automatic extraction, from the comment text, of the comment author's stance toward the target. The target can be a person, an event, a situation or a product. The aim is to classify the author's stance toward a specific target as “Favor”, “Against” or “Neither”. To the best of our knowledge, there is no comprehensive dataset prepared for stance detection studies in Turkish. In this study, a Turkish Stance Dataset was first created from comments collected for 6 targets by web scraping from an online forum. The dataset consists of 5031 target-comment pairs in total. Each target-comment pair was labeled by graduates of university language departments. Stance detection was performed on the dataset with the Naive Bayes, Support Vector Machine, AdaBoost, XGBoost, Random Forest and Convolutional Neural Network methods, and the results are reported. Bag of Words, Term Frequency-Inverse Document Frequency and word embedding methods were used for text representation. The Matthews Correlation Coefficient was used for performance evaluation. In the experiments, the best results were obtained with the XGBoost and Convolutional Neural Network methods.
By applying the integrated gradients method to the features extracted from the Convolutional Neural Network model, the contributions of the input features to the model's prediction were examined; the contribution of each word in a comment to the prediction is visualized and presented with examples.
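The abstract names the Matthews Correlation Coefficient as its evaluation metric but does not say how it was computed for the three stance classes. The sketch below assumes the commonly used multiclass generalization (Gorodkin's R_K form); the function name `multiclass_mcc` and the example labels are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient (assumed R_K form)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)                            # how often each class truly occurs
    p_k = Counter(y_pred)                            # how often each class is predicted
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denominator = sqrt((s * s - sum(v * v for v in p_k.values()))
                       * (s * s - sum(v * v for v in t_k.values())))
    # By convention, return 0.0 when the score is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0

# Hypothetical labels using the three stance classes named in the abstract.
y_true = ["Favor", "Against", "Neither", "Favor", "Against", "Neither"]
y_pred = ["Favor", "Against", "Neither", "Against", "Against", "Favor"]
score = multiclass_mcc(y_true, y_pred)
```

A score of 1 means perfect agreement and 0 means no better than chance. Unlike plain accuracy, the score takes the per-class counts of both labels and predictions into account, which is one reason the metric is often preferred when classes are unevenly represented.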
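To illustrate how integrated gradients assigns a contribution to each input feature, here is a minimal sketch on a toy model. It assumes a logistic classifier with an analytic gradient and an all-zero baseline, not the Convolutional Neural Network or word embeddings used in the study; all names, weights and inputs are hypothetical.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def model(x, w, b):
    """Toy differentiable classifier: sigmoid of a weighted sum."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position on the straight path baseline -> x
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

# Hypothetical weights and input; each position stands in for one input feature.
w = [1.5, -2.0, 0.0]
b = 0.1
x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, w, b)
```

The attributions sum approximately to the difference between the model output at the input and at the baseline, which is the completeness property of the method. In a text model, one common approach is to sum the attributions over a word's embedding dimensions to obtain a per-word score that can then be visualized.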
Primary Language tr
Subjects Engineering
Journal Section Articles
Authors

Orcid: 0000-0002-5472-8297
Author: Kaan Kemal POLAT (Primary Author)
Institution: YILDIZ TEKNİK ÜNİVERSİTESİ
Country: Turkey


Orcid: 0000-0003-0221-294X
Author: Nilgün GÜLER BAYAZIT
Institution: YILDIZ TEKNİK ÜNİVERSİTESİ
Country: Turkey


Orcid: 0000-0001-5838-4615
Author: Olcay Taner YILDIZ
Institution: ÖZYEĞİN ÜNİVERSİTESİ
Country: Turkey


Dates

Publication Date : April 30, 2021

APA Polat, K., Güler Bayazıt, N., & Yıldız, O. (2021). Türkçe Duruş Tespit Analizi. Avrupa Bilim ve Teknoloji Dergisi, (23), 99-107. DOI: 10.31590/ejosat.851584