TY  - JOUR
T1  - Turkish Clickbait News Detection using Explainable Artificial Intelligence
AU  - Akgün, Alper Celal
AU  - İnkaya, Tülin
PY  - 2024
DA  - November
Y2  - 2024
JF  - International Journal of Data Science and Applications
PB  - Sakarya University of Applied Sciences
WT  - DergiPark
SN  - 3062-3014
SP  - 68
EP  - 80
VL  - 7
IS  - 1
LA  - en
AB  - Internet users frequently turn to digital journalism to acquire information. However, content produced by malicious news sources causes various problems for users. One of these problems is clickbait headlines, which are designed to capture users' attention and direct them to specific content. Clickbait headlines exploit users' curiosity, leading them to navigate to targeted content and spend more time on it. Such content, which can be malicious, is one of the main problems facing today's internet users. In the literature, artificial intelligence-based approaches using machine learning and deep learning models have been developed for clickbait detection. However, studies on the explainability of the artificial intelligence models developed in this field are still needed. Explainable artificial intelligence (XAI) aims to make machine learning models transparent and understandable and to explain their decision-making processes. This study aims to develop explainable artificial intelligence-based models for the clickbait detection problem. To this end, a Turkish dataset compiled from different news sources was used. First, data preprocessing steps including feature engineering, missing data handling, stemming, normalization and term frequency–inverse document frequency (TF-IDF) transformation were performed. Next, k-nearest neighbors, Naive Bayes, logistic regression, decision tree, random forest, extreme gradient boosting (XGBoost), support vector machine and multi-layer perceptron (MLP) models were developed using the dataset. Hyperparameter optimization was applied to determine the most suitable parameter values for each model, and the performance of the models was compared. Finally, to ensure the explainability of the artificial intelligence models in clickbait detection, the SHAP (SHapley Additive exPlanations) method was used to identify the factors affecting the classification results.
KW  - Clickbait Detection
KW  - Natural Language Processing
KW  - SHAP
KW  - Explainable Artificial Intelligence
CR  - We Are Social, “Digital 2023 Global Overview Report,” [Online]. Available: https://wearesocial.com/wp-content/uploads/2023/03/Digital-2023-Global-Overview-Report.pdf. Accessed: Aug. 9, 2024.
CR  - Z. B. Şahin and Y. Birincioğlu, “Tık odaklı başlıklar ve okuyucu refleksleri üzerine bir araştırma: Odak grup çalışması [A study on click-oriented headlines and reader reflexes: A focus group study],” TRT Akademi, vol. 7, no. 14, pp. 236–261, 2022.
CR  - R. Raj, C. Sharma, R. Uttara, and C. R. Animon, “A Literature Review on Clickbait Detection Techniques for Social Media,” Proc. 2024 11th Int. Conf. Reliability, Infocom Technol. Optimization (ICRITO), pp. 1–5, Mar. 2024. https://doi.org/10.1109/ICRITO61523.2024.10522359
CR  - A. F. H. N. Adrian, N. N. Handradika, A. E. Prasojo, A. A. S. Gunawan, and K. E. Setiawan, “Clickbait Detection on Online News Headlines Using Naive Bayes and LSTM,” Proc. 2024 IEEE Int. Conf. Artificial Intell. Mechatronics Syst. (AIMS), pp. 1–6, Feb. 2024. https://doi.org/10.1109/AIMS61812.2024.10512986
CR  - Y. Arfat and S. C. Tista, “Bangla Misleading Clickbait Detection Using Ensemble Learning Approach,” Proc. 2024 6th Int. Conf. Electrical Eng. Inf. Commun. Technol. (ICEEICT), pp. 184–189, May 2024. https://doi.org/10.1109/ICEEICT62016.2024.10534333
CR  - W. Yang, Y. Wei, H. Wei, Y. Chen, G. Huang, X. Li, and B. Kang, “Survey on explainable AI: From approaches, limitations and applications aspects,” Human-Centric Intell. Syst., vol. 3, no. 3, pp. 161–188, 2023. https://doi.org/10.1007/s44230-023-00038-y
CR  - M. T. Ribeiro, S. Singh, and C. Guestrin, “Why should I trust you? Explaining the predictions of any classifier,” Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, pp. 1135–1144, Aug. 2016. https://doi.org/10.1145/2939672.2939778
CR  - S. M. Lundberg and S. I. Lee, “A unified approach to interpreting model predictions,” Adv. Neural Inf. Process. Syst., vol. 30, 2017. https://doi.org/10.48550/arXiv.1705.07874
CR  - M. Potthast, S. Köpsel, B. Stein, and M. Hagen, “Clickbait detection,” Adv. Inf. Retrieval: Proc. 38th European Conf. IR Res. (ECIR), pp. 810–817, Mar. 2016. https://doi.org/10.1007/978-3-319-30671-1_72
CR  - A. Chakraborty, B. Paranjape, S. Kakarla, and N. Ganguly, “Stop clickbait: Detecting and preventing clickbaits in online news media,” Proc. 2016 IEEE/ACM Int. Conf. Advances Social Networks Anal. Mining (ASONAM), pp. 9–16, Aug. 2016. https://doi.org/10.1109/ASONAM.2016.7752207
CR  - K. K. Yadav and N. Bansal, “A Comparative Study on Clickbait Detection using Machine Learning Based Methods,” Proc. 2023 Int. Conf. Disruptive Technol. (ICDT), pp. 661–665, May 2023. https://doi.org/10.1109/ICDT57929.2023.10150475
CR  - A. Chowanda, N. Nadia, and L. M. M. Kolbe, “Identifying clickbait in online news using deep learning,” Bull. Electrical Eng. Informatics, vol. 12, no. 3, pp. 1755–1761, 2023. https://doi.org/10.11591/eei.v12i3.4444
CR  - C. I. Coste, D. Bufnea, and V. Niculescu, “A new language independent strategy for clickbait detection,” Proc. 2020 Int. Conf. Software, Telecommun. Comput. Networks (SoftCOM), pp. 1–6, Sep. 2020. https://doi.org/10.23919/SoftCOM50211.2020.9238342
CR  - M. M. Mahtab, M. Haque, M. Hasan, and F. Sadeque, “BanglaBait: Semi-supervised adversarial approach for clickbait detection on Bangla clickbait dataset,” Proc. 14th Int. Conf. Recent Advances Natural Language Process. (RANLP), pp. 748–758, 2023. https://doi.org/10.26615/978-954-452-092-2_081
CR  - D. M. Broscoteanu and R. T. Ionescu, “A Novel Contrastive Learning Method for Clickbait Detection on RoCliCo: A Romanian Clickbait Corpus of News Articles,” Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 9547–9555, 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.640
CR  - T. Liu, K. Yu, L. Wang, X. Zhang, H. Zhou, and X. Wu, “Clickbait detection on WeChat: a deep model integrating semantic and syntactic information,” Knowledge-Based Syst., vol. 245, p. 108605, 2022. https://doi.org/10.1016/j.knosys.2022.108605
CR  - Ş. Genç, “Turkish clickbait detection in social media via machine learning algorithms,” MSc Thesis, Middle East Technical University, Ankara, 2021. https://hdl.handle.net/11511/92039
CR  - K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, “DEFEND: Explainable fake news detection,” Proc. 25th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, pp. 395–405, Jul. 2019. https://doi.org/10.1145/3292500.3330935
CR  - S. Y. Chien, C. J. Yang, and F. Yu, “XFlag: Explainable fake news detection model on social media,” Int. J. Human–Comput. Interaction, vol. 38, no. 18-20, pp. 1808–1827, 2022. https://doi.org/10.1080/10447318.2022.2062113
CR  - V. Sharma and D. Midhunchakkaravarthy, “XGBoost classification of XAI based LIME and SHAP for detecting dementia in young adults,” Proc. 2023 14th Int. Conf. Comput. Commun. Networking Technol. (ICCCNT), pp. 1–6, Jul. 2023. https://doi.org/10.1109/ICCCNT56998.2023.10307791
CR  - G. I. Pérez-Landa, O. Loyola-González, and M. A. Medina-Pérez, “An explainable artificial intelligence model for detecting,” Appl. Sci., vol. 11, no. 22, p. 10801, 2021. https://doi.org/10.3390/app112210801
CR  - M. Zhou, W. Xu, W. Zhang, and Q. Jiang, “Leverage knowledge graph and GCN for fine-grained-level clickbait detection,” World Wide Web, vol. 25, no. 3, pp. 1243–1258, 2022. https://doi.org/10.1007/s11280-022-01032-3
CR  - T. Turan, E. Küçüksille, and N. K. Alagöz, “Prediction of Turkish Constitutional Court decisions with explainable artificial intelligence,” Bilge Int. J. Sci. Technol. Res., vol. 7, no. 2, pp. 128–141, 2023. https://doi.org/10.30516/bilgesci.1317525
CR  - S. Rao, S. Mehta, S. Kulkarni, H. Dalvi, N. Katre, and M. Narvekar, “A study of LIME and SHAP model explainers for autonomous disease predictions,” Proc. 2022 IEEE Bombay Sect. Signature Conf. (IBSSC), pp. 1–6, Dec. 2022. https://doi.org/10.1109/IBSSC56953.2022.10037324
CR  - “Turkish News Title 20000+ Clickbait Classified,” [Online]. Available: https://www.kaggle.com/datasets/suleymancan/turkishnewstitle20000clickbaitclassified. Accessed: Aug. 9, 2024.
CR  - P. Domingos, “A few useful things to know about machine learning,” Commun. ACM, vol. 55, no. 10, pp. 78–87, Oct. 2012. https://doi.org/10.1145/2347736.2347755
CR  - A. A. Akın and M. D. Akın, “Zemberek, an open source NLP framework for Turkic languages,” Structure, vol. 10, pp. 1–5, 2007.
CR  - K. Sparck Jones, “A statistical interpretation of term specificity and its application in retrieval,” J. Documentation, vol. 28, no. 1, pp. 11–21, 1972. https://doi.org/10.1108/eb026526
CR  - A. Kiran and D. Vasumathi, “Data mining: Min–max normalization based data perturbation technique for privacy preservation,” Proc. Third Int. Conf. Comput. Intell. Informatics: ICCII 2018, pp. 723–734, Mar. 2020. https://doi.org/10.1007/978-981-15-1480-7_66
CR  - N. A. Zuhroh and N. A. Rakhmawati, “Clickbait detection: A literature review of the methods used,” Register: J. Ilmiah Teknologi Sistem Informasi, vol. 6, no. 1, pp. 1–10, 2022. https://doi.org/10.26594/register.v6i1.1561
CR  - S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, “Supervised machine learning: A review of classification techniques,” Emerg. Artificial Intell. Applications Comput. Eng., vol. 160, no. 1, pp. 3–24, 2007.
CR  - I. H. Sarker, “Machine learning: Algorithms, real-world applications and research directions,” SN Comput. Sci., vol. 2, no. 3, p. 160, 2021. https://doi.org/10.1007/s42979-021-00592-x
CR  - G. Van den Broeck, A. Lykov, M. Schleich, and D. Suciu, “On the tractability of SHAP explanations,” J. Artificial Intell. Res., vol. 74, pp. 851–886, 2022. https://doi.org/10.1613/jair.1.13283
CR  - D. M. Powers, “Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation,” International Journal of Machine Learning Technology, vol. 2, no. 1, pp. 37–63, 2011. https://doi.org/10.48550/arXiv.2010.16061
CR  - M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. American Stat. Assoc., vol. 32, no. 200, pp. 675–701, 1937. https://doi.org/10.1080/01621459.1937.10503522
CR  - A. Benavoli, G. Corani, and F. Mangili, “Should we really use post-hoc tests based on mean-ranks?” J. Machine Learn. Res., vol. 17, no. 1, pp. 152–161, 2016. https://doi.org/10.48550/arXiv.1505.02288
CR  - P. Biyani, K. Tsioutsiouliklis, and J. Blackmer, “‘8 amazing secrets for getting more clicks’: Detecting clickbaits in news streams using article informality,” Proc. 2016 AAAI Conf. Artificial Intell., vol. 30, no. 1, 2016. https://doi.org/10.1609/aaai.v30i1.9966
UR  - https://dergipark.org.tr/en/pub/joindata/issue//1566476
L1  - https://dergipark.org.tr/en/download/article-file/4284992
ER  - 