Fırat Üniversitesi Fen Bilimleri Dergisi

1308-9064

Fırat Üniversitesi

Classification Algorithms

A Hybrid Deep Learning Model for Detecting Toxic Language in Social Media Content

https://orcid.org/0000-0002-7240-8713

Anıl Utku

MUNZUR ÜNİVERSİTESİ

03 30 2026

Volume 38, Issue 1, pages 13-24; 12 21 2025; 02 15 2026

1987

This study compares classical machine learning methods, deep learning models, and a hybrid CNN–BiLSTM architecture for the automatic detection of toxic content on social media platforms. Experiments were conducted on a Twitter dataset of short, noisy texts with pronounced class imbalance. The classical methods LR, NB, and SVM and the deep models LSTM, BiLSTM, and hybrid CNN–BiLSTM were evaluated under the same train-test split strategy. Model performance was measured with accuracy, precision, recall, and F-score. The results show that the classical machine learning models achieve high accuracy on the non-toxic class but remain limited in detecting toxic content, the minority class. The deep learning models produced more balanced results owing to their ability to learn contextual dependencies. The proposed hybrid CNN–BiLSTM model, which handles local and contextual features jointly, achieved the highest and most balanced performance among all models.
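The gap the abstract describes between overall accuracy and minority-class detection can be illustrated with a minimal sketch. It computes accuracy, precision, recall (sensitivity), and F-score from scratch on made-up labels. The label counts, the `binary_metrics` helper, and the imagined classifier are assumptions for illustration only; none of the numbers come from the study.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical toy data: 90 non-toxic items (0) and 10 toxic items (1).
y_true = [0] * 90 + [1] * 10
# Hypothetical classifier output: one false alarm, and only 2 of the 10 toxic items found.
y_pred = [0] * 89 + [1] + [1] * 2 + [0] * 8

scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

On this toy data the accuracy is 0.91 while recall on the toxic class is only 0.20, which is why precision, recall, and F-score are reported alongside accuracy when the classes are imbalanced.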

Toxic language detection, text classification, deep learning, natural language processing
