Hate Speech Detection with Machine Learning on Turkish Tweets

İslam Mayda; Banu Diri; Tuğba Yıldız

doi:10.31590/ejosat.903854

Abstract

The growing number and use of social media networks has led to more hate speech content being shared. Both public authorities and the social media networks themselves are developing policies to combat this increase. Because the volume of user-generated data is very large, automated systems are needed to detect hate speech. Although automatic hate speech detection has been studied in many languages in recent years, English above all, no comprehensive study on Turkish has yet been presented. This study was carried out to meet that need. A total of 1000 Turkish tweets containing keywords related to different target groups were collected and labeled independently by two annotators into three classes (hate speech, offensive expression, neither). The resulting Turkish hate speech data set has been shared publicly for use in future studies. Various tests were carried out on this data set using different feature sets and different machine learning algorithms. On the three-class data set, the highest performance, an F-measure of 79.9%, was obtained in the test that used the SMO (Sequential Minimal Optimization) algorithm. Although the data set needs to be enlarged to achieve better results in Turkish hate speech detection, this study is expected to pave the way for future work.
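As a rough illustration of the F-measure reported above, the sketch below computes per-class F-measures for a three-class labeling and averages them weighted by class frequency. The abstract does not say which averaging scheme was used, so the weighted average here is an assumption. The label names, function names, and toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical short names for the three classes named in the abstract.
LABELS = ("hate", "offensive", "none")


def per_class_f1(gold, pred, label):
    """Return the F-measure (harmonic mean of precision and recall) for one class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(gold, pred, labels=LABELS):
    """Average the per-class F-measures, weighting each class by its gold-label count.

    Assumption: weighted averaging; the source does not state the scheme.
    """
    support = Counter(gold)
    total = len(gold)
    return sum(per_class_f1(gold, pred, c) * support[c] / total for c in labels)


# Toy, made-up labels for illustration only (not data from the study).
gold = ["hate", "hate", "offensive", "none", "none", "none"]
pred = ["hate", "offensive", "offensive", "none", "none", "hate"]
print(round(weighted_f1(gold, pred), 3))
```

Weighting by class frequency keeps a rare class from dominating the score, whereas an unweighted (macro) average would treat all three classes equally. Which one is appropriate depends on how the reported 79.9% was computed.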

Details

Primary Language

Turkish

Subjects

Engineering

Section

Research Article

Authors

İslam Mayda *
0000-0001-5584-0259
Türkiye

Banu Diri
0000-0002-6652-4339
Türkiye

Tuğba Yıldız
0000-0002-5868-5407
Türkiye

Publication Date

April 15, 2021

Submission Date

March 26, 2021

Acceptance Date

April 6, 2021

Published in Issue

Year 2021, Issue 24

DOI

https://doi.org/10.31590/ejosat.903854

IZ

https://izlik.org/JA66RL66MT

Cite

APA

Mayda, İ., Diri, B., & Yıldız, T. (2021). Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti. Avrupa Bilim ve Teknoloji Dergisi, 24, 328-334. https://doi.org/10.31590/ejosat.903854

AMA

1. Mayda İ, Diri B, Yıldız T. Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti. EJOSAT. 2021;(24):328-334. doi:10.31590/ejosat.903854

Chicago

Mayda, İslam, Banu Diri, and Tuğba Yıldız. 2021. “Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti”. Avrupa Bilim ve Teknoloji Dergisi, no. 24: 328-34. https://doi.org/10.31590/ejosat.903854.

EndNote

Mayda İ, Diri B, Yıldız T (01 April 2021) Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti. Avrupa Bilim ve Teknoloji Dergisi 24 328–334.

IEEE

[1] İ. Mayda, B. Diri, and T. Yıldız, “Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti”, EJOSAT, no. 24, pp. 328–334, Apr. 2021, doi: 10.31590/ejosat.903854.

ISNAD

Mayda, İslam - Diri, Banu - Yıldız, Tuğba. “Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti”. Avrupa Bilim ve Teknoloji Dergisi. 24 (01 April 2021): 328-334. https://doi.org/10.31590/ejosat.903854.

JAMA

1. Mayda İ, Diri B, Yıldız T. Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti. EJOSAT. 2021;(24):328–334.

MLA

Mayda, İslam, et al. “Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti”. Avrupa Bilim ve Teknoloji Dergisi, no. 24, April 2021, pp. 328-34, doi:10.31590/ejosat.903854.

Vancouver

1. İslam Mayda, Banu Diri, Tuğba Yıldız. Türkçe Tweetler üzerinde Makine Öğrenmesi ile Nefret Söylemi Tespiti. EJOSAT. 01 April 2021;(24):328-34. doi:10.31590/ejosat.903854

Cited By

Mültecilere Yönelik Nefret Söyleminin Tespitinde Makine Öğrenmesi Modellerinin Kullanılması

European Journal of Science and Technology

https://doi.org/10.31590/ejosat.1253132

Derin ve Sığ Makine Öğrenmesi Yöntemleri ile Türkçe Tweetlerden Saldırgan Dil Tespiti

Türkiye Bilişim Vakfı Bilgisayar Bilimleri ve Mühendisliği Dergisi

https://doi.org/10.54525/tbbmd.1169009

So-haTRed: A Novel Hybrid System for Turkish Hate Speech Detection in Social Media With Ensemble Deep Learning Improved by BERT and Clustered-Graph Networks

IEEE Access

https://doi.org/10.1109/ACCESS.2024.3415350

Occupation Prediction from Twitter Data

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi

https://doi.org/10.21205/deufmd.2025278013