Cyberbullying detection in Turkish texts using machine learning methods

Enver Yazğılı; Muhammet Baykara

doi:10.17714/gumusfenbil.935448

Research Article

Year 2022, Volume: 12, Issue: 2, 443-453, 15.04.2022

Cited By: 3

Abstract

The widespread use of the internet and the growing popularity of social media platforms have led to the rapid spread of the behavior known as cyberbullying. The number of people exposed to cyberbullying worldwide is increasing every day, with serious effects on the victims. Detecting this behavior is of great importance both for preventing new victims and for protecting existing victims from further exposure. Many studies on cyberbullying detection exist in the literature, but very few of them address Turkish texts. In this study, cyberbullying detection was performed using natural language processing methods on a ready-made, manually created Turkish dataset of 3,000 sentences obtained from the data-sharing site Kaggle. Because the dataset is new and, to the best of our knowledge, this many algorithms have not previously been tested in the literature, the study is expected to contribute to the literature. Bagging, Boosting, C4.5, Gradient Boosting, K-Means, KNN, LR, NB, ANN, RF (Random Forest), SVM (Support Vector Machine), Stochastic Gradient Descent, and XGBoost algorithms were applied to this dataset comparatively for the first time.
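To make the kind of pipeline described in the abstract concrete, the following is a minimal sketch of one of the listed classifier families, Naive Bayes (NB), applied to bag-of-words text features. It is not the authors' implementation: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, and the four toy sentences are hypothetical illustrations, and the sentences are not taken from the Kaggle dataset used in the study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace. This is an assumption for brevity;
    # a real pipeline for Turkish would likely add language-aware normalization.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][token]
                # Add-one (Laplace) smoothed log likelihood
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, for illustration only (not from the study's dataset).
train_texts = [
    "you are stupid and ugly",
    "nobody likes you loser",
    "have a great day friend",
    "thanks for your help today",
]
train_labels = ["bullying", "bullying", "neutral", "neutral"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = model.predict("you are a loser")
```

A comparative study like the one summarized above would train several such classifiers on the same feature representation and compare their scores on held-out data; this sketch shows only a single model and omits the evaluation step.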

Keywords

Information security, Machine learning, Cyber security, Cybercrime, Cyberbullying, Data analysis

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Enver Yazğılı 0000-0001-8459-3488 Muhammet Baykara 0000-0001-5223-1343
Publication Date	April 15, 2022
Submission Date	May 10, 2021
Acceptance Date	February 1, 2022
Published in Issue	Year 2022 Volume: 12 Issue: 2

Cite

APA	Yazğılı, E., & Baykara, M. (2022). Türkçe metinlerde makine öğrenmesi yöntemleri ile siber zorbalık tespiti. Gümüşhane Üniversitesi Fen Bilimleri Dergisi, 12(2), 443-453. https://doi.org/10.17714/gumusfenbil.935448

Cited By

The Effectiveness of Machine Learning Algorithms in Extractive Text Summarization: A Comparative Analysis of K-Means, Random Forest, GBM, Logistic Regression, and SVM

Doğu Fen Bilimleri Dergisi

https://doi.org/10.57244/dfbd.1538959

SOSYAL MEDYA PAYLAŞIMLARINDA KARAR MEKANİZMALARININ ÖĞRENME ALGORİTMALARIYLA KARŞILAŞTIRMALI ANALİZİ

Uluslararası Sürdürülebilir Mühendislik ve Teknoloji Dergisi

https://doi.org/10.62301/usmtd.1462808

Öznitelik Seçici Olarak Balina Optimizasyon Algoritması Kullanarak Türkçe Metinlerde Siber Zorbalığın Tespiti

Karadeniz Fen Bilimleri Dergisi

https://doi.org/10.31466/kfbd.1105503
