Research Article

Detection of Cyberbullying in Turkish Texts Using Whale Optimization Algorithm as Feature Selector

Year 2022, Volume: 12 Issue: 2, 676 - 690, 15.12.2022
https://doi.org/10.31466/kfbd.1105503

Abstract

As social media use grows worldwide, the number of cyberbullying incidents, and with it the number of people exposed to cyberbullying, is growing at the same rate. Detecting cyberbullying is important both to spare victims from further exposure and to prevent new victimization. Although the literature contains many studies on cyberbullying, few detect it by analyzing sentences in Turkish. This study differs from existing work in two ways: it measures the success of cyberbullying detection when preprocessing is applied to a dataset prepared in Turkish, and it seeks a method that reduces the number of features without reducing detection success when working with very large documents. To this end, the Whale Optimization Algorithm, which has not yet been tried as a feature selector on Turkish cyberbullying datasets, was used, and detection success was measured with the K-Nearest Neighbor (KNN), Multinomial Naïve Bayes (MNB), and Random Forest (RF) classifiers after preprocessing the dataset. In the experiments, for all three classifiers, applying preprocessing together with feature selection by the Whale Optimization Algorithm reduced the number of features and substantially increased accuracy compared with classifying the raw dataset. In particular, on the dataset to which every preprocessing step except stemming was applied, accuracy rose from 85% to 91% when RF was used as the classifier together with the Whale Optimization Algorithm as the feature selector. These results show that preprocessing combined with feature selection by the Whale Optimization Algorithm significantly reduces the number of features and improves cyberbullying detection.
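The abstract does not describe how the Whale Optimization Algorithm was wired to the classifiers, so the following is only a minimal sketch of wrapper-style feature selection with that algorithm, not the authors' implementation. Everything specific in it is an assumption made for illustration: positions are kept in [0, 1] and a feature counts as selected when its value exceeds 0.5, the fitness mixes error rate and feature fraction with a hypothetical weight `alpha=0.99`, a leave-one-out 1-nearest-neighbour check stands in for the KNN classifier, the coefficients `A` and `C` are drawn once per whale rather than per dimension, and the data are synthetic numbers rather than Turkish text features.

```python
import math
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the kept columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: treat as total failure
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Lower is better: weighted error rate plus the fraction of features kept."""
    error = 1.0 - loo_1nn_accuracy(X, y, mask)
    return alpha * error + (1.0 - alpha) * sum(mask) / len(mask)


def to_mask(position):
    """Binarise a continuous whale position (assumed rule: keep when value > 0.5)."""
    return [value > 0.5 for value in position]


def woa_select(X, y, n_whales=8, n_iter=20, seed=0):
    """Return (feature mask, fitness) of the best whale found."""
    rng = random.Random(seed)
    dim = len(X[0])
    whales = [[rng.random() for _ in range(dim)] for _ in range(n_whales)]
    whales[0] = [1.0] * dim  # one whale starts with every feature (the baseline)
    best_pos, best_fit = None, float("inf")
    b = 1.0  # spiral shape constant

    for t in range(n_iter + 1):
        # Evaluate the population and remember the best position seen so far.
        for w in whales:
            f = fitness(X, y, to_mask(w))
            if f < best_fit:
                best_pos, best_fit = list(w), f
        if t == n_iter:
            break
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0
        for i, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best_pos if abs(A) < 1.0 else rng.choice(whales)
                new = [ref[j] - A * abs(C * ref[j] - w[j]) for j in range(dim)]
            else:
                # Spiral (bubble-net) move towards the best whale.
                l = rng.uniform(-1.0, 1.0)
                spiral = math.exp(b * l) * math.cos(2.0 * math.pi * l)
                new = [abs(best_pos[j] - w[j]) * spiral + best_pos[j] for j in range(dim)]
            whales[i] = [min(1.0, max(0.0, v)) for v in new]  # keep positions in [0, 1]
    return to_mask(best_pos), best_fit


def make_toy_data(n=40, n_noise=6, seed=1):
    """Synthetic stand-in data: two informative columns followed by pure-noise columns."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [label + rng.gauss(0.0, 0.2), label + rng.gauss(0.0, 0.2)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, fit = woa_select(X, y)
    print("selected columns:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", fit)
```

Because one whale is seeded with the full feature set and the best position is tracked across all evaluations, the returned subset never scores worse under this fitness than using every feature. In a setting like the one the abstract describes, the fitness function would presumably wrap KNN, MNB, or RF on vectorized text instead of the toy 1-NN check used here.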

References

  • Al-Mamun, A., and Akhter, S. (2018, December). Social Media Bullying Detection Using Machine Learning On Bangla Text. 10th International Conference on Electrical and Computer Engineering (pp. 20-22). Dhaka, Bangladesh.
  • Altay, E. V., and Alataş, B. (2018, December). Detection of Cyberbullying in Social Networks Using Machine Learning Methods. International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism, Ankara, Turkey.
  • Bozyiğit, A., Utku, S., and Nasiboğlu, E. (2018). Sanal zorbalık içeren sosyal medya mesajlarının tespiti. In 3rd International Conference on Computer Sciences and Engineering UBMK. Sarajevo, Bosnia and Herzegovina.
  • Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
  • Canayaz, M., and Demir, M. (2017, September). Feature selection with the whale optimization algorithm and artificial neural network. In 2017 International Artificial Intelligence and Data Processing Symposium (IDAP) (pp. 1-5). Malatya, Türkiye: İnönü Üniversitesi.
  • Çürük, E. (2018). Sosyal Ağlardaki Siber Zorbalığın Yapay Zeka Algoritmaları İle Tespiti Ve Sınıflandırılması. Master's thesis, Mersin Üniversitesi, Fen Bilimleri Enstitüsü, Mersin.
  • Dadvar, M., Trieschnigg, D., and de Jong, F. (2014). Experts and Machines against Bullies: A Hybrid Approach to Detect Cyberbullies. In: Advances in Artificial Intelligence (pp. 275–281). Canada.
  • Hof, P. R., and Van der Gucht, E. (2007). Structure of the cerebral cortex of the humpback whale, Megaptera novaeangliae (Cetacea, Mysticeti, Balaenopteridae). The Anatomical Record: Advances in Integrative Anatomy and Evolutionary Biology, 290(1), 1-31.
  • Hosseinmardi, H., Mattson, S. A., Ibn Rafiq, R., Han, R., Lv, Q., and Mishra, S. (2015). Detection of cyberbullying incidents on the Instagram social network. arXiv: 1503.03909v1 [cs.SI]. Retrieved from https://arxiv.org/abs/1503.03909.
  • Hussain, M. G., Mahmud, T. A., and Akthar, W. (2018). An Approach to Detect Abusive Bangla Text. 2018 International Conference on Innovation in Engineering and Technology (ICIET) (pp. 27-29). Dhaka, Bangladesh.
  • Karasu, S., and Altan, A. (2019, November). Recognition model for solar radiation time series based on random forest with feature selection approach. 2019 11th International Conference on Electrical and Electronics Engineering (ELECO). Bursa, Türkiye.
  • Kontostathis, A., Reynolds, K., Garron, A., and Edwards, L. (2013). Detecting cyberbullying: Query terms and techniques. Proceedings of the 5th Annual ACM Web Science Conference (pp. 195–204). New York, NY, USA.
  • Mirjalili, S., and Lewis, A. (2016). The whale optimization algorithm. Advances in engineering software, 95, 51-67.
  • Özel, S. A., Saraç, E., Akdemir, S., and Aksu, H. (2017). Detection of cyberbullying on social media messages in Turkish. 2017 International Conference on Computer Science and Engineering (UBMK) (pp. 366–370). Antalya, Turkey.
  • Prabhakar, E., Santhosh, M., Krishnan, A. H., Kumar, T., and Sudhakar, R. (2019). Sentiment analysis of US airline twitter data using new adaboost approach. International Journal of Engineering Research & Technology (IJERT), 7(1), 1-6.
  • Qiu, X., Zhang, L., Nagaratnam Suganthan, P., and Amaratunga, G. A. (2017). Oblique random forest ensemble via least square estimation for time series forecasting. Information Sciences, 420, 249-262.
  • Ren, J., Lee, S. D., Chen, X., Kao, B., Cheng, R., and Cheung, D. (2009). Naive Bayes classification of uncertain data. 2009 Ninth IEEE International Conference on Data Mining. Miami, Florida.
  • Sintaha, M., and Mostakim, M. (2018, December). An Empirical Study and Analysis of the Machine Learning Algorithms Used in Detecting Cyberbullying in Social Media. 21st International Conference of Computer and Information Technology (ICCIT). Dhaka, Bangladesh: United International University.
  • Stehman, S. V. (1997). Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62(1), 77–89.
  • Taunk, K., De, S., Verma, S., and Swetapadma, A. (2019). A brief review of nearest neighbor algorithm for learning and classification. 2019 International Conference on Intelligent Computing and Control Systems (ICCS). Secunderabad, India.
  • URL-1: https://recrodigital.com/dijital-2021-raporunda-turkiye-ve-dunyada-internet-ve-sosyal-medya-kullanimi-karsilastirmasi-ocak-2021/ (Accessed: 1 April 2022).
  • URL-2: https://github.com/gulsan-celep/Siber-Zorbalik-ile-Model-Egitimi (Accessed: 1 April 2022).
  • URL-3: https://github.com/otuncelli/turkish-stemmer-python (Accessed: 1 April 2022).
  • Yazğılı, E., and Baykara, M. (2022). Türkçe metinlerde makine öğrenmesi yöntemleri ile siber zorbalık tespiti. Gümüşhane Üniversitesi Fen Bilimleri Dergisi, 12(2), 443-453.

Öznitelik Seçici Olarak Balina Optimizasyon Algoritması Kullanarak Türkçe Metinlerde Siber Zorbalığın Tespiti

Abstract

With the increase in social media use worldwide, cyberbullying, and naturally the number of people exposed to it, is increasing at the same rate. Detecting cyberbullying matters so that victims are no longer exposed to it and so that new victimization does not occur. While the literature contains many studies on cyberbullying, few studies were found that detect cyberbullying by analyzing sentences in Turkish. What distinguishes this study from existing ones is that it both measures the success of cyberbullying detection with preprocessing on a dataset prepared in Turkish and looks for a method that lowers the number of features without lowering success when working with very large documents. For this reason, the Whale Optimization Algorithm, a method not yet tried as a feature selector on Turkish cyberbullying datasets, was used in this study; preprocessing was applied to the dataset, and the success of cyberbullying detection was measured with the K-Nearest Neighbor (KNN), Multinomial Naïve Bayes (MNB), and Random Forest (RF) classification algorithms. According to the experiments, relative to classifying the raw dataset with each of the three classifiers, applying preprocessing and selecting features with the Whale Optimization Algorithm decreased the number of features and greatly increased the accuracy. In particular, on the dataset where all preprocessing steps except stemming were applied, the accuracy rose from 85% to 91% when the RF algorithm was used as the classifier together with the Whale Optimization Algorithm as the feature selector. This shows that preprocessing and feature selection with the Whale Optimization Algorithm increase success in cyberbullying detection while also significantly reducing the number of features.


Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors:

Deniz Furkan Kanbak (ORCID: 0000-0002-7125-9103)

Mümine Kaya Keleş (ORCID: 0000-0001-8414-1713)

Publication Date: December 15, 2022
Published in Issue: Year 2022, Volume: 12, Issue: 2

Cite

APA: Kanbak, D. F., & Kaya Keleş, M. (2022). Öznitelik Seçici Olarak Balina Optimizasyon Algoritması Kullanarak Türkçe Metinlerde Siber Zorbalığın Tespiti. Karadeniz Fen Bilimleri Dergisi, 12(2), 676-690. https://doi.org/10.31466/kfbd.1105503