Year 2019, Volume 7, Issue 3, Pages 355-361, 2019-09-28

Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data
Grid Aramayla Optimize Edilmiş Bayes Lojistik Regresyon Algoritmasının Türkçe Mikro Blog Verilerinde Sanal Zorbalık Tespitinde Kullanılması

Akın Özçift [1], Deniz Kılınç [2], Fatma Bozyiğit [3]


Users of social media platforms interact on a massive scale, and this communication produces an enormous amount of user data that is worth analyzing from many angles. One research area emerging from this data is cyberbullying, a major security issue. Since cyberbullying has been recognized as a source of cybercrime, there is a clear need for a system that detects cyberbullying attacks and their sources in micro-blog texts. Most academic research on this topic has been conducted on English-language text. The originality of this paper is that we develop an accurate cyberbullying detection system for the Turkish language. We used data from Twitter to build a supervised machine learning model based on Bayesian Logistic Regression, whose parameters are tuned with a grid search algorithm. Because text data produces a high-dimensional training space for machine learning algorithms, we also used the Chi-Squared (CH2) feature selection strategy to obtain the best subset of features. The optimized version of the proposed algorithm, trained on the reduced feature set, produced an F-measure of 0.925. Finally, we compared the results of the proposed algorithm with machine learning methods frequently used in the literature and report the corresponding results in the related sections.

İnternet kullanıcıları ve sosyal medya platformları arasında büyük bir etkileşim vardır. Bu etkileşimin sonucu olarak ortaya çıkan devasa boyutlardaki kullanıcı verileri birçok yönden incelenmeye değerdir. Kullanıcı verilerini baz alarak ortaya çıkan araştırma alanlarından birisi de önemli güvenlik problemlerinden biri olan siber zorbalıktır. Bu sorun, siber suçların kaynağı olarak kabul edildiğinden, mikro-blog metinleri üzerinden siber zorbalık saldırılarını/kaynaklarını tespit etmeyi hedefleyen bir sistemin tasarımı önemli bir konudur. Bu alandaki akademik çalışmaların birçoğu İngilizce dilinde yazılmış metinleri ele almaktadır. Bu çalışmanın özgünlüğü Türkçe metinlerde yer alan sanal zorbalık öğelerini en doğru şekilde tespit edebiliyor olmasıdır. Bu amaçla, Twitter’dan toplanan kullanıcı twitleri üzerinde parametreleri Grid Arama Algoritması ile belirlenen, Bayes Lojistik Regresyon denetimli öğrenme algoritması kullanılmıştır. Metin verilerinin makine öğrenmesi algoritmaları için yüksek boyutlu bir eğitim alanı oluşturması sebebi ile Ki-Kare özellik seçim stratejisi kullanılarak en belirleyici özelliklere karar verilmiştir. Sonuç olarak, çalışmamız özellik sayısının minimum hale getirilmiş versiyonu ile, 0.925'lik bir F-ölçüm değeri üretmiştir. Önerilen yöntemimizin sonuçları literatürde sıkça kullanılan makine öğrenme yöntemleri ile karşılaştırılmış ve ilgili bölümlerde sonuçları paylaşılmıştır.
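The pipeline summarized in the abstract (Chi-Squared feature selection, a Bayesian Logistic Regression classifier, grid search over its parameters, F-measure scoring) can be illustrated with a minimal, standard-library-only sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy token sets stand in for preprocessed tweets, the classifier is a MAP estimate under a zero-mean Gaussian prior fitted by plain gradient descent, the only tuned parameter is the prior variance, and the grid is scored on a single held-out split rather than with cross-validation.

```python
import math

def chi2_score(docs, labels, term):
    """Chi-squared statistic of the 2x2 table: term presence vs. class label."""
    n = len(docs)
    n11 = sum(1 for d, y in zip(docs, labels) if term in d and y == 1)
    n10 = sum(1 for d, y in zip(docs, labels) if term in d and y == 0)
    n01 = sum(1 for d, y in zip(docs, labels) if term not in d and y == 1)
    n00 = n - n11 - n10 - n01
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_terms(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    vocab = {t for d in docs for t in d}
    return sorted(vocab, key=lambda t: (-chi2_score(docs, labels, t), t))[:k]

def vectorize(docs, terms):
    """Binary bag-of-words vectors restricted to the selected terms."""
    return [[1.0 if t in d else 0.0 for t in terms] for d in docs]

def fit_map_logreg(X, y, prior_var, lr=0.05, epochs=500):
    """MAP logistic regression with a zero-mean Gaussian prior on the weights
    (bias left unpenalized), fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [wj / prior_var for wj in w], 0.0  # gradient of the prior term
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                gw[j] += (p - yi) * xj
            gb += p - yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(model, X):
    w, b = model
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def grid_search(X_tr, y_tr, X_val, y_val, grid):
    """Return (best_prior_var, best_f), scored on the validation split."""
    scored = [(f_measure(y_val, predict(fit_map_logreg(X_tr, y_tr, v), X_val)), v)
              for v in grid]
    best_f, best_var = max(scored, key=lambda s: s[0])
    return best_var, best_f

# Hypothetical toy data: 1 = bullying, 0 = neutral (not from the paper's corpus).
train_docs = [{"insult", "you"}, {"threat", "you"}, {"insult", "threat"},
              {"hello", "you"}, {"thanks", "hello"}, {"thanks", "you"}]
train_y = [1, 1, 1, 0, 0, 0]
val_docs = [{"insult", "you"}, {"hello", "thanks"}, {"threat"}, {"thanks", "you"}]
val_y = [1, 0, 1, 0]

terms = select_terms(train_docs, train_y, k=4)
best_var, best_f = grid_search(vectorize(train_docs, terms), train_y,
                               vectorize(val_docs, terms), val_y,
                               grid=[0.1, 1.0, 10.0])
print(terms, best_var, best_f)
```

In this toy setting the token "you" occurs equally often in both classes, so its Chi-Squared score is zero and it is dropped before training. On a real corpus the grid would typically be wider and each candidate would be scored with cross-validation instead of a single split.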

Primary Language: en
Subjects: Engineering
Journal Section: Articles
Authors

Orcid: 0000-0003-2840-1917
Author: Akın Özçift
Institution: Manisa Celal Bayar University
Country: Turkey


Orcid: 0000-0002-2336-8831
Author: Deniz Kılınç
Institution: Manisa Celal Bayar University
Country: Turkey


Orcid: 0000-0002-5898-7464
Author: Fatma Bozyiğit (Primary Author)
Institution: Manisa Celal Bayar University
Country: Turkey


Dates

Publication Date: September 28, 2019

Bibtex @article{apjes496018, journal = {Akademik Platform Mühendislik ve Fen Bilimleri Dergisi}, issn = {}, eissn = {2147-4575}, address = {}, publisher = {Academic Platform}, year = {2019}, volume = {7}, pages = {355-361}, doi = {10.21541/apjes.496018}, title = {Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data}, key = {cite}, author = {Özçift, Akın and Kılınç, Deniz and Bozyiğit, Fatma}}
APA Özçift, A., Kılınç, D., Bozyiğit, F. (2019). Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data. Akademik Platform Mühendislik ve Fen Bilimleri Dergisi, 7(3), 355-361. DOI: 10.21541/apjes.496018
MLA Özçift, A., Kılınç, D., Bozyiğit, F. "Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data". Akademik Platform Mühendislik ve Fen Bilimleri Dergisi 7 (2019): 355-361 <https://dergipark.org.tr/en/pub/apjes/issue/44190/496018>
Chicago Özçift, A., Kılınç, D., Bozyiğit, F. "Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data". Akademik Platform Mühendislik ve Fen Bilimleri Dergisi 7 (2019): 355-361
RIS TY - JOUR T1 - Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data AU - Akın Özçift, Deniz Kılınç, Fatma Bozyiğit Y1 - 2019 PY - 2019 N1 - doi: 10.21541/apjes.496018 DO - 10.21541/apjes.496018 T2 - Akademik Platform Mühendislik ve Fen Bilimleri Dergisi JF - Journal JO - JOR SP - 355 EP - 361 VL - 7 IS - 3 SN - 2147-4575 M3 - doi: 10.21541/apjes.496018 UR - https://doi.org/10.21541/apjes.496018 Y2 - 2019 ER -
EndNote %0 Akademik Platform Mühendislik ve Fen Bilimleri Dergisi Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data %A Akın Özçift, Deniz Kılınç, Fatma Bozyiğit %T Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data %D 2019 %J Akademik Platform Mühendislik ve Fen Bilimleri Dergisi %P -2147-4575 %V 7 %N 3 %R doi: 10.21541/apjes.496018 %U 10.21541/apjes.496018
ISNAD Özçift, Akın, Kılınç, Deniz, Bozyiğit, Fatma. "Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data". Akademik Platform Mühendislik ve Fen Bilimleri Dergisi 7/3 (September 2019): 355-361. https://doi.org/10.21541/apjes.496018
AMA Özçift A, Kılınç D, Bozyiğit F. Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data. APJES. 2019; 7(3): 355-361.
Vancouver Özçift A, Kılınç D, Bozyiğit F. Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data. Akademik Platform Mühendislik ve Fen Bilimleri Dergisi. 2019; 7(3): 355-361.