Research Article

Use of a Grid-Search-Optimized Bayesian Logistic Regression Algorithm for Cyberbullying Detection in Turkish Microblog Data

Year 2019, Volume: 7 Issue: 3, 355 - 361, 28.09.2019
https://doi.org/10.21541/apjes.496018

Abstract

There is extensive interaction between internet users and social media platforms. The massive volume of user data that results from this interaction is worth examining from many angles. One of the research areas that has emerged from user data is cyberbullying, which is a major security problem. Because this problem is regarded as the source of cybercrimes, designing a system that aims to detect cyberbullying attacks and their sources from micro-blog texts is an important topic. Most academic studies in this field deal with texts written in English. The originality of this study is that it can detect cyberbullying content in Turkish texts as accurately as possible. To this end, the Bayesian Logistic Regression supervised learning algorithm, with parameters determined by the Grid Search Algorithm, was applied to user tweets collected from Twitter. Because text data creates a high-dimensional training space for machine learning algorithms, the most discriminative features were identified using the Chi-Square feature selection strategy. As a result, our study produced an F-measure of 0.925 with the version in which the number of features was minimized. The results of the proposed method were compared with machine learning methods frequently used in the literature, and the results are reported in the relevant sections.
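To make the feature selection step concrete, the following is a minimal sketch of chi-squared scoring on term presence for a binary label. It is not the authors' implementation: the toy corpus, the English placeholder tokens, and the helper names `chi2_scores` and `select_top_k` are all assumptions introduced for illustration, and the score is the standard 2x2 contingency-table statistic.

```python
from collections import Counter

# Hypothetical toy corpus: tokenized messages with binary labels (1 = bullying).
DOCS = [
    ["you", "are", "stupid"],
    ["stupid", "loser"],
    ["nice", "day", "today"],
    ["you", "are", "nice"],
]
LABELS = [1, 1, 0, 0]


def chi2_scores(docs, labels):
    """Chi-squared score of each term's presence against a binary label."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    with_term_pos = Counter()  # positive documents that contain the term
    with_term_neg = Counter()  # negative documents that contain the term
    for tokens, label in zip(docs, labels):
        target = with_term_pos if label == 1 else with_term_neg
        for term in set(tokens):  # presence, not raw frequency
            target[term] += 1

    scores = {}
    for term in set(with_term_pos) | set(with_term_neg):
        a = with_term_pos[term]  # term present, label 1
        b = with_term_neg[term]  # term present, label 0
        c = n_pos - a            # term absent, label 1
        d = n_neg - b            # term absent, label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


SCORES = chi2_scores(DOCS, LABELS)
TOP_TERMS = select_top_k(SCORES, 2)
```

Terms that appear equally often in both classes receive a score of zero and are dropped first, which is how the feature space can be reduced before training.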


Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data


Abstract

Users of social media platforms interact with one another on a massive scale. This interaction produces an enormous amount of user data that is worth analyzing from many angles. One research area emerging from this data is cyberbullying, a major security issue. Because cyberbullying is regarded as a source of cybercrime, a system that detects cyberbullying attacks and their sources in micro-blog texts is clearly needed. Most academic research on this topic has addressed English-language text. The originality of this paper is that we develop an accurate cyberbullying detection system for the Turkish language. We used tweets collected from Twitter to build a supervised machine learning model based on Bayesian Logistic Regression, whose parameters are tuned with the grid-search algorithm. Because text data produces a high-dimensional training space for machine learning algorithms, we also used the Chi-Squared (CH2) feature selection strategy to obtain the best subset of features. The optimized version of the proposed algorithm, trained on the reduced feature set, achieved an F-measure of 0.925. Finally, we compared the proposed algorithm with machine learning methods frequently used in the literature and report the results in the related sections.
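The tuning step described above can be sketched as a grid search that scores each candidate setting by F-measure on held-out data. This is only an illustration under stated assumptions: the model is a MAP logistic regression with a zero-mean Gaussian prior on the weights (so the prior variance is the single tuned parameter), it is fitted with plain batch gradient descent, and the data, grid values, and function names are hypothetical rather than taken from the paper.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map_logreg(X, y, prior_var, lr=0.1, epochs=500):
    """MAP fit under an assumed zero-mean Gaussian prior with variance prior_var."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0  # the intercept is left unpenalized
    for _ in range(epochs):
        grad_w = [wj / prior_var for wj in w]  # gradient of the negative log prior
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(X_train, y_train, X_val, y_val, grid):
    """Return (best F-measure, prior variance that achieved it)."""
    best = None
    for prior_var in grid:
        model = fit_map_logreg(X_train, y_train, prior_var)
        score = f_measure(y_val, predict(model, X_val))
        if best is None or score > best[0]:
            best = (score, prior_var)
    return best


# Hypothetical binary term-presence features; the label equals the first feature.
X_TRAIN = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
Y_TRAIN = [1, 1, 0, 0, 1, 0]
X_VAL = [[1, 1], [0, 0], [1, 0], [0, 1]]
Y_VAL = [1, 0, 1, 0]
GRID = [0.1, 1.0, 10.0, 100.0]  # candidate prior variances (assumed values)

BEST_SCORE, BEST_VAR = grid_search(X_TRAIN, Y_TRAIN, X_VAL, Y_VAL, GRID)
```

In practice the validation score would usually come from cross-validation rather than a single split, and the grid would cover whatever parameters the chosen implementation exposes.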


Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Akın Özçift 0000-0003-2840-1917

Deniz Kılınç 0000-0002-2336-8831

Fatma Bozyiğit 0000-0002-5898-7464

Publication Date September 28, 2019
Submission Date December 12, 2018
Published in Issue Year 2019 Volume: 7 Issue: 3

Cite

IEEE A. Özçift, D. Kılınç, and F. Bozyiğit, “Application of Grid Search Parameter Optimized Bayesian Logistic Regression Algorithm to Detect Cyberbullying in Turkish Microblog Data”, APJES, vol. 7, no. 3, pp. 355–361, 2019, doi: 10.21541/apjes.496018.
