Comparison of SVM and Naïve Bayes Algorithms with InNER enriched to Predict Hate Speech

Isnen Hadi Al Ghozali; Arif Pirman; Indra Indra

doi:10.31202/ecjse.1325078

Abstract

Hate speech is one of the harmful consequences of social media abuse. It can be classified into insults, defamation, unpleasant acts, provocation, incitement, and the spread of fake news (hoaxes). This study compares the SVM and Naïve Bayes methods for detecting hate speech, using Indonesian NER (InNER) for feature extraction. To obtain the best model, the study applies five steps: a) data collection; b) data preprocessing; c) feature engineering; d) model development; and e) model evaluation and comparison. We collected 7100 tweets as an initial dataset. Manual annotation yielded 1681 labeled tweets: 548 insult tweets, 288 blasphemy tweets, 272 provocative tweets, and 573 neutral tweets. The study uses two Python libraries that accommodate NER in Indonesian, namely NLTK and Polyglot. Of the proposed models, model 5, which combines the SVM algorithm with the NLTK library, performs best: it achieves an accuracy of 92.88% with a precision of 0.93, a recall of 0.93, and an F1 score of 0.92.
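As an illustration of the evaluation step, the following minimal sketch computes accuracy, precision, recall, and F1 for the four classes named in the abstract. The abstract does not say how the per-class scores were averaged, so support-weighted averaging is an assumption here; the function name `evaluate` and the toy label lists are hypothetical and are not taken from the study. Only the class counts come from the abstract.

```python
from collections import Counter

# Class counts reported in the abstract (1681 annotated tweets in total).
CLASS_COUNTS = {"insult": 548, "blasphemy": 288, "provocative": 272, "neutral": 573}
LABELS = list(CLASS_COUNTS)


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and support-weighted precision, recall, and F1.

    Weighted averaging is an assumption; the abstract does not state
    which averaging scheme the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / total  # share of true examples in this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy, made-up labels used only to exercise the function.
y_true = ["insult", "insult", "blasphemy", "provocative", "neutral", "neutral"]
y_pred = ["insult", "neutral", "blasphemy", "provocative", "neutral", "neutral"]
scores = evaluate(y_true, y_pred)
print(sum(CLASS_COUNTS.values()), scores)
```

Under weighted averaging, recall always equals accuracy. That is consistent with the reported 92.88% accuracy and 0.93 recall, but it does not confirm which averaging scheme the authors applied.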

Supporting Institution

Universitas Budi Luhur

Details

Primary Language

English

Subjects

Engineering Practice

Section

Research Article

Authors

Isnen Hadi Al Ghozali*
0000-0001-8155-6438
Indonesia

Arif Pirman
0009-0005-5818-3991
Indonesia

Indra Indra
0000-0002-8180-4849
Indonesia

Publication Date

September 30, 2023

Submission Date

July 10, 2023

Acceptance Date

September 25, 2023

Published in Issue

Year: 2023, Volume: 10, Issue: 3

DOI

https://doi.org/10.31202/ecjse.1325078

IZ

https://izlik.org/JA47XM49XF

Cite

APA

Hadi Al Ghozali, I., Pirman, A., & Indra, I. (2023). Comparison of SVM and Naïve Bayes Algorithms with InNER enriched to Predict Hate Speech. El-Cezeri, 10(3), 600-611. https://doi.org/10.31202/ecjse.1325078
