ALBERT4Spam: A Novel Approach for Spam Detection on Social Networks

Rezan Bakır; Hasan Erbay; Halit Bakır

doi:10.17671/gazibtd.1426230

Research Article

Year 2024, Volume: 17, Issue: 2, pp. 81-94, 30.04.2024

Abstract

Browsing social media is one of the most popular online activities. As social media becomes more integrated into daily life, it gives spammers numerous opportunities to reach their targets through these platforms. Because messages exchanged on social networks are typically short and sparse, detecting spam in them is a short-text classification problem. Addressing this problem requires representing the text appropriately so that the classifier can work effectively. Accordingly, this study uses representations derived from contextualized models for feature extraction within a deep neural network built on the Bidirectional Long Short-Term Memory (BLSTM) architecture. The study introduces ALBERT4Spam, a deep learning approach for identifying spam on social networking platforms. It uses the ALBERT model to obtain contextualized word representations, which improve the performance of the proposed BLSTM network. The model's hyperparameters, namely the number of BLSTM layers, neuron count, layer count, activation function, weight initializer, learning rate, optimizer, and dropout, were tuned with the random search method to obtain the best performance. Experiments on three benchmark datasets show that the proposed model outperforms widely used methods for social network spam detection, with precision of 0.98, 0.96, and 0.98 on the Twitter, YouTube, and SMS datasets, respectively.

Keywords

Spam detection, Word embedding, Deep learning, BERT, ALBERT, BLSTM
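
The random search step and the precision metric mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the hyperparameter names follow the abstract, but the candidate values in SEARCH_SPACE, the helper names, and the toy_evaluate function (a stand-in for training the BLSTM on ALBERT representations and measuring validation precision) are all assumptions made for illustration.

```python
import random

# Hypothetical search space: the hyperparameter names follow the abstract,
# but every candidate value below is an illustrative assumption.
SEARCH_SPACE = {
    "blstm_layers": [1, 2, 3],
    "units": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "sigmoid"],
    "weight_initializer": ["glorot_uniform", "he_normal"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "optimizer": ["adam", "rmsprop", "sgd"],
    "dropout": [0.1, 0.2, 0.3, 0.5],
}


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the positive (spam) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def sample_config(space, rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(space, evaluate, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in for training a model and measuring validation precision.

    It fakes predictions on a tiny hand-made label set so the sketch runs
    without any deep learning library; a real run would train the network.
    """
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    # Arbitrary rule: more dropout "causes" more false positives.
    n_false_pos = round(config["dropout"] * 4)
    y_pred = list(y_true)
    negatives = [i for i, t in enumerate(y_true) if t == 0]
    for i in negatives[:n_false_pos]:
        y_pred[i] = 1
    return precision(y_true, y_pred)


if __name__ == "__main__":
    best, score = random_search(SEARCH_SPACE, toy_evaluate, n_trials=10)
    print(best, round(score, 3))
```

In a real experiment, toy_evaluate would be replaced by a function that builds and trains the network for the sampled configuration and returns its precision on a held-out validation split. Random search is attractive here because each trial is independent and the number of trials can be fixed in advance, regardless of how many hyperparameters are tuned.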

There are 33 references in total.

Details

Primary Language	English
Subjects	Deep Learning, Natural Language Processing
Section	Articles
Authors	Rezan Bakır (0000-0002-4373-2231), Hasan Erbay, Halit Bakır (0000-0003-3327-2822)
Publication Date	April 30, 2024
Submission Date	January 26, 2024
Acceptance Date	March 4, 2024
Published in Issue	Year 2024, Volume: 17, Issue: 2

Cite

APA	Bakır, R., Erbay, H., & Bakır, H. (2024). ALBERT4Spam: A Novel Approach for Spam Detection on Social Networks. Bilişim Teknolojileri Dergisi, 17(2), 81-94. https://doi.org/10.17671/gazibtd.1426230
