Research Article

Spam Content Detection with Large Language Models: Comparative Analysis of Gemini and Deepseek Models on X Platform

Year 2025, Volume: 11 Issue: 3, 376 - 397, 31.12.2025

Abstract

Social media platforms offer significant opportunities for information dissemination, but they also pose considerable risks regarding the spread of spam content. This study compares the performance of large language models (LLMs) in generating content-analysis-based features for detecting spam in visual posts shared on the X social media platform. Using the same dataset, two LLMs, Google Gemini and Deepseek, were employed to generate semantically scaled features (e.g., tag consistency, inter-tag relationships, text matching) from the post texts. Visual analyses were supported by Cloud Vision AI. The resulting features were tested with five machine learning algorithms: Decision Tree, Random Forest, Support Vector Machine (SVM), Logistic Regression, and Multilayer Perceptron. Random Forest achieved the highest F1 scores and ROC AUC values with both models. However, the features generated by Gemini and Deepseek led to significant differences in classification performance. The study highlights the differences between LLMs in generating scaled semantic features and underscores the impact of LLM selection on classification performance in spam detection tasks.
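The two metrics named in the abstract, F1 score and ROC AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, chosen only to show how classifier outputs built on two different feature sets might be compared.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (spam = 1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random spam post outscores a random
    non-spam post, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (made up for illustration): labels (1 = spam) and
# classifier scores obtained from two illustrative feature sets.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.3, 0.4, 0.5, 0.2, 0.1]

for name, scores in (("feature set A", scores_a), ("feature set B", scores_b)):
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    print(name, round(f1_score(labels, preds), 3), round(roc_auc(labels, scores), 3))
```

F1 depends on the chosen decision threshold, whereas ROC AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.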

Details

Primary Language English
Subjects Computer Software
Journal Section Research Article
Authors

Ebutalha Camadan 0000-0001-7669-5601

Mehmet Şimşek 0000-0002-9797-5028

Submission Date July 19, 2025
Acceptance Date September 1, 2025
Publication Date December 31, 2025
Published in Issue Year 2025 Volume: 11 Issue: 3

Cite

IEEE: E. Camadan and M. Şimşek, “Spam Content Detection with Large Language Models: Comparative Analysis of Gemini and Deepseek Models on X Platform,” GJES, vol. 11, no. 3, pp. 376–397, 2025.

Gazi Journal of Engineering Sciences (GJES) publishes open access articles under a Creative Commons Attribution 4.0 International License (CC BY).