Research Article

A Comparative Study of Machine Learning Classifiers for Different Language Spam SMS Detection: Performance Evaluation and Analysis

Year 2024, Volume: 4 Issue: 2, 69 - 77, 30.12.2024
https://doi.org/10.54569/aair.1549781

Abstract

With the number of mobile device users rising steadily, SMS (Short Message Service) remains a prevalent communication tool, accessible on both smartphones and basic phones. SMS traffic has surged as a result, and so has spam: spammers seek financial or business gains through activities such as marketing promotions, lottery scams, and credit card information theft. Spam classification has therefore become a focal point of research. In this paper, we evaluate the effectiveness of 11 machine learning algorithms for SMS spam detection, including multinomial Naïve Bayes, K-Nearest Neighbors (KNN), and Random Forest. Using datasets from the UCI and Bangla SMS collections, our experimental results show that the multinomial Naïve Bayes algorithm surpasses previous models in spam detection, achieving accuracies of 98.65% and 89.10% on the respective datasets.
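As an illustration of the kind of classifier the abstract names, the following is a minimal sketch of multinomial Naïve Bayes for spam/ham classification. It is not the paper's pipeline: the whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, the function names, and the four toy messages are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines usually do more.
    return text.lower().split()


def train(messages, labels, alpha=1.0):
    """Collect per-class word counts and class priors (hypothetical helper)."""
    word_counts = defaultdict(Counter)  # class label -> word -> count
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        # Total number of word occurrences seen in each class.
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
        # Log prior: fraction of training messages in each class.
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
    }


def predict(model, text):
    """Return the class with the highest log posterior score."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_prior"].items():
        # Smoothed denominator: class word total + alpha * vocabulary size.
        denom = model["totals"][label] + alpha * len(vocab)
        score = log_prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                count = model["word_counts"][label][tok]
                score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
messages = [
    "win a free prize now",
    "free lottery claim your prize",
    "are we meeting for lunch today",
    "call me when you get home",
]
labels = ["spam", "spam", "ham", "ham"]
model = train(messages, labels)

print(predict(model, "claim your free prize"))
print(predict(model, "are we meeting today"))
```

Scores are accumulated in log space so that products of many small probabilities do not underflow, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.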



Details

Primary Language: English
Subjects: Machine Learning (Other), Natural Language Processing
Journal Section: Research Articles
Authors:

Samrat Kumar Dev Sharma (ORCID: 0009-0009-7647-0731)

Publication Date: December 30, 2024
Submission Date: September 13, 2024
Acceptance Date: December 28, 2024
Published in Issue: Year 2024, Volume: 4, Issue: 2

Cite

IEEE S. K. Dev Sharma, “A Comparative Study of Machine Learning Classifiers for Different Language Spam SMS Detection: Performance Evaluation and Analysis”, Adv. Artif. Intell. Res., vol. 4, no. 2, pp. 69–77, 2024, doi: 10.54569/aair.1549781.

Advances in Artificial Intelligence Research is an open access journal which means that the content is freely available without charge to the user or his/her institution. All papers are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows users to distribute, remix, adapt, and build upon the material in any medium or format for non-commercial purposes only, and only so long as attribution is given to the creator.
