Research Article

A comparative analysis of learning techniques in the context of Turkish spam detection

Volume: 14 Number: 1 July 7, 2024

Abstract

Short Message Service (SMS) is a mobile messaging tool that billions of people use to communicate by phone. Without proper message filtering, however, this channel is vulnerable to unwanted and junk messages. This paper compares SMS spam detection approaches based on machine learning methods, namely Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), Multinomial Naïve Bayes (MNB), Logistic Regression (LR), and Support Vector Machines (SVM), and on deep learning methods, namely Convolutional Neural Networks (CNNs), Artificial Neural Networks (ANNs), and Long Short-Term Memory (LSTM). The approaches are compared in terms of F-score, accuracy, recall, precision, and a confusion matrix constructed for each one. To evaluate them, the study tested two different preprocessing methods on two different Turkish SMS datasets. The aim of the study is to contribute to spam filtering in Turkey. On the BigTurkishSMS dataset, a combination of the two datasets used, the highest accuracy values were achieved by the Support Vector Machine (99.03%) with the first preprocessing method and by Logistic Regression and Random Forest (98.07%) with the second preprocessing method. As with the majority of the machine learning algorithms, the second preprocessing method yielded better results for the deep learning models, among which the ANN model achieved the highest accuracy, 97.41%. This comparison of machine learning and deep learning techniques on Turkish SMS datasets is intended to provide useful insights for researchers working in this field.
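
The evaluation measures named in the abstract (accuracy, precision, recall, and F-score) can all be derived from a binary confusion matrix. The following is a minimal sketch of that relationship, not the code used in the study: the function names, the `"spam"`/`"ham"` label strings, and the six toy labels are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_score: float


def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics_from_counts(tp, fp, fn, tn):
    """Compute the four measures; a ratio with a zero denominator is reported as 0.0."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f_score)


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
    y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
    counts = confusion_counts(y_true, y_pred)
    print(counts)
    print(metrics_from_counts(*counts))
```

Because accuracy counts true negatives while precision and recall do not, a classifier can score high accuracy on a dataset dominated by legitimate messages and still miss much of the spam, which is one reason to report all four measures together.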

Thanks

The author would like to thank Cakmak Z. and Cifci M.S. for their assistance in gathering datasets and conducting experiments. This paper was presented at the International Information Congress 2024 (IIC2024).

Details

Primary Language

English

Subjects

Applied Computing (Other)

Journal Section

Research Article

Publication Date

July 7, 2024

Submission Date

June 15, 2024

Acceptance Date

June 25, 2024

Published in Issue

Year 2024 Volume: 14 Number: 1

APA
Şengel, Ö. (2024). A comparative analysis of learning techniques in the context of Turkish spam detection. Batman Üniversitesi Yaşam Bilimleri Dergisi, 14(1), 43-56. https://doi.org/10.55024/buyasambid.1501609