Research Article

Comparative Analysis of Machine Learning Approaches in the Spam-Mail Classification Problem

Year 2022, Volume 15, Issue 3, 349 - 364, 31.07.2022
https://doi.org/10.17671/gazibtd.1014764

Abstract

Electronic mail is a communication tool that organizations and individuals use frequently for interactions such as file sharing. Alongside its benefits, it also carries unwanted messages, which are labeled 'spam'. Spam e-mails can be a source of harmful content such as unsolicited advertisements, viruses, and phishing. Because security is essential in communication, it is important to classify e-mails according to various criteria so that e-mail systems can be kept free of harmful tools or software. Such studies appear under different headings in the literature, and machine learning algorithms are used effectively in classification studies. This study adapts the naive Bayes, logistic regression, decision tree, and k-nearest neighbor algorithms to the spam classification problem and analyzes them comparatively, in order to examine in detail how approaches with different methodologies affect the problem. The algorithms were applied to several datasets, and the effect of dataset size and ham/spam ratio on the results is discussed. The performance results obtained were compared across methods and presented in tables. The large amount of data and the high spam ratio led to effective results on the Enron 5 dataset. With the use of different feature selection methods, the decision tree algorithm performed well on the Enron 4 dataset. In tests on the CS440/ECE448 dataset, the best performance was obtained with the logistic regression and k-nearest neighbor algorithms.
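As a rough illustration of the kind of comparison the abstract describes, the sketch below trains two of the four named classifiers (naive Bayes and k-nearest neighbor) on a tiny invented corpus and reports their accuracy. It is not the authors' implementation: the corpus, the whitespace tokenizer, the Laplace smoothing, the cosine similarity measure, and the choice of k = 3 are all assumptions made for this example, and the Enron and CS440/ECE448 datasets are not used.

```python
import math
from collections import Counter

# Toy corpus invented for illustration only (not from any dataset in the study).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]
TEST = [
    ("claim your free money", "spam"),
    ("monday team meeting report", "ham"),
]


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict_naive_bayes(model, text):
    """Multinomial naive Bayes with Laplace (add-one) smoothing, in log space."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_knn(samples, text, k=3):
    """Majority vote among the k training messages most similar to the query."""
    query = Counter(tokenize(text))
    ranked = sorted(
        samples,
        key=lambda s: cosine(query, Counter(tokenize(s[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, test):
    """Fraction of test messages whose predicted label matches the true label."""
    return sum(predict(text) == label for text, label in test) / len(test)


nb_model = train_naive_bayes(TRAIN)
nb_acc = accuracy(lambda t: predict_naive_bayes(nb_model, t), TEST)
knn_acc = accuracy(lambda t: predict_knn(TRAIN, t), TEST)
print(f"naive Bayes accuracy: {nb_acc:.2f}")
print(f"3-NN accuracy:        {knn_acc:.2f}")
```

On a corpus this small the two accuracies say nothing about the algorithms themselves; a meaningful comparison requires full datasets, a held-out split, and consistent feature extraction across all classifiers.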


Makine Öğrenmesi Yaklaşımlarının Spam-Mail Sınıflandırma Probleminde Karşılaştırmalı Analizi

Abstract

Electronic mail is a communication tool that organizations and individuals use frequently and that supports various interactions such as file sharing. Alongside the benefits of such tools, unwanted e-mail is also shared. Unwanted e-mails are labeled 'spam'. Spam e-mails can be a source of harmful content such as unsolicited advertisements, viruses, and phishing. Security is known to be very important in communication. For this reason, classifying e-mails according to various criteria is important so that e-mail systems can be cleared of harmful tools or software. Such studies are presented under different headings in the literature. Machine learning algorithms are used effectively in classification studies. This study aims to adapt the naive Bayes, logistic regression, decision tree, and k-nearest neighbor algorithms to the problem and to analyze them comparatively. The intent is to examine in detail the effect of approaches with different methodologies on the problem. To this end, the algorithms were applied to various datasets. The effect of the datasets having different sizes and different ham/spam ratios is discussed. Different performance results were obtained. These results were compared across methods and presented in tables. The large amount of data and the high spam ratio led to effective results on the Enron 5 dataset. The use of different feature selection methods enabled the decision tree algorithm to perform well on the Enron 4 dataset. According to the tests on the CS440/ECE448 dataset, the best performance was obtained with the logistic regression and k-nearest neighbor algorithms.

There are 56 citations in total.

Details

Primary Language: Turkish
Subjects: Computer Software
Journal Section: Articles
Authors:

Nuriye Baktır (ORCID 0000-0002-3229-9700)

Yılmaz Atay (ORCID 0000-0002-3298-3334)

Publication Date: July 31, 2022
Submission Date: March 16, 2022
Published in Issue: Year 2022

Cite

APA Baktır, N., & Atay, Y. (2022). Makine Öğrenmesi Yaklaşımlarının Spam-Mail Sınıflandırma Probleminde Karşılaştırmalı Analizi. Bilişim Teknolojileri Dergisi, 15(3), 349-364. https://doi.org/10.17671/gazibtd.1014764