Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program

Hüseyin Şimşek; Emrah Aydemir

doi:10.55195/jscai.1104694

Research Article

Year 2022, Volume: 3 Issue: 1, 1 - 10, 28.06.2022

Cited By: 4

Abstract

Today, with the widespread use of the Internet, electronic communication tools have also come into wide use. One of these tools is e-mail. E-mail is easy to use and makes it possible to reach thousands of people at the same time. This advantage also invites misuse: e-mail users receive dozens of unsolicited messages (spam) against their will. In this study, 1017 e-mails collected from about 20 different Gmail and Hotmail accounts were classified as spam or regular e-mail using the algorithms in the Weka program, and the success of the algorithms was compared. In total, 45 different algorithms were tested. The highest classification success, 94.7886% correct classification, was obtained with the NaiveBayesMultinomial and NaiveBayesMultinomialUpdateable algorithms. Among the other classifiers, trees.RandomForest reached 93.6087%, meta.MultiClassClassifier and functions.SGD 92.4287%, functions.SMO 91.7404%, meta.RandomCommittee 91.0521%, and bayes.NaiveBayes and bayes.NaiveBayesUpdateable 90.3638% classification success.
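Because every percentage above is a share of the same 1017 e-mails, it can be converted back into a count of correctly classified messages. The sketch below assumes, although the abstract does not say so, that each figure is Weka's "Correctly Classified Instances" rate computed over the whole data set (as Weka reports it after cross-validation). The helper names `accuracy_percent` and `implied_correct` are hypothetical and are not part of Weka.

```python
# Minimal sketch: relate the accuracy percentages reported in the abstract
# to counts of correctly classified e-mails.
# ASSUMPTION: each percentage is the correct-classification rate over all
# 1017 e-mails (not over a held-out subset).

TOTAL_EMAILS = 1017

# Reported accuracies (%) copied from the abstract.
REPORTED_ACCURACY = {
    "NaiveBayesMultinomial": 94.7886,
    "NaiveBayesMultinomialUpdateable": 94.7886,
    "trees.RandomForest": 93.6087,
    "meta.MultiClassClassifier": 92.4287,
    "functions.SGD": 92.4287,
    "functions.SMO": 91.7404,
    "meta.RandomCommittee": 91.0521,
    "bayes.NaiveBayes": 90.3638,
    "bayes.NaiveBayesUpdateable": 90.3638,
}


def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to 4 decimals like the abstract."""
    return round(100.0 * correct / total, 4)


def implied_correct(percent: float, total: int) -> int:
    """Back out the whole number of correctly classified e-mails."""
    return round(percent * total / 100.0)


if __name__ == "__main__":
    for name, pct in REPORTED_ACCURACY.items():
        correct = implied_correct(pct, TOTAL_EMAILS)
        # Recomputing the percentage from the integer count should give
        # back the reported figure if the assumption above holds.
        assert accuracy_percent(correct, TOTAL_EMAILS) == pct
        print(f"{name}: {correct}/{TOTAL_EMAILS} correct, "
              f"{TOTAL_EMAILS - correct} misclassified")
```

Under that assumption every reported figure is reproduced exactly by a whole number of e-mails: 94.7886% corresponds to 964 of the 1017 messages (53 misclassified), and 90.3638% to 919 (98 misclassified). This is consistent with the assumption, though it does not prove it.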

Keywords

E-mail, Spam E-mail, Weka, Classification Algorithm

References

C. ÖZDEMİR, M. ATAŞ, and A. B. ÖZER, “TÜRKÇE İSTENMEYEN ELEKTRONİK POSTALARIN YAPAY BAĞIŞIKLIK SİSTEMİ İLE SINIFLANDIRILMASI / CLASSIFICATION OF TURKISH SPAM E-MAILS WITH ARTIFICIAL IMMUNE SYSTEM”.
C. Özdemir, “Yapay bağışıklık sistemi ile spam filtreleme”, Master’s Thesis, Fen Bilimleri Enstitüsü, 2013.
M. E. Yüksel and Ş. D. Odabaşı, “SMTP Protokolü ve Spam Mail Problemi”, Akad. Bilişim, 2010.
E. E. ERYILMAZ and E. KILIÇ, “İstenmeyen Epostaların Tespiti için Kullanılan Yöntemlerin İncelenmesi”, Dicle Üniversitesi Mühendis. Fakültesi Mühendis. Derg., vol. 11, no. 3, pp. 977-987, 2020.
C. Altunyaprak, “Bayes yöntemi kullanarak istenmeyen elektronik postaların filtrelenmesi”, Master’s Thesis, Muğla Üniversitesi Fen Bilimleri Enstitüsü, 2006.
Y. GEDİK, “E-Posta Pazarlama: Teorik Bir Bakış”, Uluslar. Yönetim Akad. Derg., vol. 3, no. 2, pp. 476-490, 2020.
K. Tekeli and R. Aşlıyan, “Çok Katmanlı Algılayıcı, K-NN ve C4.5 Metotlarıyla İstenmeyen E-postaların Tespiti”, Adnan Menderes Üniversitesi, 2016.
Ü. Cahide and İ. ŞAHİN, “İstenmeyen Elektronik Postaların (SPAM) Filtrelenmesi için Bir Uzman Sistem Tasarımı ve Gerçekleştirilmesi”, Politek. Derg., vol. 20, no. 2, pp. 267-274, 2017.
N. Nazlı, “Analysis of machine learning-based spam filtering techniques”, Master’s Thesis, 2018.
Y. KAYA and C. ÖZDEMİR, “Spam filtrelemek için kaydırmalı ikili örüntüler tabanlı yeni bir yaklaşım”.
O. ÇITLAK, İ. A. DOĞRU, and M. DÖRTERLER, “Kısa Link Analizi ile Bir Spam Tespit Sistemi / A spam detection system with short Link Analysis”.
K. Zainal, N. F. Sulaiman, and M. Z. Jali, “An analysis of various algorithms for text spam classification and clustering using RapidMiner and Weka”, Int. J. Comput. Sci. Inf. Secur., vol. 13, no. 3, p. 66, 2015.
N. F. Rusland, N. Wahid, S. Kasim, and H. Hafit, “Analysis of Naïve Bayes algorithm for email spam filtering across multiple datasets”, in IOP Conference Series: Materials Science and Engineering, 2017, vol. 226, no. 1, p. 012091.
A. K. Sharma and S. Sahni, “A comparative study of classification algorithms for spam email data analysis”, Int. J. Comput. Sci. Eng., vol. 3, no. 5, pp. 1890-1895, 2011.
G. H. AL-Rawashdeh and R. B. Mamat, “Comparison of four email classification algorithms using WEKA”, Int. J. Comput. Sci. Inf. Secur. IJCSIS, vol. 17, no. 2, pp. 42-54, 2019.
H. C. Gündüz, “Spam 2.0, Tespit ve Engelleme Yöntemleri”, p. 6, 2007.
E. Aydemir, Weka İle Yapay Zeka, 2nd ed. Ankara: Seçkin, 2019.
E. Koç, S. Çalışkan, S. A. Yazıcıoğlu, U. Demirci, and Z. Kuş, “Yapay Sinir Ağları, Kelime Vektörleri ve Derin Öğrenme Uygulamaları”, 2018.
G. AKSOY, “Ağırlıklı Bayes sınıflandırıcıda ağırlıkların optimizasyonu / Optimization of the weights of weighted naïve Bayesian classifier”, 2018.
A. Suat, “KNN, NAİVE BAYES VE KARAR AĞACI MAKİNE ÖĞRENME ALGORİTMALARI, BU ALGORİTMALARIN SOSYAL BİLİMLERDE KULLANIM İMKÂNLARI”, 2020.
Ş. Demirel and S. G. YAKUT, “Karar Ağacı Algoritmaları ve Çocuk İşçiliği Üzerine Bir Uygulama”, Sos. Bilim. Araşt. Derg., vol. 8, no. 4, pp. 52-65, 2019.
B. S. KUZU and S. G. YAKUT, “DESTEK VEKTÖR MAKİNELERİ YARDIMIYLA İMALAT SANAYİSİNDE MALİ BAŞARISIZLIK TAHMİNLERİNİN TEKNOLOJİ YOĞUNLUĞUNA GÖRE İNCELENMESİ”, Osman. Korkut Ata Üniversitesi İktisadi Ve İdari Bilim. Fakültesi Derg., vol. 4, no. 2, pp. 36-54, 2020.
D. Akmaz and M. S. Mamiş, “Bayes, Lazy, Trees, Rules Sınıfı Makine Öğrenme Algoritmaları”.
Seckin, “MetaSezgisel Algoritmalar”, Bir Yazılımcının Günlüğü, May 16, 2017. https://biryazilimciningunlugu.wordpress.com/2017/05/16/metasezgisel-algoritmalar/ (accessed Apr. 16, 2022).
“Rules extraction system family”, Wikipedia, Nov. 27, 2019. Accessed: Apr. 16, 2022. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Rules_extraction_system_family&oldid=928196017

There are 25 citations in total.

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Research Articles
Authors	Hüseyin Şimşek (ORCID 0000-0001-7308-8021), Emrah Aydemir (ORCID 0000-0002-8380-7891)
Publication Date	June 28, 2022
Submission Date	April 17, 2022
Published in Issue	Year 2022 Volume: 3 Issue: 1

Cite

APA	Şimşek, H., & Aydemir, E. (2022). Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program. Journal of Soft Computing and Artificial Intelligence, 3(1), 1-10. https://doi.org/10.55195/jscai.1104694
AMA	Şimşek H, Aydemir E. Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program. JSCAI. June 2022;3(1):1-10. doi:10.55195/jscai.1104694
Chicago	Şimşek, Hüseyin, and Emrah Aydemir. “Classification of Unwanted E-Mails (Spam) With Turkish Text by Different Algorithms in Weka Program”. Journal of Soft Computing and Artificial Intelligence 3, no. 1 (June 2022): 1-10. https://doi.org/10.55195/jscai.1104694.
EndNote	Şimşek H, Aydemir E (June 1, 2022) Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program. Journal of Soft Computing and Artificial Intelligence 3 1 1–10.
IEEE	H. Şimşek and E. Aydemir, “Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program”, JSCAI, vol. 3, no. 1, pp. 1–10, 2022, doi: 10.55195/jscai.1104694.
ISNAD	Şimşek, Hüseyin - Aydemir, Emrah. “Classification of Unwanted E-Mails (Spam) With Turkish Text by Different Algorithms in Weka Program”. Journal of Soft Computing and Artificial Intelligence 3/1 (June 2022), 1-10. https://doi.org/10.55195/jscai.1104694.
JAMA	Şimşek H, Aydemir E. Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program. JSCAI. 2022;3:1–10.
MLA	Şimşek, Hüseyin and Emrah Aydemir. “Classification of Unwanted E-Mails (Spam) With Turkish Text by Different Algorithms in Weka Program”. Journal of Soft Computing and Artificial Intelligence, vol. 3, no. 1, 2022, pp. 1-10, doi:10.55195/jscai.1104694.
Vancouver	Şimşek H, Aydemir E. Classification of Unwanted E-Mails (Spam) with Turkish Text by Different Algorithms in Weka Program. JSCAI. 2022;3(1):1-10.

Cited By

Hyperparameter Optimization of Ensemble Models for Spam Email Detection

Applied Sciences

https://doi.org/10.3390/app13031971

Email Security Issues, Tools, and Techniques Used in Investigation

Sustainability

https://doi.org/10.3390/su151310612

Classification of News Texts from Different Languages with Machine Learning Algorithms

Journal of Soft Computing and Artificial Intelligence

https://doi.org/10.55195/jscai.1311380

Türkçe E-postalarda Spam Tespiti için Makine Öğrenme Yöntemlerinin ve Dil Modellerinin Analizi

European Journal of Science and Technology

https://doi.org/10.31590/ejosat.1234079

This work is licensed under a Creative Commons Attribution 4.0 International License.