Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques

Caner Balım; Nevzat Olgun

doi:10.46810/tdfd.1519463

Research Article

Year 2024, Volume 13, Issue 3, pp. 176-183, 26.09.2024

Abstract

Temporary e-mail addresses are addresses that users can create quickly without signing up. They are useful for protecting privacy and avoiding spam. However, they also pose several serious cyber threats, including fraud, spam campaigns, and fake account creation. In this study, a method that uses natural language processing and machine learning techniques is proposed to classify real and temporary e-mail addresses. First, temporary and real e-mail addresses were analyzed, and features were developed to capture the differences between them. These features cover the lexical structure, broad context, and structural properties of e-mail addresses. Various machine learning algorithms were then applied to the resulting feature set to distinguish the two classes. The results were evaluated with K-fold cross-validation, and an accuracy of 96% was obtained. This result indicates that the developed method can successfully distinguish between real and temporary e-mail addresses.
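To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of how lexical and structural features might be computed from an e-mail address, together with a simple K-fold index splitter. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the actual features, so every feature name here (`local_length`, `digit_ratio`, `local_entropy`, and so on) and every helper (`shannon_entropy`, `levenshtein`, `extract_features`, `k_fold_indices`) is hypothetical. Edit distance is included only as one common way to compare an address token with dictionary words; whether and how the study uses it is not stated in the abstract.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-wise dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def extract_features(address: str) -> dict:
    """Hypothetical lexical/structural features; not the paper's feature set."""
    if "@" not in address:
        raise ValueError("address must contain '@'")
    local, _, domain = address.lower().rpartition("@")
    digits = sum(ch.isdigit() for ch in local)
    return {
        "local_length": len(local),
        "domain_length": len(domain),
        "digit_ratio": digits / len(local) if local else 0.0,
        "special_count": sum(ch in "._-+" for ch in local),
        "local_entropy": shannon_entropy(local),
        "domain_labels": domain.count(".") + 1,
    }


def k_fold_indices(n_samples: int, k: int):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    for addr in ("john.smith@example.com", "x7k2q9zt@example.org"):
        print(addr, extract_features(addr))
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

In a full experiment, each address would be mapped to such a feature vector, a classifier would be trained on the training indices of each fold and scored on the held-out indices, and the per-fold accuracies would be averaged. The contiguous split shown here assumes the data has been shuffled beforehand.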

Keywords

E-mail classification, Natural language processing, Artificial neural network, Machine learning

Details

Primary Language	English
Subjects	Information Security Management
Section	Articles
Authors	Caner Balım (ORCID 0000-0002-1010-129X), Nevzat Olgun (ORCID 0000-0003-2461-4923)
Publication Date	September 26, 2024
Submission Date	July 20, 2024
Acceptance Date	September 12, 2024
Published in Issue	Year 2024, Volume 13, Issue 3

Cite

APA	Balım, C., & Olgun, N. (2024). Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques. Türk Doğa ve Fen Dergisi, 13(3), 176-183. https://doi.org/10.46810/tdfd.1519463
AMA	Balım C, Olgun N. Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques. TDFD. September 2024;13(3):176-183. doi:10.46810/tdfd.1519463
Chicago	Balım, Caner, and Nevzat Olgun. “Classification of Temporary and Real E-Mail Addresses With Machine Learning Techniques”. Türk Doğa ve Fen Dergisi 13, no. 3 (September 2024): 176-83. https://doi.org/10.46810/tdfd.1519463.
EndNote	Balım C, Olgun N (01 September 2024) Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques. Türk Doğa ve Fen Dergisi 13 3 176–183.
IEEE	C. Balım and N. Olgun, “Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques”, TDFD, vol. 13, no. 3, pp. 176–183, 2024, doi: 10.46810/tdfd.1519463.
ISNAD	Balım, Caner - Olgun, Nevzat. “Classification of Temporary and Real E-Mail Addresses With Machine Learning Techniques”. Türk Doğa ve Fen Dergisi 13/3 (September 2024), 176-183. https://doi.org/10.46810/tdfd.1519463.
JAMA	Balım C, Olgun N. Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques. TDFD. 2024;13:176–183.
MLA	Balım, Caner and Nevzat Olgun. “Classification of Temporary and Real E-Mail Addresses With Machine Learning Techniques”. Türk Doğa ve Fen Dergisi, vol. 13, no. 3, 2024, pp. 176-83, doi:10.46810/tdfd.1519463.
Vancouver	Balım C, Olgun N. Classification of Temporary and Real E-mail Addresses with Machine Learning Techniques. TDFD. 2024;13(3):176-83.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.