Phishing E-mail Detection with Machine Learning and Deep Learning: Improving Classification Performance with Proposed New Features

Hadjer Brioua; Havvanur Siyambaş; Durmuş Özkan Şahin

doi:10.17694/bajece.1490596

Research Article

Year 2025, Volume: 13 Issue: 2, 183 - 193

Abstract

As internet use grows, anyone with an email account has become a potential target for fraudsters. Attackers send fake or misleading emails to steal sensitive information such as identity details, bank credentials, and social media credentials. This tactic is known as phishing. This study proposes a machine learning-based system for detecting phishing attacks using the SeFACED dataset, which was adapted for binary classification and contains 12,498 normal and 5,142 fraudulent emails. The system was implemented in Python, with Google Colab and Jupyter Notebook as development platforms. The emails were collected, cleaned, and stemmed. Three feature extraction techniques were used: Bag of Words, TF-IDF, and Word2Vec. Six algorithms were employed for classification: Logistic Regression, Random Forest, Support Vector Machines, Naive Bayes, Convolutional Neural Network, and Long Short-Term Memory. Performance was evaluated using accuracy, precision, recall, and F1-score. New features proposed to enhance detection were CSS tags, HTML tags, blacklisted words, link errors, and grammar and spelling errors. Adding these features generally improved classification results.
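The abstract names the proposed features but does not say how each one is computed. The following is a minimal sketch of how four of them might be derived from a raw email body using only the Python standard library. Everything in it is an assumption made for illustration, not the authors' implementation: the function name `extra_features`, the three-word `BLACKLIST`, the choice to count a "CSS tag" as a `<style>` element or an inline `style` attribute, and the choice to treat a "link error" as an anchor whose visible URL points to a different host than its `href`. Grammar and spelling errors are left out because they would need an external checker.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical placeholder list; a real system would load a full word list.
BLACKLIST = {"urgent", "verify", "winner"}


class _EmailScanner(HTMLParser):
    """Collects tag counts, visible text, and (href, anchor text) pairs."""

    def __init__(self):
        super().__init__()
        self.html_tags = 0
        self.css_hits = 0
        self.text_parts = []
        self.links = []
        self._href = None       # href of the <a> currently open, if any
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.html_tags += 1
        # Assumption: a <style> element or an inline style attribute counts as CSS.
        if tag == "style" or "style" in attrs:
            self.css_hits += 1
        if tag == "a":
            self._href = attrs.get("href") or ""
            self._anchor_text = []

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._href is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._anchor_text).strip()))
            self._href = None


def _host(url):
    """Return the lower-cased host of a URL, adding a scheme if it is missing."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def extra_features(body):
    """Return a dict of hand-crafted counts for one email body."""
    scanner = _EmailScanner()
    scanner.feed(body)
    scanner.close()
    words = re.findall(r"[a-z']+", " ".join(scanner.text_parts).lower())
    # Assumption: a link is suspicious when its visible text is itself a URL
    # whose host differs from the host in the href attribute.
    mismatches = sum(
        1
        for href, text in scanner.links
        if re.match(r"(https?://|www\.)", text) and _host(text) != _host(href)
    )
    return {
        "html_tags": scanner.html_tags,
        "css_hits": scanner.css_hits,
        "blacklist_words": sum(w in BLACKLIST for w in words),
        "link_mismatches": mismatches,
    }


sample = (
    '<p style="color:red">URGENT: verify your account at '
    '<a href="http://evil.example/login">www.mybank.com</a></p>'
)
print(extra_features(sample))
```

Counts like these could be appended as extra columns to a Bag of Words, TF-IDF, or Word2Vec representation before training a classifier; how the study actually combined them is not stated in the abstract.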

Keywords

Phishing, Phishing e-mail, Phishing attacks, Machine learning, Deep learning, Classification, Phishing e-mail classification

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Research Article
Authors	Hadjer Brioua (ORCID 0009-0003-9829-1810); Havvanur Siyambaş (ORCID 0009-0002-3911-4286); Durmuş Özkan Şahin (ORCID 0000-0002-0831-7825)
Early Pub Date	July 11, 2025
Publication Date
Submission Date	May 27, 2024
Acceptance Date	January 10, 2025
Published in Issue	Year 2025 Volume: 13 Issue: 2

Cite

APA	Brioua, H., Siyambaş, H., & Şahin, D. Ö. (2025). Phishing E-mail Detection with Machine Learning and Deep Learning: Improving Classification Performance with Proposed New Features. Balkan Journal of Electrical and Computer Engineering, 13(2), 183-193. https://doi.org/10.17694/bajece.1490596

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.