Research Article

A Novel Ensemble Learning-Based Machine Learning Model for Phishing Attack

Year 2025, Volume: 13 Issue: 4, 427 - 434, 31.12.2025
https://doi.org/10.17694/bajece.1695071
https://izlik.org/JA63ER87SS

Abstract

The internet plays a growing role in every aspect of daily life. Used carefully, it offers countless advantages, but it also carries many dangers. One of the most important is the risk of being targeted by malicious actors while online. Attackers can deceive users by directing them to fake, misleading websites in order to obtain sensitive information and data. In this type of attack, known as phishing, users unknowingly hand their information and data to attackers. In this study, we propose a new ensemble learning-based machine learning model, combined with feature selection methods, to detect phishing attacks. We evaluate two feature selection algorithms intended to improve the model's classification performance and analyze their effect on that performance. The dataset restricted to the selected features is then used to train a new ensemble learning model that we built with the voting classifier method from the XGBoost, CatBoost, and LightGBM algorithms. Evaluated with widely used performance metrics, the proposed model achieved an accuracy of 97.96%. The proposed model was observed to outperform the studies in the literature that use the same dataset.
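The voting step described in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the author's implementation: the abstract does not say whether hard or soft voting was used, so both are shown, and the function names (`hard_vote`, `soft_vote`, `accuracy`), the 0.5 threshold, and all label and probability values are invented for the example. The three lists merely stand in for the outputs of trained XGBoost, CatBoost, and LightGBM models.

```python
from collections import Counter
from statistics import mean


def hard_vote(predictions):
    """Majority vote per sample over a list of per-model label lists."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def soft_vote(probabilities, threshold=0.5):
    """Average the per-model phishing probabilities, then apply a threshold."""
    return [1 if mean(p) >= threshold else 0 for p in zip(*probabilities)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ground truth (1 = phishing, 0 = legitimate); invented for illustration.
y_true = [1, 0, 1, 1, 0, 0]

# Hypothetical label predictions standing in for the three base learners.
pred_xgb = [1, 0, 1, 0, 0, 0]
pred_cat = [1, 0, 0, 1, 0, 1]
pred_lgbm = [1, 1, 1, 1, 0, 0]

# Hypothetical phishing probabilities from the same three base learners.
proba_xgb = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3]
proba_cat = [0.8, 0.3, 0.4, 0.6, 0.2, 0.6]
proba_lgbm = [0.7, 0.6, 0.8, 0.9, 0.1, 0.2]

hard_labels = hard_vote([pred_xgb, pred_cat, pred_lgbm])
soft_labels = soft_vote([proba_xgb, proba_cat, proba_lgbm])

print("hard voting accuracy:", accuracy(y_true, hard_labels))
print("soft voting accuracy:", accuracy(y_true, soft_labels))
```

On this toy data each base learner makes at least one mistake, while the combined vote does not, which is the effect a voting ensemble aims for. The numbers say nothing about the reported 97.96% accuracy, which comes from the study's own dataset and trained models.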


A NEW ENSEMBLE LEARNING MODEL-BASED MACHINE LEARNING MODEL FOR PHISHING ATTACK


Abstract

Today, the internet is steadily increasing its influence in every area of our lives. The internet, which offers countless advantages when used consciously, also harbors many dangers. One of these dangers, and the most important, is the possibility of becoming the target of malicious people while using the internet. Attackers can deceive innocent people by directing them to fake, misleading websites in order to obtain their important information and data. Through this type of attack, known as a phishing attack, internet users may hand their information and data to attackers. In this study, we propose a new ensemble learning-based machine learning model that also uses feature selection methods to detect phishing attacks. We also try two feature selection algorithms to improve the model's classification performance and analyze the effects of these algorithms on that performance. After the feature selection algorithms are applied, the dataset with the selected features is trained with a new ensemble learning model that we built from the XGBoost, CatBoost, and LightGBM algorithms using the voting classifier method. The proposed model was analyzed using widely used model performance evaluation metrics. The proposed model was observed to outperform the studies in the literature that use the same dataset.


Details

Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Article
Authors: Ekrem Baser (ORCID: 0000-0002-8233-7840)
Submission Date: May 8, 2025
Acceptance Date: November 18, 2025
Publication Date: December 31, 2025
DOI: https://doi.org/10.17694/bajece.1695071
IZ: https://izlik.org/JA63ER87SS
Published in Issue: Year 2025, Volume: 13, Issue: 4

Cite

APA Baser, E. (2025). A Novel Ensemble Learning-Based Machine Learning Model for Phishing Attack. Balkan Journal of Electrical and Computer Engineering, 13(4), 427-434. https://doi.org/10.17694/bajece.1695071

All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.