A Novel Model Based on Ensemble Learning for Phishing Attack

Aykut Karakaya; Ahmet Ulu

doi:10.29130/dubited.1426401

Research Article

Turkish title: Kimlik Avı Saldırısı için Ensemble Öğrenmesine Dayalı Yeni Bir Model

Year 2024, Volume: 12, Issue: 4, pp. 1804–1827, 23.10.2024

Abstract

With faster internet connections and improved infrastructure, people now carry out most of their work online. While this makes life easier, it also increases the risk of being attacked by malicious actors. One such attack is phishing, in which attackers create copied, fake websites to steal information from victims. Although this attack is relatively old and simple, it can still be effective because of low IT literacy. Users may enter their information on these fake websites on impulse, out of ignorance, or in good faith, and so fall victim to phishing. The compromise of a user's account information also endangers the security of the organization or institution the user belongs to. In this study, we propose a new machine learning-based ensemble model with feature selection methods to detect phishing attacks. We also present an ablation study that measures the effect of different feature selection methods. The proposed model, which we name NaiveStackingSymmetric (NSS), is evaluated with the widely used accuracy (ACC), area under the curve (AUC), and F-score metrics, as well as the polygon area metric (PAM). It is shown to outperform other studies in the literature that use the same dataset.
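The abstract evaluates NSS with ACC, AUC, F-score and the polygon area metric (PAM). The sketch below shows one way to compute these metrics from a binary classifier's scores. It is not the authors' evaluation code, and several details are assumptions:

- the function names;
- label 1 marking the phishing class;
- a 0.5 decision threshold;
- the hexagon axis order CA, SE, SP, AUC, JI, FM, which reflects our reading of O. Aydemir's PAM definition.

PAM places six metrics (classification accuracy, sensitivity, specificity, AUC, Jaccard index and F-measure) on the axes of a regular hexagon. It then divides the area of the resulting polygon by the area of the unit regular hexagon, so the score lies between 0 and 1.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); label 1 is assumed to be the phishing (positive) class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win. This is O(P*N), which is fine for an illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def polygon_area_metric(values: Sequence[float]) -> float:
    """Area of the hexagon spanned by six metric values, normalized by the unit regular hexagon."""
    if len(values) != 6:
        raise ValueError("PAM expects exactly six metric values")
    # Adjacent axes are 60 degrees apart, so each of the six triangles
    # has area 0.5 * r_i * r_(i+1) * sin(60 degrees).
    area = sum(
        0.5 * values[i] * values[(i + 1) % 6] * math.sin(math.pi / 3) for i in range(6)
    )
    unit_hexagon_area = 3 * math.sqrt(3) / 2  # area when all six values equal 1
    return area / unit_hexagon_area


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute ACC, SE, SP, AUC, JI, F-score and PAM (the threshold is an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    m = {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "SE": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        "AUC": roc_auc(y_true, scores),
        "JI": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        "F": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }
    # Assumed axis order around the hexagon: CA, SE, SP, AUC, JI, FM.
    m["PAM"] = polygon_area_metric([m[k] for k in ("ACC", "SE", "SP", "AUC", "JI", "F")])
    return m


if __name__ == "__main__":
    # Toy labels and scores, purely illustrative (not from the paper's dataset).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    for name, value in evaluate(y_true, scores).items():
        print(f"{name}: {value:.4f}")
```

Adjacent axes are 60 degrees apart, so the normalized area simplifies to the mean of the six products of neighbouring metric values. A single weak metric therefore lowers PAM through both of its neighbouring products. In this way PAM favours classifiers that are balanced across all six metrics, rather than ones that score well on only one.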

Keywords

Phishing attack, ensemble learning, malicious URL, stacking, information security

There are 44 citations in total.

Details

Primary Language	English
Subjects	Supervised Learning, Classification Algorithms
Journal Section	Articles
Authors	Aykut Karakaya (ORCID: 0000-0001-6970-3239), Ahmet Ulu (ORCID: 0000-0002-4618-5712)
Publication Date	October 23, 2024
Submission Date	January 27, 2024
Acceptance Date	April 8, 2024
Published in Issue	Year 2024 Volume: 12 Issue: 4

Cite

APA	Karakaya, A., & Ulu, A. (2024). A Novel Model Based on Ensemble Learning for Phishing Attack. Duzce University Journal of Science and Technology, 12(4), 1804-1827. https://doi.org/10.29130/dubited.1426401
AMA	Karakaya A, Ulu A. A Novel Model Based on Ensemble Learning for Phishing Attack. DUBİTED. October 2024;12(4):1804-1827. doi:10.29130/dubited.1426401
Chicago	Karakaya, Aykut, and Ahmet Ulu. “A Novel Model Based on Ensemble Learning for Phishing Attack”. Duzce University Journal of Science and Technology 12, no. 4 (October 2024): 1804-27. https://doi.org/10.29130/dubited.1426401.
EndNote	Karakaya A, Ulu A (October 1, 2024) A Novel Model Based on Ensemble Learning for Phishing Attack. Duzce University Journal of Science and Technology 12 4 1804–1827.
IEEE	A. Karakaya and A. Ulu, “A Novel Model Based on Ensemble Learning for Phishing Attack”, DUBİTED, vol. 12, no. 4, pp. 1804–1827, 2024, doi: 10.29130/dubited.1426401.
ISNAD	Karakaya, Aykut - Ulu, Ahmet. “A Novel Model Based on Ensemble Learning for Phishing Attack”. Duzce University Journal of Science and Technology 12/4 (October 2024), 1804-1827. https://doi.org/10.29130/dubited.1426401.
JAMA	Karakaya A, Ulu A. A Novel Model Based on Ensemble Learning for Phishing Attack. DUBİTED. 2024;12(4):1804–1827.
MLA	Karakaya, Aykut and Ahmet Ulu. “A Novel Model Based on Ensemble Learning for Phishing Attack”. Duzce University Journal of Science and Technology, vol. 12, no. 4, 2024, pp. 1804-27, doi:10.29130/dubited.1426401.
Vancouver	Karakaya A, Ulu A. A Novel Model Based on Ensemble Learning for Phishing Attack. DUBİTED. 2024;12(4):1804-27.