Research Article

Two-stage malicious URL detection architecture: A next-generation threat recognition and classification approach

Year 2025, Volume 14, Issue 4, pp. 1498-1508, 15 October 2025

Abstract

The increasing prevalence of online threats has made detecting malicious URLs, and classifying them by attack type, a critical research topic. This study evaluates and compares combinations of machine learning algorithms and feature selection methods on two tasks: detecting malicious URLs and determining their attack types. Three datasets were used. The first, ISCX-URL-2016, provides 79 features per URL; the other two datasets contain only raw URLs, so the same 79 features were extracted for every URL in them. Modeling proceeds in two stages: the first stage classifies URLs as malicious or benign, and the second compares how effectively the algorithms identify the attack type of a malicious URL. Random Forest trained on all 79 features, without feature selection, achieved the highest performance in both binary classification (97% accuracy) and multi-class classification (98% accuracy). These findings can guide the design of malicious URL detection systems and contribute to the literature on the topic.
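The abstract describes the two-stage architecture only at a high level. The following is a minimal sketch of how such a cascade could be wired up, not the authors' implementation. It assumes Python 3.9+ and the standard library only; the five lexical features in `lexical_features` are a hypothetical illustration rather than the 79 ISCX-URL-2016 features, a small nearest-centroid classifier is swapped in for Random Forest, and all class and function names, the label names (`phishing`, `defacement`) and the toy URLs are invented for illustration.

```python
# Hypothetical sketch of a two-stage URL classification cascade (Python 3.9+).
# Stage 1 labels a URL benign or malicious; stage 2 runs only on URLs flagged
# malicious and assigns an attack type. The study used Random Forest on 79
# features; here five illustrative lexical features and a nearest-centroid
# classifier stand in so the sketch needs only the standard library.
import math
from dataclasses import dataclass, field
from urllib.parse import urlsplit


def lexical_features(url: str) -> list[float]:
    """Hypothetical subset of lexical URL features (not the ISCX-URL-2016 set)."""
    # urlsplit needs a scheme to recognise the host, so add one if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return [
        float(len(url)),                                  # total URL length
        float(host.count(".")),                           # dots in the host name
        float(sum(ch.isdigit() for ch in url)),           # digit count
        float(len(parts.query)),                          # query-string length
        1.0 if host.replace(".", "").isdigit() else 0.0,  # raw IPv4 host flag
    ]


@dataclass
class NearestCentroid:
    """Minimal stand-in classifier: predicts the class with the closest mean vector."""
    centroids: dict[str, list[float]] = field(default_factory=dict)

    def fit(self, X: list[list[float]], y: list[str]) -> "NearestCentroid":
        sums: dict[str, list[float]] = {}
        counts: dict[str, int] = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Each centroid is the per-feature mean of that class's samples.
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict_one(self, row: list[float]) -> str:
        # Euclidean distance to each class mean; the nearest class wins.
        return min(self.centroids, key=lambda lab: math.dist(row, self.centroids[lab]))


@dataclass
class TwoStageDetector:
    stage1: NearestCentroid = field(default_factory=NearestCentroid)  # benign vs. malicious
    stage2: NearestCentroid = field(default_factory=NearestCentroid)  # attack type

    def fit(self, urls: list[str], labels: list[str]) -> "TwoStageDetector":
        X = [lexical_features(u) for u in urls]
        # Stage 1 is trained on a binary target derived from the attack labels.
        binary = ["benign" if lab == "benign" else "malicious" for lab in labels]
        self.stage1.fit(X, binary)
        # Stage 2 is trained only on malicious samples, keeping their attack types.
        malicious = [(x, lab) for x, lab in zip(X, labels) if lab != "benign"]
        self.stage2.fit([x for x, _ in malicious], [lab for _, lab in malicious])
        return self

    def predict(self, url: str) -> str:
        x = lexical_features(url)
        if self.stage1.predict_one(x) == "benign":
            return "benign"  # stage 2 is skipped for benign URLs
        return self.stage2.predict_one(x)


if __name__ == "__main__":
    # Toy training data; URLs and labels are invented for illustration only.
    detector = TwoStageDetector().fit(
        [
            "google.com",
            "example.org/about",
            "http://192.168.10.25/login.php?user=admin&session=12345",
            "http://www.example-shop.com/index.php?option=com_content&view=article&id=70&Itemid=78",
        ],
        ["benign", "benign", "phishing", "defacement"],
    )
    print(detector.predict("python.org"))                                          # -> benign
    print(detector.predict("http://10.0.0.7/verify.php?acct=998877&token=5566"))  # -> phishing
```

Training stage 2 only on malicious URLs lets the attack-type model concentrate on separating attack classes, and benign URLs never reach it. Because the features here are raw and unscaled, URL length dominates the distance; a practical system would normalise or select features and, in line with the study's finding, could replace `NearestCentroid` with scikit-learn's `RandomForestClassifier`.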

References

  • A. Sertçelik, Siber olaylar ekseninde siber güvenliği anlamak [Understanding cyber security in the context of cyber incidents]. Medeniyet Araştırmaları Dergisi, 2 (3), 25-42, 2015.
  • T. Berners-Lee, L. Masinter and M. McCahill, Uniform resource locators (URL). RFC 1738, 1994. https://doi.org/10.17487/rfc1738
  • P. Prakash, M. Kumar, R. R. Kompella and M. Gupta, PhishNet: Predictive blacklisting to detect phishing attacks. 2010 Proceedings IEEE INFOCOM, pp. 1-5, San Diego, CA, USA, 2010. https://doi.org/10.1109/INFCOM.2010.5462216
  • Y. Fukushima, Y. Hori and K. Sakurai, Proactive blacklisting for malicious web sites by reputation evaluation based on domain and IP address registration. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 352-361, Changsha, China, 2011. https://doi.org/10.1109/TrustCom.2011.46
  • A. K. Jain and B. B. Gupta, A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP Journal on Information Security, 2016, pp. 1-11, 2016. https://doi.org/10.1186/s13635-016-0034-3
  • H. Choi, B. B. Zhu and H. Lee, Detecting malicious web links and identifying their attack types. 2nd USENIX Conference on Web Application Development (WebApps 11), Berkeley, USA, 2011.
  • W. Chu, B. B. Zhu, F. Xue, X. Guan and Z. Cai, Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. 2013 IEEE International Conference on Communications (ICC), pp. 1990-1994, Budapest, Hungary, 2013. https://doi.org/10.1109/ICC.2013.6654816
  • M. S. Lin, C. Y. Chiu, Y. J. Lee and H. K. Pao, Malicious URL filtering – A big data application. 2013 IEEE International Conference on Big Data, pp. 589-596, Santa Clara, CA, USA, 2013. https://doi.org/10.1109/BigData.2013.6691627
  • A. Joshi, L. Lloyd, P. Westin and S. Seethapathy, Using lexical features for malicious URL detection – A machine learning approach. arXiv preprint arXiv:1910.06277, 2019.
  • A. Powell, D. Bates, C. Van Wyk and A. D. de Abreu, A cross-comparison of feature selection algorithms on multiple cyber security data-sets. Proceedings of the FAIR 2019 Workshop, pp. 196-207, Cape Town, South Africa, 2019.
  • O. K. Sahingoz, E. Buber, O. Demir and B. Diri, Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345-357, 2019. https://doi.org/10.1016/j.eswa.2018.09.029
  • S. Wang, Y. Wang and M. Tang, Auto malicious websites classification based on Naive Bayes Classifier. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), pp. 443-447, Dalian, China, 2020. https://doi.org/10.1109/ICISCAE51034.2020.9236912
  • S. Singhal, U. Chawla and R. Shorey, Machine learning & Concept drift based approach for malicious website detection. 2020 12th International Conference on Communication Systems & Networks (COMSNETS), pp. 582-585, Bangalore, India, 2020. https://doi.org/10.1109/COMSNETS48256.2020.9027485
  • R. S. Arslan, Kötücül web sayfalarının tespitinde Doc2Vec modeli ve makine öğrenmesi yaklaşımı [Doc2Vec model and machine learning approach for detecting malicious web pages]. Avrupa Bilim ve Teknoloji Dergisi, (27), 792-801, 2021. https://doi.org/10.31590/ejosat.981450
  • S. S. K. Singh, V. Menon, S. A. Sajidha, V. M. Nisha, A. Sheik Abdullah, M. Nivedita and A. Mairaj, Meta learning for enhanced web security against malicious URLs. Research Square, 2023. https://doi.org/10.21203/rs.3.rs-3626868/v1
  • K. Sadaf, Phishing website detection using XGBoost and Catboost classifiers. 2023 International Conference on Smart Computing and Application (ICSCA), pp. 1-6, Bali, Indonesia, 2023. https://doi.org/10.1109/ICSCA57840.2023.10087829
  • S. Sheikhi and P. Kostakos, Safeguarding cyberspace: Enhancing malicious website detection with PSO-optimized XGBoost and firefly-based feature selection. Computers & Security, 142, 103885, 2024. https://doi.org/10.1016/j.cose.2024.103885
  • A. E. Omolara and M. Alawida, DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security. Computers & Security, 148, 104170, 2025. https://doi.org/10.1016/j.cose.2024.104170
  • Y. A. Kustiawan and K. I. Ghauth, Evaluating the impact of feature engineering in phishing URL detection: A comparative study of URL, HTML, and derived features. IEEE Access, 13, 126756-126768, 2025. https://doi.org/10.1109/ACCESS.2025.3579223
  • H. R. Alavala, S. Singh, P. Joshi and S. Basavaraju, Enhancing malicious URL detection with advanced machine learning techniques. 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), pp. 151-156, Bengaluru, India, 2025. https://doi.org/10.1109/CE2CT64011.2025.10939290
  • Canadian Institute for Cybersecurity, ISCX-URL-2016 dataset. https://www.unb.ca/cic/datasets/url-2016.html, Accessed 28 December 2024.
  • M. Siddhartha, Malicious URLs dataset. https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset, Accessed 28 December 2024.
  • TeseRact, URL dataset. https://www.kaggle.com/datasets/teseract/urldataset, Accessed 28 December 2024.
There are 23 citations in total.

Details

Primary Language: Turkish
Subjects: Machine Learning (Other), Software and Application Security
Journal Section: Research Articles
Authors:

Durmuş Özkan Şahin (ORCID 0000-0002-0831-7825)

Sercan Demirci (ORCID 0000-0001-6739-7653)

Muhammet Abdullah Şahin (ORCID 0009-0008-9801-078X)

Nuri Can Acar (ORCID 0009-0005-3570-431X)

Hamit Burak Can Kodal (ORCID 0009-0000-0415-606X)

Early Pub Date: October 5, 2025
Publication Date: October 15, 2025
Submission Date: April 14, 2025
Acceptance Date: September 17, 2025
Published in Issue: Year 2025, Volume 14, Issue 4

Cite

APA Şahin, D. Ö., Demirci, S., Şahin, M. A., … Acar, N. C. (2025). İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 14(4), 1498-1508. https://doi.org/10.28948/ngumuh.1675785
AMA Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. October 2025;14(4):1498-1508. doi:10.28948/ngumuh.1675785
Chicago Şahin, Durmuş Özkan, Sercan Demirci, Muhammet Abdullah Şahin, Nuri Can Acar, and Hamit Burak Can Kodal. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14, no. 4 (October 2025): 1498-1508. https://doi.org/10.28948/ngumuh.1675785.
EndNote Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC (October 1, 2025) İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14 4 1498–1508.
IEEE D. Ö. Şahin, S. Demirci, M. A. Şahin, N. C. Acar, and H. B. C. Kodal, “İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı”, NOHU J. Eng. Sci., vol. 14, no. 4, pp. 1498–1508, 2025, doi: 10.28948/ngumuh.1675785.
ISNAD Şahin, Durmuş Özkan et al. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14/4 (October 2025), 1498-1508. https://doi.org/10.28948/ngumuh.1675785.
JAMA Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. 2025;14(4):1498–1508.
MLA Şahin, Durmuş Özkan et al. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, vol. 14, no. 4, 2025, pp. 1498-1508, doi:10.28948/ngumuh.1675785.
Vancouver Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. 2025;14(4):1498-508.
