İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı

Durmuş Özkan Şahin; Sercan Demirci; Muhammet Abdullah Şahin; Nuri Can Acar; Hamit Burak Can Kodal

doi:10.28948/ngumuh.1675785

Research Article

Two-stage malicious URL detection architecture: A next-generation threat recognition and classification approach

Year 2025, Volume 14, Issue 4, pp. 1498-1508, 15 October 2025

Abstract

The increasing prevalence of online threats has made the detection and classification of malicious URLs a critical research topic. This study evaluates and compares machine learning algorithms combined with feature selection methods on two tasks: detecting malicious URLs and determining their attack types. Three datasets were used. The first, ISCX-2016, contains 79 distinct features; the same 79 features were extracted for each URL in the other two datasets, which consist of raw URLs. Modeling proceeded in two phases: the first focused on accurately classifying malicious URLs, and the second compared how effectively the algorithms determine attack types. Random Forest, applied to all features without feature selection, achieved the highest performance in both binary classification (97% accuracy) and multi-class classification (98% accuracy). These findings can guide the development of malicious URL detection systems and contribute to the literature.
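
The two-phase flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the handful of lexical features below are an illustrative subset chosen for this example (the study uses the 79 features of ISCX-2016, which are not listed on this page), and the names `extract_features`, `classify_two_stage`, `is_malicious`, and `attack_type` are hypothetical. The two stages are passed in as plain callables, so any trained model, for example a Random Forest, could stand behind them.

```python
from collections import Counter
from math import log2
from typing import Callable, Dict
from urllib.parse import urlsplit

Features = Dict[str, float]


def extract_features(url: str) -> Features:
    """Compute a few lexical features from a raw URL (illustrative subset only)."""
    # urlsplit only recognizes the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    counts = Counter(url)
    # Shannon entropy of the URL's character distribution.
    entropy = (
        -sum((n / len(url)) * log2(n / len(url)) for n in counts.values())
        if url
        else 0.0
    )
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "path_length": float(len(parts.path)),
        "dot_count": float(url.count(".")),
        "digit_count": float(sum(ch.isdigit() for ch in url)),
        "query_param_count": float(len([p for p in parts.query.split("&") if p])),
        "entropy": entropy,
    }


def classify_two_stage(
    url: str,
    is_malicious: Callable[[Features], bool],
    attack_type: Callable[[Features], str],
) -> str:
    """Stage 1 decides benign vs. malicious; stage 2 runs only for malicious URLs."""
    features = extract_features(url)
    if not is_malicious(features):
        return "benign"
    return attack_type(features)


if __name__ == "__main__":
    # Hypothetical stand-in rules; trained classifiers would replace these.
    stage1 = lambda f: f["digit_count"] > 5
    stage2 = lambda f: "phishing" if f["query_param_count"] > 0 else "malware"
    print(classify_two_stage("http://example.com/a?x=1&y=2", stage1, stage2))
```

Keeping the second stage behind the first means the attack-type model only ever sees URLs already flagged as malicious, which mirrors the binary-then-multi-class split reported in the abstract.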

Keywords

Cybersecurity, Feature Extraction, Machine Learning Algorithms, Malicious URL Detection, Malware Detection, URL Classification

References

A. Sertçelik, Siber olaylar ekseninde siber güvenliği anlamak. Medeniyet Araştırmaları Dergisi, 2 (3), 25-42, 2015.
T. Berners-Lee, L. Masinter and M. McCahill, Uniform resource locators (URL). RFC 1738, 1994. https://doi.org/10.17487/rfc1738
P. Prakash, M. Kumar, R. R. Kompella and M. Gupta, PhishNet: Predictive blacklisting to detect phishing attacks. 2010 Proceedings IEEE INFOCOM, pp. 1-5, San Diego, CA, USA, 2010. https://doi.org/10.1109/INFCOM.2010.5462216
Y. Fukushima, Y. Hori and K. Sakurai, Proactive blacklisting for malicious web sites by reputation evaluation based on domain and IP address registration. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 352-361, Changsha, China, 2011. https://doi.org/10.1109/TrustCom.2011.46
A. K. Jain and B. B. Gupta, A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP Journal on Information Security, 2016, pp. 1-11, 2016. https://doi.org/10.1186/s13635-016-0034-3
H. Choi, B. B. Zhu and H. Lee, Detecting malicious web links and identifying their attack types. 2nd USENIX Conference on Web Application Development (WebApps 11), Berkeley, USA, 2011.
W. Chu, B. B. Zhu, F. Xue, X. Guan and Z. Cai, Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. 2013 IEEE International Conference on Communications (ICC), pp. 1990-1994, Budapest, Hungary, 2013. https://doi.org/10.1109/ICC.2013.6654816
M. S. Lin, C. Y. Chiu, Y. J. Lee and H. K. Pao, Malicious URL filtering – A big data application. 2013 IEEE International Conference on Big Data, pp. 589-596, Santa Clara, CA, USA, 2013. https://doi.org/10.1109/BigData.2013.6691627
A. Joshi, L. Lloyd, P. Westin and S. Seethapathy, Using lexical features for malicious URL detection: A machine learning approach. arXiv preprint arXiv:1910.06277, 2019.
A. Powell, D. Bates, C. Van Wyk and A. D. de Abreu, A cross-comparison of feature selection algorithms on multiple cyber security data-sets. Proceedings of the FAIR 2019 Workshop, pp. 196-207, Cape Town, South Africa, 2019.
O. K. Sahingoz, E. Buber, O. Demir and B. Diri, Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345-357, 2019. https://doi.org/10.1016/j.eswa.2018.09.029
S. Wang, Y. Wang and M. Tang, Auto malicious websites classification based on Naive Bayes Classifier. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), pp. 443-447, Dalian, China, 2020. https://doi.org/10.1109/ICISCAE51034.2020.9236912
S. Singhal, U. Chawla and R. Shorey, Machine learning & Concept drift based approach for malicious website detection. 2020 12th International Conference on Communication Systems & Networks (COMSNETS), pp. 582-585, Bangalore, India, 2020. https://doi.org/10.1109/COMSNETS48256.2020.9027485
R. S. Arslan, Kötücül web sayfalarının tespitinde Doc2Vec modeli ve makine öğrenmesi yaklaşımı. Avrupa Bilim ve Teknoloji Dergisi, (27), 792-801, 2021. https://doi.org/10.31590/ejosat.981450
S. S. K. Singh, V. Menon, S. A. Sajidha, V. M. Nisha, A. Sheik Abdullah, M. Nivedita and A. Mairaj, Meta learning for enhanced web security against malicious URLs. Research Square, 2023. https://doi.org/10.21203/rs.3.rs-3626868/v1
K. Sadaf, Phishing website detection using XGBoost and Catboost classifiers. 2023 International Conference on Smart Computing and Application (ICSCA), pp. 1-6, Bali, Indonesia, 2023. https://doi.org/10.1109/ICSCA57840.2023.10087829
S. Sheikhi and P. Kostakos, Safeguarding cyberspace: Enhancing malicious website detection with PSO-optimized XGBoost and firefly-based feature selection. Computers & Security, 142, 103885, 2024. https://doi.org/10.1016/j.cose.2024.103885
A. E. Omolara and M. Alawida, DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security. Computers & Security, 148, 104170, 2025. https://doi.org/10.1016/j.cose.2024.104170
Y. A. Kustiawan and K. I. Ghauth, Evaluating the impact of feature engineering in phishing URL detection: A comparative study of URL, HTML, and derived features. IEEE Access, 13, 126756-126768, 2025. https://doi.org/10.1109/ACCESS.2025.3579223
H. R. Alavala, S. Singh, P. Joshi and S. Basavaraju, Enhancing malicious URL Detection with advanced machine learning techniques. 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), pp. 151-156, Bengaluru, India, 2025. https://doi.org/10.1109/CE2CT64011.2025.10939290
Canadian Institute for Cybersecurity, ISCX-URL-2016 dataset. https://www.unb.ca/cic/datasets/url-2016.html, Accessed 28 December 2024
M. Siddhartha, Malicious URLs dataset. https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset, Accessed 28 December 2024
TeseRact, URL dataset. https://www.kaggle.com/datasets/teseract/urldataset, Accessed 28 December 2024

There are 23 citations in total.

Details

Primary Language	Turkish
Subjects	Machine Learning (Other), Software and Application Security
Journal Section	Research Articles
Authors	Durmuş Özkan Şahin (ORCID 0000-0002-0831-7825); Sercan Demirci (ORCID 0000-0001-6739-7653); Muhammet Abdullah Şahin (ORCID 0009-0008-9801-078X); Nuri Can Acar (ORCID 0009-0005-3570-431X); Hamit Burak Can Kodal (ORCID 0009-0000-0415-606X)
Early Pub Date	October 5, 2025
Publication Date	October 15, 2025
Submission Date	April 14, 2025
Acceptance Date	September 17, 2025
Published in Issue	Year 2025 Volume: 14 Issue: 4

Cite

APA	Şahin, D. Ö., Demirci, S., Şahin, M. A., Acar, N. C., & Kodal, H. B. C. (2025). İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, 14(4), 1498-1508. https://doi.org/10.28948/ngumuh.1675785
AMA	Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. October 2025;14(4):1498-1508. doi:10.28948/ngumuh.1675785
Chicago	Şahin, Durmuş Özkan, Sercan Demirci, Muhammet Abdullah Şahin, Nuri Can Acar, and Hamit Burak Can Kodal. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14, no. 4 (October 2025): 1498-1508. https://doi.org/10.28948/ngumuh.1675785.
EndNote	Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC (October 1, 2025) İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14 4 1498–1508.
IEEE	D. Ö. Şahin, S. Demirci, M. A. Şahin, N. C. Acar, and H. B. C. Kodal, “İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı”, NOHU J. Eng. Sci., vol. 14, no. 4, pp. 1498–1508, 2025, doi: 10.28948/ngumuh.1675785.
ISNAD	Şahin, Durmuş Özkan et al. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi 14/4 (October 2025), 1498-1508. https://doi.org/10.28948/ngumuh.1675785.
JAMA	Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. 2025;14(4):1498–1508.
MLA	Şahin, Durmuş Özkan et al. “İki Aşamalı Kötücül URL Tespit Mimarisi: Yeni Nesil Tehdit Tanıma Ve Sınıflandırma Yaklaşımı”. Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi, vol. 14, no. 4, 2025, pp. 1498-1508, doi:10.28948/ngumuh.1675785.
Vancouver	Şahin DÖ, Demirci S, Şahin MA, Acar NC, Kodal HBC. İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı. NOHU J. Eng. Sci. 2025;14(4):1498-508.
