TY - JOUR
T1 - İki aşamalı kötücül URL tespit mimarisi: Yeni nesil tehdit tanıma ve sınıflandırma yaklaşımı
TT - Two-stage malicious URL detection architecture: A next-generation threat recognition and classification approach
AU - Şahin, Durmuş Özkan
AU - Demirci, Sercan
AU - Şahin, Muhammet Abdullah
AU - Acar, Nuri Can
AU - Kodal, Hamit Burak Can
PY - 2025
DA - October
Y2 - 2025
DO - 10.28948/ngumuh.1675785
JF - Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi
JO - NÖHÜ Müh. Bilim. Derg.
PB - Niğde Ömer Halisdemir Üniversitesi
WT - DergiPark
SN - 2564-6605
SP - 1498
EP - 1508
VL - 14
IS - 4
LA - tr
AB - The rise in online threats has made detecting malicious URLs and classifying them by type of harm an important research topic. This study aims to evaluate combinations of different machine learning algorithms and feature selection methods for detecting malicious URLs and determining their harm types, and to compare their performance. Three different datasets were used. The first is the ISCX-2016 dataset, which contains 79 distinct features. For each URL in the other two datasets, which consist of raw URLs, the same 79 features were extracted. In the modeling process, the first stage focused on correctly classifying malicious URLs, while the second stage compared the effectiveness of the algorithms used to determine harm types. The Random Forest algorithm, used without feature selection and evaluated on all features, achieved the highest performance in both binary classification (97% accuracy) and multi-class classification (98% accuracy). These findings serve as an important guide for developing malicious URL detection systems and make a valuable contribution to the literature.
KW - Computer Security
KW - Malware Detection
KW - Machine Learning Algorithms
KW - Feature Extraction
KW - URL Classification
KW - Malicious URL Detection
N2 - The increasing prevalence of online threats has made the detection and classification of malicious URLs a critical research topic. This study aims to evaluate and compare the performance of different machine learning algorithms combined with feature selection methods for detecting malicious URLs and determining their attack types. Three different datasets were used in the study. The first dataset, ISCX-2016, contains 79 distinct features. The same 79 features were extracted for each URL in the other two raw URL datasets. In the modeling process, the first phase focused on accurately classifying malicious URLs, while the second phase compared the effectiveness of algorithms in determining attack types. The Random Forest algorithm, applied without feature selection and evaluated on all features, achieved the highest performance in both binary classification (97% accuracy) and multi-class classification (98% accuracy). These findings serve as a valuable guide for the development of malicious URL detection systems and provide significant contributions to the literature.
CR - A. Sertçelik, Siber olaylar ekseninde siber güvenliği anlamak. Medeniyet Araştırmaları Dergisi, 2 (3), 25-42, 2015.
CR - T. Berners-Lee, L. Masinter and M. McCahill, Uniform resource locators (URL). RFC 1738, 1994. https://doi.org/10.17487/rfc1738
CR - P. Prakash, M. Kumar, R. R. Kompella and M. Gupta, PhishNet: Predictive blacklisting to detect phishing attacks. 2010 Proceedings IEEE INFOCOM, pp. 1-5, San Diego, CA, USA, 2010. https://doi.org/10.1109/INFCOM.2010.5462216
CR - Y. Fukushima, Y. Hori and K. Sakurai, Proactive blacklisting for malicious web sites by reputation evaluation based on domain and IP address registration. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 352-361, Changsha, China, 2011. https://doi.org/10.1109/TrustCom.2011.46
CR - A. K. Jain and B. B. Gupta, A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP Journal on Information Security, 2016, pp. 1-11, 2016. https://doi.org/10.1186/s13635-016-0034-3
CR - H. Choi, B. B. Zhu and H. Lee, Detecting malicious web links and identifying their attack types. 2nd USENIX Conference on Web Application Development (WebApps 11), Berkeley, USA, 2011.
CR - W. Chu, B. B. Zhu, F. Xue, X. Guan and Z. Cai, Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs. 2013 IEEE International Conference on Communications (ICC), pp. 1990-1994, Budapest, Hungary, 2013. https://doi.org/10.1109/ICC.2013.6654816
CR - M. S. Lin, C. Y. Chiu, Y. J. Lee and H. K. Pao, Malicious URL filtering – A big data application. 2013 IEEE International Conference on Big Data, pp. 589-596, Santa Clara, CA, USA, 2013. https://doi.org/10.1109/BigData.2013.6691627
CR - A. Joshi, L. Lloyd, P. Westin and S. Seethapathy, Using lexical features for malicious URL detection--a machine learning approach. arXiv preprint arXiv:1910.06277, 2019.
CR - A. Powell, D. Bates, C. Van Wyk and A. D. de Abreu, A cross-comparison of feature selection algorithms on multiple cyber security data-sets. Proceedings of the FAIR 2019 Workshop, pp. 196-207, Cape Town, South Africa, 2019.
CR - O. K. Sahingoz, E. Buber, O. Demir and B. Diri, Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345-357, 2019. https://doi.org/10.1016/j.eswa.2018.09.029
CR - S. Wang, Y. Wang and M. Tang, Auto malicious websites classification based on Naive Bayes Classifier. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), pp. 443-447, Dalian, China, 2020. https://doi.org/10.1109/ICISCAE51034.2020.9236912
CR - S. Singhal, U. Chawla and R. Shorey, Machine learning & Concept drift based approach for malicious website detection. 2020 12th International Conference on Communication Systems & Networks (COMSNETS), pp. 582-585, Bangalore, India, 2020. https://doi.org/10.1109/COMSNETS48256.2020.9027485
CR - R. S. Arslan, Kötücül web sayfalarının tespitinde Doc2Vec modeli ve makine öğrenmesi yaklaşımı. Avrupa Bilim ve Teknoloji Dergisi, (27), 792-801, 2021. https://doi.org/10.31590/ejosat.981450
CR - S. S. K. Singh, V. Menon, S. A. Sajidha, V. M. Nisha, A. Sheik Abdullah, M. Nivedita and A. Mairaj, Meta learning for enhanced web security against malicious URLs. Research Square, 2023. https://doi.org/10.21203/rs.3.rs-3626868/v1
CR - K. Sadaf, Phishing website detection using XGBoost and Catboost classifiers. 2023 International Conference on Smart Computing and Application (ICSCA), pp. 1-6, Bali, Indonesia, 2023. https://doi.org/10.1109/ICSCA57840.2023.10087829
CR - S. Sheikhi and P. Kostakos, Safeguarding cyberspace: Enhancing malicious website detection with PSO-optimized XGBoost and firefly-based feature selection. Computers & Security, 142, 103885, 2024. https://doi.org/10.1016/j.cose.2024.103885
CR - A. E. Omolara and M. Alawida, DaE2: Unmasking malicious URLs by leveraging diverse and efficient ensemble machine learning for online security. Computers & Security, 148, 104170, 2025. https://doi.org/10.1016/j.cose.2024.104170
CR - Y. A. Kustiawan and K. I. Ghauth, Evaluating the impact of feature engineering in phishing URL detection: A comparative study of URL, HTML, and derived features. IEEE Access, 13, 126756-126768, 2025. https://doi.org/10.1109/ACCESS.2025.3579223
CR - H. R. Alavala, S. Singh, P. Joshi and S. Basavaraju, Enhancing malicious URL detection with advanced machine learning techniques. 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), pp. 151-156, Bengaluru, India, 2025. https://doi.org/10.1109/CE2CT64011.2025.10939290
CR - Canadian Institute for Cybersecurity, ISCX-URL-2016 dataset. https://www.unb.ca/cic/datasets/url-2016.html, Accessed 28 December 2024
CR - M. Siddhartha, Malicious URLs dataset. https://www.kaggle.com/datasets/sid321axn/malicious-urls-dataset, Accessed 28 December 2024
CR - TeseRact, URL dataset. https://www.kaggle.com/datasets/teseract/urldataset, Accessed 28 December 2024
UR - https://doi.org/10.28948/ngumuh.1675785
L1 - https://dergipark.org.tr/tr/download/article-file/4771842
ER -