Research Article

Context-Aware Phishing URL Detection: Harnessing the Power of Large Language Models

Year 2025, Volume 3, Issue 2, pp. 1–25, 31.12.2025

Abstract

Phishing attacks remain a significant cybersecurity threat: attackers disguise malicious URLs so that they resemble those of legitimate websites. Traditional machine learning approaches often struggle to detect such threats because they rely on handcrafted features and have limited contextual understanding. This study investigates how effectively three transformer-based models, namely BERT, GPT-2, and XLM-RoBERTa, detect phishing URLs automatically. The models were trained and evaluated on a labeled dataset of phishing and legitimate URLs, and their performance was assessed with standard classification metrics and diagnostic curves. XLM-RoBERTa achieved the highest accuracy (96.1%), outperforming BERT (94.2%) and GPT-2 (92.4%). Precision, recall, F1-score, and AUC-ROC were high for all three models, with XLM-RoBERTa showing the most balanced performance across these metrics. Precision-recall curves, lift and gain charts, calibration curves, and Kolmogorov–Smirnov (KS) plots were used to examine further how well each model separates the two classes (discrimination) and how closely its predicted probabilities match observed outcomes (calibration). These findings highlight the advantages of deep contextualized language models for accurate and reliable phishing URL detection and point to a promising approach for strengthening cybersecurity defenses.
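The accuracy, precision, recall, and F1-score figures reported in the abstract all derive from the confusion matrix of a binary classifier. The following is a minimal sketch of those definitions, assuming labels coded as 1 = phishing and 0 = legitimate; the function name `classification_metrics` is hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = phishing, 0 = legitimate)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: fraction of URLs flagged as phishing that really are phishing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of phishing URLs that the classifier flags.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels and predictions, not data from the study.
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

On this toy input there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics come out to 0.75.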
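AUC-ROC summarizes how well a model's scores rank phishing URLs above legitimate ones across all decision thresholds. A minimal sketch, assuming each URL carries a score such as the model's predicted probability of the phishing class (1 = phishing), computes it through the rank-sum (Mann–Whitney U) identity, with tied scores counted as half a correct ordering; `roc_auc` is a hypothetical helper, not code from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one phishing and one legitimate URL")
    # Sort indices by score and assign 1-based ranks, averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # U = (sum of phishing ranks) - (smallest possible such sum).
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    # Normalise by the number of (phishing, legitimate) pairs.
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four phishing/legitimate pairs are ordered correctly, so the function returns 0.75.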
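A precision-recall curve traces precision against recall as the threshold on the phishing score is lowered, which is especially informative when phishing URLs are the minority class. The sketch below assumes 1 = phishing and that higher scores mean more suspicious; it emits one point per distinct score and summarizes the curve as non-interpolated average precision. Both helper names are hypothetical and not taken from the study.

```python
from typing import Sequence


def precision_recall_points(
    y_true: Sequence[int], scores: Sequence[float]
) -> tuple[list[float], list[float], list[float]]:
    """One (threshold, precision, recall) point per distinct score, highest first."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("recall is undefined without any phishing URLs")
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    thresholds: list[float] = []
    precisions: list[float] = []
    recalls: list[float] = []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Flag every URL scoring >= threshold; tied scores enter together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        thresholds.append(threshold)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_pos)
    return thresholds, precisions, recalls


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the PR curve: sum over k of (R_k - R_{k-1}) * P_k."""
    _, precisions, recalls = precision_recall_points(y_true, scores)
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


if __name__ == "__main__":
    print(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, recall rises to 0.5 at precision 1 and then to 1.0 at precision 2/3, so average precision is 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.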
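Cumulative gain and lift charts ask a ranking question: if analysts inspect only the top-scored fraction of URLs, what share of all phishing URLs do they find (gain), and how many times better is that than inspecting the same number of URLs at random (lift)? A minimal sketch, assuming 1 = phishing and that higher scores mean more suspicious; `gain_and_lift` and its percentage cut-offs are illustrative choices, not taken from the study.

```python
from typing import Sequence


def gain_and_lift(
    y_true: Sequence[int],
    scores: Sequence[float],
    percents: Sequence[int] = (10, 20, 50, 100),
) -> list[tuple[int, float, float]]:
    """(percent, cumulative gain, lift) for the top-scored percent of URLs."""
    n = len(y_true)
    n_pos = sum(1 for t in y_true if t == 1)
    if n == 0 or n_pos == 0:
        raise ValueError("need at least one phishing URL")
    # Rank URLs from most to least suspicious; ties keep their input order.
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    rows = []
    for pct in percents:
        k = max(1, (pct * n + 99) // 100)  # ceil(pct% of n) with integer arithmetic
        captured = sum(1 for t in ranked[:k] if t == 1)
        gain = captured / n_pos  # share of all phishing URLs found in the top k
        lift = gain / (k / n)  # random inspection of k URLs has expected gain k / n
        rows.append((pct, gain, lift))
    return rows


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    for row in gain_and_lift(labels, scores):
        print(row)
```

For these ten toy URLs with three phishing examples, inspecting only the top 10% finds one of the three, a gain of 1/3 and a lift of about 3.33; inspecting all URLs always gives a gain and lift of 1.0.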
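A calibration curve checks whether predicted probabilities can be read at face value: among URLs assigned roughly 0.8 probability of being phishing, about 80% should actually be phishing. The sketch below, assuming scores are probabilities in [0, 1] and labels use 1 = phishing, groups predictions into equal-width bins and reports each bin's mean predicted probability next to its observed phishing rate; `calibration_bins` and the default of five bins are hypothetical choices.

```python
from typing import Sequence


def calibration_bins(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> list[tuple[float, float, int]]:
    """Equal-width reliability bins: (mean predicted P(phishing), observed phishing rate, count)."""
    if n_bins < 1:
        raise ValueError("n_bins must be positive")
    prob_sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for t, p in zip(y_true, probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        prob_sums[b] += p
        positives[b] += 1 if t == 1 else 0
        counts[b] += 1
    # Empty bins are skipped; for a well-calibrated model the first two
    # numbers of each tuple should be close to each other.
    return [
        (prob_sums[b] / counts[b], positives[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b]
    ]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    probs = [0.1, 0.15, 0.3, 0.45, 0.55, 0.7, 0.85, 0.95]
    for row in calibration_bins(labels, probs):
        print(row)
```

Plotting the first element of each tuple against the second gives the reliability diagram; points on the diagonal indicate well-calibrated probabilities.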
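The Kolmogorov–Smirnov (KS) statistic measures separation as the largest vertical gap between the cumulative score distributions of legitimate and phishing URLs; equivalently, it is the largest absolute difference between the true-positive and false-positive rates over all thresholds. A minimal sketch under the same labeling assumption (1 = phishing); `ks_statistic` is a hypothetical name, and the returned threshold is simply the score at which the gap peaks.

```python
from typing import Optional, Sequence


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> tuple[float, Optional[float]]:
    """Largest gap between the empirical score CDFs of legitimate (0) and phishing (1) URLs.

    Returns (ks, threshold), where threshold is the score at which the gap is largest.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs at least one phishing and one legitimate URL")
    pairs = sorted(zip(scores, y_true), key=lambda pair: pair[0])
    pos_seen = neg_seen = 0
    best: float = 0.0
    best_threshold: Optional[float] = None
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Advance past every example tied at this score so ties are evaluated together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                pos_seen += 1
            else:
                neg_seen += 1
            i += 1
        # Empirical CDFs at this score: share of each class scored <= threshold.
        gap = abs(neg_seen / n_neg - pos_seen / n_pos)
        if gap > best:
            best, best_threshold = gap, threshold
    return best, best_threshold


if __name__ == "__main__":
    print(ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]` it returns `(0.5, 0.1)`: at score 0.1, half of the legitimate URLs and none of the phishing URLs lie at or below the threshold.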

Details

Primary Language: English
Subjects: System and Network Security, Natural Language Processing
Journal Section: Research Article
Authors

Tabinda Ali Shah Zaman (ORCID: 0009-0003-8217-9461)

Muzammal Hussain (ORCID: 0000-0002-7320-4413)

Saddam Ali (ORCID: 0009-0001-9928-0115)

Muhammad Aqib Fareed (ORCID: 0009-0006-4359-4437)

Submission Date: June 30, 2025
Acceptance Date: December 30, 2025
Publication Date: December 31, 2025
Published in Issue: Year 2025, Volume 3, Issue 2
