Context-Aware Phishing URL Detection: Harnessing the Power of Large Language Models

Tabinda Ali Shah Zaman; Muzammal Hussain; Saddam Ali; Muhammad Aqib Fareed

Abstract

Phishing attacks remain a significant cybersecurity threat, exploiting users by disguising malicious URLs to resemble legitimate websites. Traditional machine learning approaches often struggle to detect such threats because they rely on handcrafted features and have limited contextual understanding. This study investigates the effectiveness of three transformer-based models, namely BERT, GPT-2, and XLM-RoBERTa, for automatic phishing URL detection. The models were trained and evaluated on a labeled dataset of phishing and legitimate URLs, and their performance was assessed with standard classification metrics and diagnostic curves. XLM-RoBERTa achieved the highest accuracy at 96.1%, outperforming BERT (94.2%) and GPT-2 (92.4%). Precision, recall, F1-score, and AUC-ROC were consistently high across all three models, with XLM-RoBERTa showing the most balanced performance. Further evaluation using precision-recall curves, lift and gain charts, calibration curves, and Kolmogorov–Smirnov (KS) plots gave additional insight into model discrimination and calibration. These findings underscore the advantages of deep contextualized language models for accurate and reliable phishing URL detection and suggest a promising approach for strengthening cybersecurity defenses.
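As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, F1-score, and the KS statistic from binary labels and predicted phishing probabilities. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, and label 1 is assumed to mean "phishing".

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating label 1 as phishing (assumed convention)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def basic_metrics(labels: Sequence[int], scores: Sequence[float],
                  threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 at a fixed decision threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data, invented for illustration only (not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

toy_metrics = basic_metrics(toy_labels, toy_scores)
toy_ks = ks_statistic(toy_labels, toy_scores)
print(toy_metrics, toy_ks)
```

A larger KS value indicates that the score distributions of phishing and legitimate URLs are further apart, which is the sense in which a KS plot reflects model discrimination.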

Details

Primary Language

English

Subjects

System and Network Security, Natural Language Processing

Journal Section

Research Article

Authors

Tabinda Ali Shah Zaman
0009-0003-8217-9461
Pakistan

Muzammal Hussain *
0000-0002-7320-4413
Pakistan

Saddam Ali
0009-0001-9928-0115
Pakistan

Muhammad Aqib Fareed
0009-0006-4359-4437
Pakistan

Publication Date

December 31, 2025

Submission Date

June 30, 2025

Acceptance Date

December 30, 2025

Published in Issue

Year: 2025, Volume: 3, Number: 2

IZ

https://izlik.org/JA94FR85RZ

Cite

APA

Ali Shah Zaman, T., Hussain, M., Ali, S., & Fareed, M. A. (2025). Context-Aware Phishing URL Detection: Harnessing the Power of Large Language Models. Current Trends in Computing, 3(2), 1-25. https://izlik.org/JA94FR85RZ
