Phishing attacks remain a significant cybersecurity threat, exploiting users by disguising malicious URLs to resemble legitimate websites. Traditional machine learning approaches often struggle to detect such threats due to their reliance on handcrafted features and limited contextual understanding. This study investigates the effectiveness of advanced transformer-based models, namely BERT, GPT-2, and XLM-RoBERTa, for automatic phishing URL detection. The models were trained and evaluated on a labeled dataset comprising phishing and legitimate URLs, with performance assessed through comprehensive metrics and diagnostic curves. Experimental results showed that XLM-RoBERTa achieved the highest accuracy of 96.1%, outperforming BERT (94.2%) and GPT-2 (92.4%). Precision, recall, F1-score, and AUC-ROC metrics were consistently high across all transformer models, with XLM-RoBERTa demonstrating the most balanced performance. Further evaluation using precision-recall curves, lift and gain charts, calibration curves, and Kolmogorov–Smirnov (KS) plots provided in-depth insights into model discrimination and calibration. These findings underscore the advantages of deep contextualized language models in accurately and reliably detecting phishing URLs, offering a promising approach for enhancing cybersecurity defenses.
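The abstract describes fine-tuning transformer models as binary URL classifiers. The sketch below illustrates this general setup for XLM-RoBERTa using Hugging Face `transformers` and PyTorch; the `xlm-roberta-base` checkpoint, toy URLs, sequence length, and learning rate are illustrative assumptions, not the study's actual dataset or configuration.

```python
# Minimal sketch: fine-tuning XLM-RoBERTa as a binary phishing-URL classifier.
# Assumes Hugging Face `transformers` and PyTorch; the toy URLs and
# hyperparameters are placeholders, not the paper's experimental settings.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "xlm-roberta-base"  # assumption: base-size checkpoint


class URLDataset(Dataset):
    """Tokenizes raw URL strings with binary labels (1 = phishing)."""

    def __init__(self, urls, labels, tokenizer, max_len=64):
        self.enc = tokenizer(urls, truncation=True, max_length=max_len,
                             padding="max_length", return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item


# Toy data standing in for the labeled phishing/legitimate corpus.
urls = ["http://paypa1-secure-login.example.ru/verify",
        "https://www.wikipedia.org/"]
labels = [1, 0]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

loader = DataLoader(URLDataset(urls, labels, tokenizer),
                    batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(1):              # single epoch for the sketch
    for batch in loader:
        optimizer.zero_grad()
        out = model(**batch)        # cross-entropy loss computed internally
        out.loss.backward()
        optimizer.step()

# Inference: predicted probability that a URL is phishing.
model.eval()
with torch.no_grad():
    enc = tokenizer("http://login-update.example.com/account",
                    return_tensors="pt")
    probs = torch.softmax(model(**enc).logits, dim=-1)
    print(f"P(phishing) = {probs[0, 1]:.3f}")
```

The same scaffold applies to the other models in the study by swapping the checkpoint name (e.g. `bert-base-uncased` or `gpt2`, the latter requiring a padding token to be set); the reported metrics such as accuracy and AUC-ROC would then be computed from the predicted probabilities on a held-out test split.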
| Primary Language | English |
|---|---|
| Subjects | System and Network Security, Natural Language Processing |
| Journal Section | Research Article |
| Authors | |
| Submission Date | June 30, 2025 |
| Acceptance Date | December 30, 2025 |
| Publication Date | December 31, 2025 |
| Published in Issue | Year 2025 Volume: 3 Issue: 2 |