PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification
Abstract
Cyber attackers exploit various online channels to spread attention-grabbing, intriguing, or fear-inducing content. By prompting users to engage with this content and click on embedded links, they redirect them to phishing websites that closely resemble legitimate ones in order to steal confidential information or carry out other deceptive practices. Mobile applications and browsers must therefore be able to identify such fake and harmful websites before users access them. This study introduces PhishShield, a two-stage approach for early phishing website detection. In the first stage, an LSTM-based deep learning model is pre-trained to score domain names by their likelihood of being phishing, based on the character sequences of their addresses. In the second stage, a machine learning model trained on this phishing score and additional features (such as domain and SSL details) predicts whether a website is phishing or legitimate. Two distinct datasets, Dataset1 and Dataset2, are prepared to train the models: the first-stage deep learning model is trained on Dataset1, and the second-stage machine learning model is trained on Dataset2. The results show that the random forest classifier in the second stage achieves the highest accuracy, 0.984.
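The data flow between the two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the character vocabulary, the sequence length `MAX_LEN`, the feature names `ssl_valid` and `domain_age_days`, and the `dummy_score` function (a stand-in for the pre-trained LSTM) are all assumptions made for illustration. The sketch only shows how a domain name might be turned into a character-index sequence and how the resulting phishing score might be combined with additional features into a second-stage input vector.

```python
import string
from typing import Callable, Dict, List

# Assumed character vocabulary: lowercase letters, digits, hyphen, dot.
# Index 0 is reserved for padding, index 1 for any unknown character.
VOCAB = {ch: i + 2 for i, ch in enumerate(string.ascii_lowercase + string.digits + "-.")}
PAD, UNK = 0, 1
MAX_LEN = 32  # hypothetical fixed sequence length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a domain name to a fixed-length list of character indices."""
    ids = [VOCAB.get(ch, UNK) for ch in domain.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def dummy_score(ids: List[int]) -> float:
    """Stand-in for the pre-trained LSTM (assumption): returns the
    fraction of non-padding characters that are digits or hyphens."""
    real = [i for i in ids if i != PAD]
    flagged = [i for i in real if VOCAB["0"] <= i <= VOCAB["-"]]
    return len(flagged) / len(real) if real else 0.0


def build_stage2_features(
    domain: str,
    score_fn: Callable[[List[int]], float],
    extra: Dict[str, float],
) -> List[float]:
    """Prepend the stage-one phishing score to the additional features."""
    score = score_fn(encode_domain(domain))
    # Sort keys so the column order is stable across samples.
    return [score] + [extra[k] for k in sorted(extra)]


if __name__ == "__main__":
    # Hypothetical feature names; the abstract only mentions
    # "domain and SSL details" without listing them.
    row = build_stage2_features(
        "a1-b", dummy_score, {"ssl_valid": 1.0, "domain_age_days": 12.0}
    )
    print(row)
```

In a fuller pipeline, `dummy_score` would be replaced by the trained sequence model, and rows built this way would be passed to a second-stage classifier such as the random forest that the abstract reports as the most accurate.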
Keywords
Details
Primary Language
English
Subjects
Cybersecurity and Privacy (Other)
Journal Section
Research Article
Early Pub Date
June 19, 2026
Publication Date
June 30, 2026
Submission Date
December 10, 2025
Acceptance Date
April 9, 2026
Published in Issue
Year 2026 Volume: 9 Number: 3
