Research Article

PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification

Volume: 9, Number: 3, June 30, 2026


Abstract

Cyber attackers exploit various online channels to spread attention-grabbing, intriguing, or fear-inducing content. By prompting users to engage with this material and click on embedded links, they redirect victims to fraudulent websites that closely resemble legitimate ones; these phishing sites are then used to steal confidential information or to carry out other deceptive practices. Mobile applications and browsers therefore need to identify such fake and harmful websites before users access them. This study introduces PhishShield, a two-stage approach to early phishing website detection. In the first stage, an LSTM-based deep learning model is pre-trained on the character sequences of domain names and assigns each domain a score reflecting its likelihood of being phishing. In the second stage, a machine learning model trained on this phishing score together with additional features, such as domain and SSL details, predicts whether a website is phishing or legitimate. Two distinct datasets, Dataset1 and Dataset2, are prepared for training: the first-stage deep learning model is trained on Dataset1, and the second-stage machine learning model on Dataset2. Among the second-stage classifiers, random forest achieves the highest accuracy, 0.984.
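The abstract does not specify the network architecture, character vocabulary, hyperparameters, or exact feature set, so the following Python sketch only illustrates the general shape of such a two-stage pipeline under stated assumptions. Stage 1 encodes a domain name as a sequence of character IDs and runs a single-layer LSTM over it to produce a score in (0, 1); stage 2 concatenates that score with domain and SSL features. The names `encode_domain`, `TinyLSTMScorer`, and `build_features`, the vocabulary, the hidden size, and the specific features (domain age, SSL validity, days until certificate expiry, dot and digit counts) are all hypothetical. The LSTM weights are random rather than trained on Dataset1, so the score carries no real meaning. In practice the first stage would typically use a deep learning framework's LSTM layer and the second stage a library classifier such as scikit-learn's `RandomForestClassifier`; neither is shown here.

```python
import math
import random

# Hypothetical character vocabulary for domain names. ID 0 is padding,
# IDs 1..38 are the characters below, and UNK_ID covers anything else.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}
PAD_ID = 0
UNK_ID = len(VOCAB) + 1
N_IDS = len(VOCAB) + 2  # size of the one-hot input


def encode_domain(domain, max_len=32):
    """Stage 1 input: map a domain name to a fixed-length list of character IDs."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in domain.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))  # right-pad to max_len


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMScorer:
    """Single-layer LSTM over one-hot characters with a sigmoid output unit.

    Weights are random, not trained, so the returned score is illustrative only.
    """

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        width = N_IDS + hidden + 1  # one-hot input + previous hidden state + bias

        def matrix():
            return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(hidden)]

        # Input (i), forget (f), candidate (g) and output (o) gate weights.
        self.W = {gate: matrix() for gate in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden + 1)]  # + bias

    def _gate(self, rows, x, h):
        # With a one-hot input, W @ x reduces to selecting column x.
        return [row[x] + sum(row[N_IDS + j] * h[j] for j in range(self.hidden)) + row[-1]
                for row in rows]

    def score(self, ids):
        """Return a value in (0, 1); higher is meant to indicate 'more phishing-like'."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in ids:
            if x == PAD_ID:
                continue  # mask padding so trailing pads do not change the score
            i = [sigmoid(v) for v in self._gate(self.W["i"], x, h)]
            f = [sigmoid(v) for v in self._gate(self.W["f"], x, h)]
            g = [math.tanh(v) for v in self._gate(self.W["g"], x, h)]
            o = [sigmoid(v) for v in self._gate(self.W["o"], x, h)]
            c = [f[k] * c[k] + i[k] * g[k] for k in range(self.hidden)]
            h = [o[k] * math.tanh(c[k]) for k in range(self.hidden)]
        z = sum(w * v for w, v in zip(self.w_out, h)) + self.w_out[-1]
        return sigmoid(z)


def build_features(domain, phishing_score, domain_age_days, has_valid_ssl, ssl_days_left):
    """Stage 2 input: stage-1 score plus hypothetical domain and SSL features."""
    return [
        phishing_score,
        float(domain_age_days),
        1.0 if has_valid_ssl else 0.0,
        float(ssl_days_left),
        float(domain.count(".")),                   # number of dots in the domain
        float(sum(ch.isdigit() for ch in domain)),  # digits, e.g. "paypa1"
    ]


if __name__ == "__main__":
    domain = "paypa1-login.example.com"
    score = TinyLSTMScorer().score(encode_domain(domain))
    features = build_features(domain, score, domain_age_days=3,
                              has_valid_ssl=True, ssl_days_left=89)
    # A trained stage-2 classifier (e.g. a random forest) would consume `features`.
    print(round(score, 3), features)
```

Because the input is one-hot, each gate's input-weight product reduces to a column lookup, which keeps the sketch dependency-free. Masking the padding positions means the same domain yields the same score regardless of `max_len`.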



Details

Primary Language: English
Subjects: Cybersecurity and Privacy (Other)
Journal Section: Research Article
Early Publication Date: June 19, 2026
Publication Date: June 30, 2026
Submission Date: December 10, 2025
Acceptance Date: April 9, 2026
Published in Issue: Year 2026, Volume: 9, Number: 3

APA
Tülü, C. N., & Deniz, İ. (2026). PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. Sakarya University Journal of Computer and Information Sciences, 9(3), 724-738. https://doi.org/10.35377/saucis.1839585
AMA
1. Tülü CN, Deniz İ. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026;9(3):724-738. doi:10.35377/saucis.1839585
Chicago
Tülü, Cagatay Neftali, and İhsan Deniz. 2026. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences 9 (3): 724-38. https://doi.org/10.35377/saucis.1839585.
EndNote
Tülü CN, Deniz İ (June 1, 2026) PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. Sakarya University Journal of Computer and Information Sciences 9 3 724–738.
IEEE
[1] C. N. Tülü and İ. Deniz, “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”, SAUCIS, vol. 9, no. 3, pp. 724–738, June 2026, doi: 10.35377/saucis.1839585.
ISNAD
Tülü, Cagatay Neftali - Deniz, İhsan. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences 9/3 (June 1, 2026): 724-738. https://doi.org/10.35377/saucis.1839585.
JAMA
1. Tülü CN, Deniz İ. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026;9:724–738.
MLA
Tülü, Cagatay Neftali, and İhsan Deniz. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 3, June 2026, pp. 724-38, doi:10.35377/saucis.1839585.
Vancouver
1. Cagatay Neftali Tülü, İhsan Deniz. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026 Jun. 1;9(3):724-38. doi:10.35377/saucis.1839585

 

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.