PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification

Cagatay Neftali Tülü; İhsan Deniz

doi:10.35377/saucis...1839585

Abstract

Cyber attackers exploit various online channels to spread attention-grabbing, intriguing, or fear-inducing content. By prompting users to engage with this content and click on embedded links, they redirect users to fraudulent websites that closely resemble legitimate ones in order to steal confidential information or carry out other deceptive practices. Mobile applications or browsers must therefore be able to identify such fake and harmful websites before users access them. This study introduces PhishShield, a two-stage approach for early phishing website detection. In the first stage, an LSTM-based deep learning model is pre-trained to assess domain names based on their character sequences and to score their likelihood of being phishing. In the second stage, a machine learning model, trained on the phishing score and additional features (such as domain and SSL details), predicts whether a website is phishing or legitimate. To train the models, two distinct datasets, Dataset1 and Dataset2, are prepared: the deep learning model in the first stage is trained with Dataset1, and the machine learning model in the second stage is trained with Dataset2. The results show that the random forest classifier in the second stage achieves the highest accuracy, 0.984.
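The data flow between the two stages can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the sequence length of 32, the `placeholder_score` function (a simple stand-in for the pre-trained LSTM), and the two extra features (`domain_age_days`, `has_valid_ssl`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import string

# Hypothetical vocabulary: lowercase letters, digits, hyphen, and dot.
# Id 0 is reserved for padding and id 1 for characters outside the vocabulary.
ALPHABET = string.ascii_lowercase + string.digits + "-."
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 32  # assumed fixed sequence length, not taken from the paper

# Ids of digits and hyphens, used only by the placeholder scorer below.
DIGIT_OR_HYPHEN_IDS = {CHAR_TO_ID[ch] for ch in string.digits + "-"}


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain name to a fixed-length list of integer character ids."""
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in domain.lower()[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def placeholder_score(encoded):
    """Stand-in for the stage-one model (NOT an LSTM, NOT the paper's model).

    Returns the fraction of non-padding characters that are digits or hyphens,
    so that the sketch yields a number in [0, 1] without any ML dependency.
    """
    non_pad = [i for i in encoded if i != PAD_ID]
    if not non_pad:
        return 0.0
    return sum(1 for i in non_pad if i in DIGIT_OR_HYPHEN_IDS) / len(non_pad)


def build_stage2_features(phishing_score, domain_age_days, has_valid_ssl):
    """Combine the stage-one score with extra (hypothetical) features."""
    return [float(phishing_score), float(domain_age_days),
            1.0 if has_valid_ssl else 0.0]


if __name__ == "__main__":
    encoded = encode_domain("paypa1-login.example")
    score = placeholder_score(encoded)
    # This vector is what a stage-two classifier would receive as input.
    print(build_stage2_features(score, domain_age_days=12, has_valid_ssl=True))
```

In a real pipeline, `placeholder_score` would be replaced by the output of the trained sequence model, and the resulting feature vectors would be passed to a classifier such as a random forest.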

Details

Primary Language

English

Subjects

Cybersecurity and Privacy (Other)

Journal Section

Research Article

Authors

Cagatay Neftali Tülü*
0000-0002-4462-3707
Türkiye

İhsan Deniz
0009-0000-1193-7147
Türkiye

Early Pub Date

June 19, 2026

Publication Date

June 30, 2026

Submission Date

December 10, 2025

Acceptance Date

April 9, 2026

Published in Issue

Year 2026 Volume: 9 Number: 3

DOI

https://doi.org/10.35377/saucis...1839585

IZ

https://izlik.org/JA62LB57ZJ

Cite

APA

Tülü, C. N., & Deniz, İ. (2026). PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. Sakarya University Journal of Computer and Information Sciences, 9(3), 724-738. https://doi.org/10.35377/saucis...1839585

AMA

1. Tülü CN, Deniz İ. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026;9(3):724-738. doi:10.35377/saucis.1839585

Chicago

Tülü, Cagatay Neftali, and İhsan Deniz. 2026. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences 9 (3): 724-38. https://doi.org/10.35377/saucis. 1839585.

EndNote

Tülü CN, Deniz İ (June 1, 2026) PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. Sakarya University Journal of Computer and Information Sciences 9 3 724–738.

IEEE

[1] C. N. Tülü and İ. Deniz, “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”, SAUCIS, vol. 9, no. 3, pp. 724–738, June 2026, doi: 10.35377/saucis...1839585.

ISNAD

Tülü, Cagatay Neftali - Deniz, İhsan. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences 9/3 (June 1, 2026): 724-738. https://doi.org/10.35377/saucis. 1839585.

JAMA

1. Tülü CN, Deniz İ. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026;9:724–738.

MLA

Tülü, Cagatay Neftali, and İhsan Deniz. “PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 3, June 2026, pp. 724-38, doi:10.35377/saucis. 1839585.

Vancouver

1. Cagatay Neftali Tülü, İhsan Deniz. PhishShield: Early Detection of Malicious Websites via Domain Sequence Modeling and Feature-Based Classification. SAUCIS. 2026 Jun. 1;9(3):724-38. doi:10.35377/saucis. 1839585