Research Article

Local Image Descriptor Based Phishing Web Page Recognition as an Open-Set Problem

October 31, 2019

Abstract

With the advent of e-commerce, digital services and social media, scammers have found new ways to gain illegal benefits, such as capturing credit card information or exploiting personal cloud accounts, a practice termed phishing. To combat this cyber crime, the last two decades have witnessed a variety of methodologies, including HTML content based similarity analysis, URL based classification and, more recently, visual similarity based matching, since phishing web pages visually mimic their legitimate counterparts in order to deceive innocent users. In this study, we propose a computer vision and machine learning based approach to classify whether a suspicious web page is phishing and, further, to recognize its original brand name. To this end, we utilized and investigated two local image descriptors, namely Scale Invariant Feature Transform (SIFT) and DAISY. Although they share properties such as scale invariance, the two descriptors differ notably: SIFT is additionally rotation invariant and employs key-point based sampling, whereas DAISY applies dense sampling by default. Therefore, we first aimed to assess the feasibility of these two descriptors and to reveal the effects of sampling strategy and rotational invariance in the problem domain. Furthermore, to create a discriminative representation of a web page, we followed the bag of visual words (BOVW) approach with vocabulary sizes of 50, 100, 200 and 400. To evaluate the proposed approach, we utilized a publicly available phishing dataset containing snapshots of web pages sampled from 14 highly phished brands as well as ordinary legitimate web pages, which yields a challenging open-set problem. The dataset comprises 1313 training and 1539 testing image samples in total.
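The BOVW representation mentioned above can be illustrated with a minimal sketch. It assumes that a visual vocabulary (a list of cluster centres, for example obtained by k-means over training descriptors, which is an assumption and not stated in the abstract) is already available, and it uses toy 2-D vectors in place of real descriptors (SIFT descriptors are 128-D). The function names `nearest_word` and `bovw_histogram` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equally sized vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Vector, vocabulary: Sequence[Vector]) -> int:
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: squared_distance(descriptor, vocabulary[i]),
    )


def bovw_histogram(
    descriptors: Sequence[Vector], vocabulary: Sequence[Vector]
) -> List[float]:
    """Count how often each visual word occurs and normalise to sum to 1.

    The histogram length equals the vocabulary size (50, 100, 200 or 400
    in the study), regardless of how many descriptors the page produced.
    """
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        # A page with no descriptors maps to an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


# Toy example: a 3-word vocabulary and four local descriptors from one page.
toy_vocabulary = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
toy_descriptors = [[1.0, 1.0], [9.0, 1.0], [0.5, 0.0], [1.0, 9.0]]
print(bovw_histogram(toy_descriptors, toy_vocabulary))  # [0.5, 0.25, 0.25]
```

The fixed-length histogram is what makes pages with different numbers of key points comparable, so it can be passed directly to a classifier.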
The visual features extracted via SIFT and DAISY were first transformed into BOVW histograms and then fed to three machine learning methods, namely SVM, Random Forest and XGBoost. According to the experiments, with a 400-word visual vocabulary, the SIFT descriptor combined with XGBoost was the best descriptor-learner configuration, reaching up to 89.34% validation accuracy with a 0.76% false positive rate. Moreover, SIFT outperformed DAISY in all settings. These results show that SIFT descriptors equipped with a BOVW representation can be used effectively for brand identification of phishing web pages.
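As a rough illustration of the two reported metrics in an open-set setting, the sketch below computes accuracy and false positive rate from true and predicted labels. It assumes that legitimate pages carry a single catch-all label and that a false positive is a legitimate page predicted as any phishing brand; the abstract does not give the exact definition, so this is an assumption. The label strings and function names are hypothetical.

```python
from typing import Sequence

# Hypothetical label for pages that belong to none of the known brands.
LEGITIMATE = "legitimate"


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of pages whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def false_positive_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of legitimate pages that were flagged as some phishing brand.

    Assumption: any non-legitimate prediction on a legitimate page counts
    as a false positive, whichever brand was predicted.
    """
    preds_on_legit = [p for t, p in zip(y_true, y_pred) if t == LEGITIMATE]
    if not preds_on_legit:
        return 0.0
    return sum(p != LEGITIMATE for p in preds_on_legit) / len(preds_on_legit)


# Toy example with two hypothetical brands and four legitimate pages.
y_true = ["brand_a", LEGITIMATE, LEGITIMATE, "brand_b", LEGITIMATE, LEGITIMATE]
y_pred = ["brand_a", LEGITIMATE, "brand_b", "brand_a", LEGITIMATE, LEGITIMATE]
print(accuracy(y_true, y_pred))
print(false_positive_rate(y_true, y_pred))  # 0.25
```

Reporting both figures matters in this setting because a classifier could score well on brand recognition while still flagging many ordinary pages.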


Details

Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Publication Date: October 31, 2019
Submission Date: August 1, 2019
Acceptance Date: October 25, 2019
Published in Issue: Year 2019

APA
Bozkır, A. S., & Aydos, M. (2019). Local Image Descriptor Based Phishing Web Page Recognition as an Open-Set Problem. Avrupa Bilim Ve Teknoloji Dergisi, 444-451. https://doi.org/10.31590/ejosat.638404