Brand Recognition of Phishing Web Pages via Global Image Descriptors

Esra Eroğlu; Ahmet Selman Bozkır; Murat Aydos

doi:10.31590/ejosat.638397

Research Article

Year 2019, 436 - 443, 31.10.2019

Cited By: 1

Abstract

Phishing attacks, which have increased sharply in recent years, are a form of cyber attack that aims to steal the sensitive credentials of unsuspecting users. Typically, attackers try to deceive users by creating and distributing a fake but visually similar copy of a legitimate web page that is already in use. In this study, we propose an approach for recognizing phishing web pages that uses two global image descriptors, GIST and local binary patterns (LBP), neither of which has previously been employed in the phishing web page recognition literature. To obtain a discriminative representation, we experimented with two visual feature extraction schemes: (1) “holistic” and (2) “multi-level patches”. The “holistic” scheme uses only the whole web page screenshot, whereas the “multi-level patches” scheme divides each screenshot into equally sized smaller crops over an increasing number of levels. To evaluate the proposed approach, we used a publicly available phishing web page dataset containing screenshots of 14 highly phished brands as well as legitimate web pages, which poses an open-set problem for researchers. The dataset comprises 1313 training and 1539 testing samples. The visual signatures extracted with the GIST and LBP descriptors were then fed to various machine learning models such as SVM, Random Forest and XGBoost (regularized gradient tree boosting). In our experiments, XGBoost was the best-performing learner. With it, we obtained validation accuracies of 87.7% (GIST) and 83.1% (LBP) using the “multi-level patches” representation. These results show that the chosen global image descriptors can be successfully employed for detecting and recognizing phishing web pages. In addition, the average time required to process one screenshot with GIST descriptors (around 1.12 sec.) indicates that the proposed scheme can be used effectively as a browser-based plug-in for recognizing the brands of phishing web pages.
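
The difference between the “holistic” and “multi-level patches” schemes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a basic 3x3 LBP with a 256-bin histogram, assumes that level k splits the screenshot into a 2^k x 2^k grid (in the spirit of spatial pyramid matching), and represents a grayscale image as plain nested lists so that it runs with the Python standard library alone. The function names `lbp_histogram`, `split_into_patches` and `multi_level_lbp` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities

# Offsets of the 8 neighbours, clockwise starting from the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(img: Image) -> List[float]:
    """Normalised 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    hist = [0] * 256
    if not img or not img[0]:
        return [0.0] * 256
    h, w = len(img), len(img[0])
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            center = img[r][c]
            code = 0
            # Set one bit per neighbour that is at least as bright as the centre.
            for bit, (dr, dc) in enumerate(_NEIGHBOURS):
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else [0.0] * 256


def split_into_patches(img: Image, grid: int) -> List[Image]:
    """Split into grid x grid equally sized crops (remainder pixels are dropped)."""
    h, w = len(img), len(img[0])
    ph, pw = h // grid, w // grid
    return [
        [row[j * pw:(j + 1) * pw] for row in img[i * ph:(i + 1) * ph]]
        for i in range(grid)
        for j in range(grid)
    ]


def multi_level_lbp(img: Image, levels: int = 3) -> List[float]:
    """Concatenate LBP histograms over 1x1, 2x2, 4x4, ... grids (assumed layout)."""
    features: List[float] = []
    for level in range(levels):
        for patch in split_into_patches(img, 2 ** level):
            features.extend(lbp_histogram(patch))
    return features


if __name__ == "__main__":
    # Synthetic 16x16 "screenshot"; a real pipeline would load and resize an image.
    demo = [[(r * 7 + c * 13) % 256 for c in range(16)] for r in range(16)]
    print(len(multi_level_lbp(demo, levels=1)))  # holistic: 256 values
    print(len(multi_level_lbp(demo, levels=3)))  # (1 + 4 + 16) * 256 = 5376 values
```

With `levels=1` the function reduces to the holistic case, a single histogram for the whole screenshot. Higher levels add spatial layout information at the cost of a longer feature vector, which would then be passed to a classifier such as the ones named in the abstract.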

Keywords

Phishing, Computer Vision, Machine Learning, GIST, LBP

References

Jain, A. K., & Gupta, B. B. (2017). Phishing detection: analysis of visual similarity based approaches. Security and Communication Networks, 2017.
Phishing Activity Trends Report, 1st Quarter 2019. www.apwg.org
Basnet, R. B., & Sung, A. H. (2014). Learning to Detect Phishing Webpages. J. Internet Serv. Inf. Secur., 4(3), 21-39.
Ali, W. (2017). Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection. International Journal of Advanced Computer Science and Applications, 8(9), 72-78.
Zhang, W., Lu, H., Xu, B., & Yang, H. (2013). Web phishing detection based on page spatial layout similarity. Informatica, 37(3).
Rao, R. S., & Ali, S. T. (2015, April). A computer vision technique to detect phishing attacks. In 2015 Fifth International Conference on Communication Systems and Network Technologies (pp. 596-601). IEEE.
Prakash, P., Kumar, M., Kompella, R. R., & Gupta, M. (2010, March). Phishnet: predictive blacklisting to detect phishing attacks. In 2010 Proceedings IEEE INFOCOM (pp. 1-5). IEEE.
Hara, M., Yamada, A., & Miyake, Y. (2009, March). Visual similarity-based phishing detection without victim site information. In 2009 IEEE Symposium on Computational Intelligence in Cyber Security (pp. 30-36). IEEE.
Fu, A. Y., Wenyin, L., & Deng, X. (2006). Detecting phishing web pages with visual similarity assessment based on earth mover's distance (EMD). IEEE transactions on dependable and secure computing, 3(4), 301-311.
Chen, K. T., Chen, J. Y., Huang, C. R., & Chen, C. S. (2009). Fighting phishing with discriminative keypoint features. IEEE Internet Computing, 13(3), 56-63.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA.
Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345-357.
Google Safe Browsing API, https://developers.google.com/safe-browsing/ (Online accessed: 13.7.2019)
XGBoost Documentation, https://xgboost.readthedocs.io/en/latest/ (Online accessed: 13.7.2019)
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International journal of computer vision, 42(3), 145-175.
Wang, Y., Li, Y., & Ji, X. (2013, December). Recognizing human actions based on gist descriptor and word phrase. In Proceedings 2013 International Conference on Mechatronic Sciences, Electric Engineering and Computer (MEC) (pp. 1104-1107). IEEE.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine learning, 20(3), 273-297.
Sikirić, I., Brkić, K., & Šegvić, S. (2013). Classifying traffic scenes using the GIST image descriptor. arXiv preprint arXiv:1310.0316.
Ojala, T., Pietikäinen, M., & Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29, 51-59.
Lazebnik, S., Schmid, C., & Ponce, J. (2006, June). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 2, pp. 2169-2178). IEEE.
Dalgic, F. C., Bozkir, A. S., & Aydos, M. (2018, October). Phish-IRIS: A New Approach for Vision Based Brand Prediction of Phishing Web Pages via Compact Visual Descriptors. In 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) (pp. 1-8). IEEE.

Turkish title: Genel Görsel Betimleyicilerden Faydalanarak Oltalayıcı Web Sayfalarında Marka Tanıma (Brand Recognition in Phishing Web Pages Using Global Visual Descriptors)

Abstract (translated from Turkish)

Phishing attacks, which have grown exponentially in recent years with the development of the Internet, are a form of cyber attack that aims to steal the private credentials of innocent users. In general, attackers try to deceive users by creating a fake but visually similar version of a legitimate web page in use and sending it to them. This study proposes an approach that uses two general-purpose visual descriptors (GIST and Local Binary Patterns), previously untried in the literature, to recognize the brands targeted by phishing web pages. In addition, two feature extraction approaches, “holistic” and “multi-level patching”, were tried in order to obtain highly discriminative representations. In the “holistic” approach the whole page screenshot is used as input, whereas in the “multi-level patching” approach the image is handled as a multi-layered structure of equally sized patches. Performance was measured on a dataset that contains screenshots of 14 brands frequently subjected to phishing attacks together with legitimate web pages, and that poses an “open-set” problem for researchers. The dataset comprises 1313 training and 1539 test samples in total. The visual signatures extracted with the GIST and LBP descriptors were then given as input to various machine learning models such as SVM, Random Forest and XGBoost. According to the results of comprehensive experiments, XGBoost was identified as the best classifier. On the validation data, accuracies of 87.7% (GIST) and 83.1% (LBP), respectively, were obtained using the “multi-level patching” representation. Consequently, it is shown that the selected global visual descriptors can be used successfully for detecting phishing web pages and recognizing their brands. In addition, the fact that a page screenshot can be processed and classified with the GIST descriptor in 1.12 seconds on average shows that the proposed approach can also be used effectively and efficiently as a browser plug-in for recognizing phishing web pages.

Keywords

LBP, Phishing attacks, Computer Vision, Machine Learning, GIST

There are 21 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Esra Eroğlu (ORCID 0000-0002-6140-6894); Ahmet Selman Bozkır (ORCID 0000-0003-4305-7800); Murat Aydos (ORCID 0000-0002-7570-9204)
Publication Date	October 31, 2019
Published in Issue	Year 2019

Cite

APA	Eroğlu, E., Bozkır, A. S., & Aydos, M. (2019). Brand Recognition of Phishing Web Pages via Global Image Descriptors. Avrupa Bilim Ve Teknoloji Dergisi, 436-443. https://doi.org/10.31590/ejosat.638397