Prediction of Phishing Web Sites with Deep Learning Using WEKA Environment

Özlem Batur Dinler; Canan Batur Şahin

doi:10.31590/ejosat.901465

Research Article

Year 2021, Issue 24, pp. 35-41, 15.04.2021

Cited By: 4

Abstract

COVID-19 (coronavirus disease), first observed in the city of Wuhan, China, on December 30, 2019, spread worldwide and caused a global pandemic. Because the disease is transmitted quickly and easily, the precautions that governments had to take and voluntary quarantine practices significantly changed the habits of communities around the world in a short time. In particular, this change increased remote activities such as remote working, distance education, and online shopping (e-commerce). People therefore needed to move quickly from the physical platforms they used to meet their daily needs to digital platforms. This shift was accompanied by a significant increase in online cyberattacks, especially web phishing targeting digital platforms. The rise in phishing and the growing number of phishing websites have exposed the world's information and organizations to a wider range of cyberattacks. Since the COVID-19 pandemic began in 2019, detecting and analyzing phishing websites has therefore become more important than ever. This study analyzes web phishing and compares the classification performance of five popular methods: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), k-Nearest Neighbour (k-NN), and Deep Learning (DL), using the graphical user interface (GUI) of the Waikato Environment for Knowledge Analysis (WEKA). In experiments where the dataset was divided into training and test sets, RF and DL outperformed the other methods, whereas k-NN achieved better performance when cross-validation was used. A possible reason for this is the simple deep learning approach used. We hope the current study can provide guidance for investigating WEKA deep learning for web phishing classification.
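The abstract contrasts two evaluation protocols, a single train/test split and cross-validation, and reports that the ranking of the classifiers differs between them. The minimal sketch below uses only the Python standard library to show both protocols for one of the five methods, k-NN. It is not the authors' WEKA configuration. The toy data generator, the feature and label coding in {-1, 0, 1}, the 66% split, the 10 folds, k = 3, and all function names are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (feature vector, class label); -1 = phishing, 1 = legitimate (assumed coding)
Row = Tuple[Sequence[int], int]


def make_toy_data(n: int = 60, n_features: int = 5, seed: int = 0) -> List[Row]:
    """Hypothetical synthetic data: each feature copies the label with
    probability 0.8, otherwise it is drawn uniformly from {-1, 0, 1}."""
    rng = random.Random(seed)
    data: List[Row] = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        x = [label if rng.random() < 0.8 else rng.choice([-1, 0, 1])
             for _ in range(n_features)]
        data.append((x, label))
    return data


def knn_predict(train: List[Row], x: Sequence[int], k: int = 3) -> int:
    """Majority vote among the k training rows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train: List[Row], test: List[Row], k: int = 3) -> float:
    """Fraction of test rows that k-NN, fitted on train, labels correctly."""
    if not test:
        raise ValueError("empty test set")
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def holdout(data: List[Row], train_frac: float = 0.66,
            seed: int = 1, k: int = 3) -> float:
    """One random train/test split; each row is used either for training or testing."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return accuracy(rows[:cut], rows[cut:], k)


def cross_validate(data: List[Row], folds: int = 10,
                   seed: int = 1, k: int = 3) -> float:
    """Plain (unstratified) k-fold CV: every row is tested exactly once,
    by a model trained on the other folds; returns pooled accuracy."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for f in range(folds):
        test = rows[f::folds]  # indices i with i % folds == f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


if __name__ == "__main__":
    data = make_toy_data()
    print(f"hold-out accuracy:   {holdout(data):.3f}")
    print(f"10-fold CV accuracy: {cross_validate(data):.3f}")
```

A hold-out estimate depends on one random split, so on small datasets it can differ noticeably from a cross-validated estimate, which tests every row once. This is one plausible way the two protocols can rank classifiers differently.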

Keywords

Machine learning, Deep learning, WEKA, DL4J deep learning architecture, Web phishing, COVID-19.

Supporting Institution

ARACONF 2021

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Özlem Batur Dinler (ORCID 0000-0002-2955-6761), Canan Batur Şahin (ORCID 0000-0002-2131-6368)
Publication Date	April 15, 2021
Published in Issue	Year 2021 Issue: 24

Cite

APA	Batur Dinler, Ö., & Batur Şahin, C. (2021). Prediction of Phishing Web Sites with Deep Learning Using WEKA Environment. Avrupa Bilim ve Teknoloji Dergisi, (24), 35-41. https://doi.org/10.31590/ejosat.901465

Cited By

The Effect of Technology and Service on Learning Systems During the COVID-19 Pandemic

European Journal of Science and Technology

Arıfullah ULLAH

https://doi.org/10.31590/ejosat.990073

Semantic-based vulnerability detection by functional connectivity of gated graph sequence neural networks

Soft Computing

https://doi.org/10.1007/s00500-022-07777-3

Learning Optimized Patterns of Software Vulnerabilities with the Clock-Work Memory Mechanism

European Journal of Science and Technology

https://doi.org/10.31590/ejosat.1159875

Optimization of Software Vulnerabilities patterns with the Meta-Heuristic Algorithms

Türk Doğa ve Fen Dergisi

https://doi.org/10.46810/tdfd.1201248
