Research Article

Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi

Year 2018, Volume: 24 Issue: 2, 276 - 282, 30.04.2018

Abstract

Web uygulamalarının kullanım oranındaki artış ile birlikte sayısı artan kötücül web
siteleri ve saldırılar, son kullanıcıya ciddi zararlar vermektedir. Kişisel ve
hassas bilgilerin çalınmasına yönelik bu saldırılardan biri Kimlik Avı
saldırısıdır. Yayımlanan güvenlik raporlarında son yıllarda milyonlarca yeni
kimlik avı sahteciliği yapan web sayfası tespit edildiği ifade edilmektedir.
Böylesi kritik bir durumda bu web sayfalarının tespiti büyük önem arz
etmektedir. Bu çalışmada, bir veri kümesi ile birlikte literatürde bulunan
makine öğrenmesi sınıflandırma algoritmaları kullanılarak karşılaştırmalı
analiz yapılmıştır. Analiz sonuçları, Kimlik Avı Sahteciliği çalışmalarında
kullanılan sınıflandırma algoritmalarının hangi koşullarda tercih edilmesi
gerektiği hakkında farklı parametreler bulunduğunu göstermektedir.


Comparative analysis of machine learning algorithms in detection of phishing websites


Abstract

The increasing number of malicious websites and attacks, which has accompanied
the growing use of web applications, causes severe harm to end users. One of
these attacks, aimed at stealing personal and sensitive information, is the
phishing attack. Published security reports state that millions of new phishing
web pages have been detected in recent years. In such a critical situation,
detecting these web pages is of great importance. In this study, a comparative
analysis was carried out on a dataset using machine learning classification
algorithms from the literature. The results of the analysis show that there are
different parameters governing the conditions under which the classification
algorithms used in phishing studies should be preferred.

References

  • McAfee Inc. "McAfee Labs Threats Report-February 2015". http://www.mcafee.com/us/resources/reports/rp-quarterly-threat-q4-2014.pdf (26.03.2016).
  • Symantec Corp. "Internet Security Threat Report-ISTR 20 April 2015". https://www4.symantec.com/mktginfo/whitepaper/ISTR/21347932_GA-internet-security-threat-report-volume-20-2015-social_v2.pdf (26.03.2016).
  • McAfee Inc. "McAfee Labs Threats Report-March 2016". http://www.mcafee.com/us/resources/reports/rp-quarterly-threats-mar-2016.pdf (01.05.2016).
  • UCI Machine Learning Repository. "Phishing Websites Dataset". https://archive.ics.uci.edu/ml/datasets/Phishing+Websites (26.03.2016).
  • Kazemian HB, Ahmed S. “Comparisons of machine learning techniques for detecting malicious webpages”. Expert Systems with Applications, 42(3), 1166-1177, 2015.
  • Li Y, Yang L, Ding J. “A minimum enclosing ball-based support vector machine approach for detection of phishing websites”. Optik, 127(1), 345-351, 2016.
  • Moghimi M, Varjani AY. “New rule-based phishing detection method”. Expert Systems with Applications, 53, 231-242, 2016.
  • Nguyen LAT, To BL, Nguyen HK, Nguyen MH. "Detecting phishing web sites: A heuristic URL-based approach". 2013 International Conference on Advanced Technologies for Communications (ATC 2013), Ho Chi Minh City, Vietnam, 16-18 October 2013.
  • Mohammad RM, Thabtah F, McCluskey L. “An assessment of features related to phishing websites using an automated technique”. 2012 International Conference for Internet Technology and Secured Transactions, London, UK, 10-12 December 2012.
  • Phishtank. "Phishtank Archive". https://www.phishtank.com/phish_archive.php (01.05.2016).
  • Mohammad RM, Thabtah F, McCluskey L. “Predicting phishing websites based on self-structuring neural network”. Neural Computing & Applications, 25(2), 443-458, 2014.
  • Mohammad RM, Thabtah F, McCluskey L. “Intelligent rule-based phishing websites classification”. IET Information Security, 8(3), 153-160, 2014.
  • Selvan K, Vanitha M. “A Machine Learning Approach for Detection of Phished Websites Using Neural Networks”. International Journal of Recent Technology and Engineering (IJRTE), 4(6), 19-23, 2016.
  • Singh P, Jain N, Maini A. "Investigating the effect of feature selection and dimensionality reduction on phishing website classification problem". 1st International Conference on Next Generation Computing Technologies (NGCT 2015), Dehradun, India, 4-5 September 2015.
  • The University of Waikato. "Weka Machine Learning Algorithms Collection Tool". http://www.cs.waikato.ac.nz/ml/weka/ (01.05.2016).
  • Quinlan JR. C4.5: Programs for Machine Learning. 1st ed. Massachusetts, USA, Morgan Kaufmann Publishers Inc., 1993.
  • Quinlan JR. “Induction of decision trees”. Machine Learning, 1(1), 81-106, 1986.
  • Cendrowska J. “PRISM: An algorithm for inducing modular rules”. International Journal of Man-Machine Studies, 27(4), 349-370, 1987.
  • Cohen WW. "Fast effective rule induction". Twelfth International Conference on Machine Learning (ML95), California, USA, 9-12 July 1995.
  • John GH, Langley P. "Estimating continuous distributions in Bayesian classifiers". Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, 18-20 August 1995.
  • Dimitoglou G, Adams JA, Jim CM. “Comparison of the C4.5 and a Naïve Bayes classifier for the prediction of lung cancer survivability”. Journal of Computing, 4(8), 1-9, 2012.
  • Huang J, Lu J, Ling CX. "Comparing naive Bayes, decision trees, and SVM with AUC and accuracy". Third IEEE International Conference on Data Mining (ICDM), Melbourne, USA, 22 November 2003.
  • Aha DW, Kibler D, Albert MK. “Instance-based learning algorithms”. Machine Learning, 6(1), 37-66, 1991.
  • Tan S. “An effective refinement strategy for KNN text classifier”. Expert Systems with Applications, 30(2), 290-298, 2006.
  • Breiman L. “Random forests”. Machine Learning, 45(1), 5-32, 2001.
  • Davis J, Goadrich M. "The relationship between Precision-Recall and ROC curves". 23rd International Conference on Machine Learning, Pennsylvania, USA, 25-29 June 2006.
  • Fawcett T. “An introduction to ROC analysis”. Pattern Recognition Letters, 27(8), 861-874, 2006.
  • Ferri C, Hernández-Orallo J, Modroiu R. “An experimental comparison of performance measures for classification”. Pattern Recognition Letters, 30(1), 27-38, 2009.
  • Powers DM. “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation”. Journal of Machine Learning Technologies, 2(1), 37-63, 2011.
There are 29 citations in total.

Details

Primary Language Turkish
Subjects Engineering
Journal Section Research Article
Authors

Muhammed Ali Koşan 0000-0002-1422-6006

Oktay Yıldız 0000-0001-9155-7426

Hacer Karacan 0000-0001-6788-008X

Publication Date April 30, 2018
Published in Issue Year 2018 Volume: 24 Issue: 2

Cite

APA Koşan, M. A., Yıldız, O., & Karacan, H. (2018). Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24(2), 276-282.
AMA Koşan MA, Yıldız O, Karacan H. Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. April 2018;24(2):276-282.
Chicago Koşan, Muhammed Ali, Oktay Yıldız, and Hacer Karacan. “Kimlik Avı Web Sitelerinin Tespitinde Makine öğrenmesi algoritmalarının karşılaştırmalı Analizi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24, no. 2 (April 2018): 276-82.
EndNote Koşan MA, Yıldız O, Karacan H (April 1, 2018) Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24 2 276–282.
IEEE M. A. Koşan, O. Yıldız, and H. Karacan, “Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 2, pp. 276–282, 2018.
ISNAD Koşan, Muhammed Ali et al. “Kimlik Avı Web Sitelerinin Tespitinde Makine öğrenmesi algoritmalarının karşılaştırmalı Analizi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24/2 (April 2018), 276-282.
JAMA Koşan MA, Yıldız O, Karacan H. Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24:276–282.
MLA Koşan, Muhammed Ali et al. “Kimlik Avı Web Sitelerinin Tespitinde Makine öğrenmesi algoritmalarının karşılaştırmalı Analizi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 2, 2018, pp. 276-82.
Vancouver Koşan MA, Yıldız O, Karacan H. Kimlik avı web sitelerinin tespitinde makine öğrenmesi algoritmalarının karşılaştırmalı analizi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24(2):276-82.





Creative Commons License
This journal is licensed under a Creative Commons 4.0 International License.