Research Article
Year 2021, Volume: 16 Issue: 2, 231 - 243, 15.09.2021

Detection of Phishing Attacks on Websites with Lasso Regression, Minimum Redundancy Maximum Relevance Method, Machine Learning Methods, and Deep Learning Model

Abstract

Phishing attacks are malicious software designed to steal personal or public information. Such attacks generally arrive through e-mail or impersonate web pages to trap users, using attractive textual or visual content to lure users into their network. The internet is a large network platform with billions of users, who must be able to conduct their transactions on it safely. To secure web pages at this scale, artificial intelligence-based software has been developed in recent years, and this development continues. In this study, analyses were performed on two datasets comprising 12,454 websites in total: the first contains 11,054 websites and the second contains 1,400. Each dataset is divided into two classes, "phishing" and "legitimate". The contributions of machine learning methods, deep learning models, and feature selection methods to detecting phishing attacks were analyzed. The best accuracy was 97.26% on the first dataset and 94.76% on the second. Overall, feature selection methods were observed to contribute positively to the experimental results.
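
Lasso-based feature selection, one of the methods named in the title, can be illustrated with a minimal sketch. This is not the study's implementation. It assumes a toy dataset of five hypothetical indicators (each -1 or 1), labels encoded as 1 for "phishing" and -1 for "legitimate", a hand-written coordinate-descent solver, and an arbitrarily chosen penalty `alpha = 0.1`. Only the first three indicators influence the label, so the L1 penalty is expected to drive the other two coefficients to exactly zero.

```python
from itertools import product


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=50):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = [float(v) for v in y]  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data (an assumption): every combination of five -1/1 indicators.
X = [list(row) for row in product((-1, 1), repeat=5)]
# Hypothetical rule: a site is "phishing" (1) when most of the first three
# indicators are 1; indicators 3 and 4 are irrelevant by construction.
y = [1.0 if row[0] + row[1] + row[2] > 0 else -1.0 for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w_j in enumerate(weights) if w_j != 0.0]
print(selected)  # [0, 1, 2]
```

Features with a coefficient of exactly zero are discarded, and the remaining columns would then be passed to a classifier. In practice a library solver would normally replace the hand-written loop.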

There are 33 citations in total.

Details

Primary Language: English
Subjects: Engineering
Journal Section: TJST
Authors: Mesut Toğaçar (ORCID 0000-0002-8264-3899)
Publication Date: September 15, 2021
Submission Date: June 29, 2021
Published in Issue: Year 2021, Volume: 16 Issue: 2

Cite

APA Toğaçar, M. (2021). Detection of Phishing Attacks on Websites with Lasso Regression, Minimum Redundancy Maximum Relevance Method, Machine Learning Methods, and Deep Learning Model. Turkish Journal of Science and Technology, 16(2), 231-243.