Research Article

Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset

Year 2024, Volume: 8 Issue: 2, 213 - 221, 31.12.2024
https://doi.org/10.26650/acin.1521835

Abstract

Phishing attacks remain a major challenge in today’s digital world, and detection techniques must keep pace with constantly changing tactics. In this paper, we propose a method for identifying phishing attempts using the extensive PhiUSIIL dataset, which comprises 134,850 legitimate URLs and 100,945 phishing URLs and provides a robust foundation for analysis. We applied the t-SNE technique for feature extraction, condensing the original 51 features into only 2 while preserving high detection accuracy. We evaluated several machine learning algorithms on both the full and the reduced datasets: Logistic Regression, Naive Bayes, k-Nearest Neighbors (kNN), Decision Trees, and Random Forest. The Decision Tree algorithm performed best on the original dataset, achieving 99.7% accuracy, while kNN performed remarkably well on the feature-extracted data, achieving 99.2% accuracy. Logistic Regression and Random Forest also improved significantly on the feature-extracted dataset. The proposed method offers substantial benefits in computational efficiency: the feature-extracted dataset requires less processing power and is therefore well suited to systems with limited resources. These findings pave the way for more powerful and flexible phishing detection systems that can identify and neutralize emerging threats in real-time scenarios.
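The pipeline summarized in the abstract (reduce 51 features to 2 with t-SNE, then classify with kNN) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic 51-feature table from `make_classification` in place of the PhiUSIIL data, and the function name `tsne_knn_accuracy` and all parameter values (perplexity, `n_neighbors`, split ratio) are hypothetical choices. Because scikit-learn's `TSNE` offers no `transform` for unseen rows, the sketch embeds all samples first and splits afterwards; whether the paper follows the same order is not stated in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def tsne_knn_accuracy(n_samples=600, n_features=51, seed=0):
    """Embed a synthetic 51-feature table into 2-D with t-SNE, then score kNN."""
    # Synthetic stand-in for the URL feature table (NOT the PhiUSIIL data).
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=10,
        n_redundant=5,
        class_sep=2.0,
        random_state=seed,
    )
    # Put all features on a comparable scale before computing distances.
    X = StandardScaler().fit_transform(X)

    # t-SNE cannot project unseen rows, so every sample is embedded at once.
    embedding = TSNE(
        n_components=2, perplexity=30, init="pca", random_state=seed
    ).fit_transform(X)

    # Hold out a stratified test split of the 2-D points (assumed 75/25).
    X_train, X_test, y_train, y_test = train_test_split(
        embedding, y, test_size=0.25, stratify=y, random_state=seed
    )
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    return embedding.shape, knn.score(X_test, y_test)


if __name__ == "__main__":
    shape, accuracy = tsne_knn_accuracy()
    print("embedded shape:", shape)
    print("kNN accuracy on 2-D features:", round(accuracy, 3))
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported in the paper. Embedding before splitting means the test rows influence the 2-D layout, so a deployed system would need a way to place new URLs into an existing embedding.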

References

  • Aburrous, M., Hossain, M. A., Dahal, K., & Thabtah, F. (2010). Intelligent phishing detection system for e-banking using fuzzy data mining. Expert Systems with Applications, 37(12), 7913-7921. doi:10.1016/J.ESWA.2010.04.044
  • Adebowale, M. A., Lwin, K. T., & Hossain, M. A. (2019). Deep learning with convolutional neural network and long short-term memory for phishing detection. 2019 13th International Conference on Software, Knowledge, Information Management and Applications, SKIMA 2019. doi:10.1109/SKIMA47702.2019.8982427
  • Alam, M. N., Sarma, D., Lima, F. F., Saha, I., Ulfath, R. E., & Hossain, S. (2020). Phishing Attacks Detection using Machine Learning Approach. 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), 1173-1179. doi:10.1109/ICSSIT48917.2020.9214225
  • Alhudhaif, A., Almaslukh, B., Aseeri, A. O., Guler, O., & Polat, K. (2023). A novel nonlinear automated multi-class skin lesion detection system using soft-attention based convolutional neural networks. Chaos, Solitons & Fractals, 170, 113409. doi:10.1016/J.CHAOS.2023.113409
  • Alsaç, A., Yenisey, M. M., Ganiz, M., Dagtekin, M., & Ulusinan, T. (2023). The Efficiency of Regularization Method on Model Success in Issue Type Prediction Problem. Acta Infologica, 7(2), 360-383. doi:10.26650/ACIN.1394019
  • Atawneh, S., & Aljehani, H. (2023). Phishing Email Detection Model Using Deep Learning. Electronics 2023, Vol. 12, Page 4261, 12(20), 4261. doi:10.3390/ELECTRONICS12204261
  • Bergholz, A., De Beer, J., Glahn, S., Moens, M. F., Paaß, G., & Strobel, S. (2010). New filtering approaches for phishing email. Journal of Computer Security, 18(1), 7-35. doi:10.3233/JCS-2010-0371
  • Bibal, A., Delchevalerie, V., & Frénay, B. (2023). DT-SNE: t-SNE discrete visualizations as decision tree structures. Neurocomputing, 529, 101-112. doi:10.1016/J.NEUCOM.2023.01.073
  • Bibi, H., Shah, S. R., Baig, M. M., Sharif, M. I., Mehmood, M., Akhtar, Z., & Siddique, K. (2024). Phishing Website Detection Using Improved Multilayered Convolutional Neural Networks. Journal of Computer Science, 20(9), 1069-1079. doi:10.3844/JCSSP.2024.1069.1079
  • Buyrukoğlu, S., & Savaş, S. (2023). Stacked-Based Ensemble Machine Learning Model for Positioning Footballer. Arabian Journal for Science and Engineering, 48(2), 1371-1383. doi:10.1007/s13369-022-06857-8
  • Divakaran, D. M., & Oest, A. (2022). Phishing Detection Leveraging Machine Learning and Deep Learning: A Review. IEEE Security and Privacy, 20(5), 86-95. doi:10.1109/MSEC.2022.3175225
  • Doğruel, M., & Soner Kara, S. (2023). Determining the Happiness Class of Countries with Tree-Based Algorithms in Machine Learning. Acta Infologica, 7(2), 0-0. doi:10.26650/ACIN.1251650
  • Efeoğlu, E. (2022). Kablosuz Sinyal Gücünü Kullanarak İç Mekan Kullanıcı Lokalizasyonu için Karar Ağacı Algoritmalarının Karşılaştırılması. Acta Infologica, 0(0), 0-0. doi:10.26650/ACIN.1076352
  • Etem, T., & Teke, M. (2024). Enhanced deep learning based decision support system for kidney tumour detection. BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 4(2), 100174. doi:10.1016/J.TBENCH.2024.100174
  • Garera, S., Provos, N., Chew, M., & Rubin, A. D. (2007). A framework for detection and measurement of phishing attacks. WORM’07 - Proceedings of the 2007 ACM Workshop on Recurring Malcode, 1-8. doi:10.1145/1314389.1314391
  • GitHub - judger90/phishing_detection_tsne. (n.d.). Retrieved 19 September 2024, from https://github.com/judger90/phishing_detection_tsne
  • Gopali, S., Namin, A. S., Abri, F., & Jones, K. S. (2024). The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs. In SAC ’24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1064-1066). Association for Computing Machinery (ACM). doi:10.1145/3605098.3636164
  • Güler, O., & Yücedağ, İ. (2022). Hand Gesture Recognition from 2D Images by Using Convolutional Capsule Neural Networks. Arabian Journal for Science and Engineering, 47(2), 1211-1225. doi:10.1007/S13369-021-05867-2/TABLES/8
  • Jain, A. K., & Gupta, B. B. (2022). A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterprise Information Systems, 16(4), 527-565. doi:10.1080/17517575.2021.1896786
  • Jiang, D., Shi, X., Liang, Y., & Liu, H. (2024). Feature extraction technique based on Shapley value method and improved mRMR algorithm. Measurement, 237, 115190. doi:10.1016/J.MEASUREMENT.2024.115190
  • Jishnu, K. S., & Arthi, B. (2023). Review of the effectiveness of machine learning based phishing prevention systems. AIP Conference Proceedings, 2917(1). doi:10.1063/5.0175593/2919402
  • Prasad, A., & Chandra, S. (2024). PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning. Computers & Security, 136, 103545. doi:10.1016/J.COSE.2023.103545
  • Thakur, K., Ali, M. L., Obaidat, M. A., & Kamruzzaman, A. (2023). A Systematic Review on Deep-Learning-Based Phishing Email Detection. Electronics 2023, Vol. 12, Page 4545, 12(21), 4545. doi:10.3390/ELECTRONICS12214545
  • Tülay, E. (2023). Detection of Orienting Response to Novel Sounds in Healthy Elderly Subjects: A Machine Learning Approach Using EEG Features. Acta Infologica, 0(0), 0-0. doi:10.26650/ACIN.1234106
  • Türk, F., Lüy, M., & Barışçı, N. (2020). Kidney and Renal Tumor Segmentation Using a Hybrid V-Net-Based Model. Mathematics 2020, Vol. 8, Page 1772, 8(10), 1772. doi:10.3390/MATH8101772
  • Yaman, O., & Tuncer, T. (2023). Plant Classification Method Using Histogram and Machine Learning for Smart Agriculture Applications. Acta Infologica, 0(0), 0-0. doi:10.26650/ACIN.1070261
  • Yang, L., Zhang, J., Wang, X., Li, Z., Li, Z., & He, Y. (2021). An improved ELM-based and data preprocessing integrated approach for phishing detection considering comprehensive features. Expert Systems with Applications, 165, 113863. doi:10.1016/J.ESWA.2020.113863
There are 27 citations in total.

Details

Primary Language English
Subjects Machine Learning (Other)
Journal Section Research Article
Authors

Taha Etem 0000-0003-1419-5008

Mustafa Teke 0000-0002-7262-4918

Publication Date December 31, 2024
Submission Date July 24, 2024
Acceptance Date December 11, 2024
Published in Issue Year 2024 Volume: 8 Issue: 2

Cite

APA Etem, T., & Teke, M. (2024). Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica, 8(2), 213-221. https://doi.org/10.26650/acin.1521835
AMA Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. December 2024;8(2):213-221. doi:10.26650/acin.1521835
Chicago Etem, Taha, and Mustafa Teke. “Advanced Phishing Detection: Leveraging T-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica 8, no. 2 (December 2024): 213-21. https://doi.org/10.26650/acin.1521835.
EndNote Etem T, Teke M (December 1, 2024) Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica 8 2 213–221.
IEEE T. Etem and M. Teke, “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”, ACIN, vol. 8, no. 2, pp. 213–221, 2024, doi: 10.26650/acin.1521835.
ISNAD Etem, Taha - Teke, Mustafa. “Advanced Phishing Detection: Leveraging T-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica 8/2 (December 2024), 213-221. https://doi.org/10.26650/acin.1521835.
JAMA Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. 2024;8:213–221.
MLA Etem, Taha and Mustafa Teke. “Advanced Phishing Detection: Leveraging T-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica, vol. 8, no. 2, 2024, pp. 213-21, doi:10.26650/acin.1521835.
Vancouver Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. 2024;8(2):213-21.