Research Article

Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset

Volume: 8, Issue: 2, 31 December 2024

Abstract

Phishing attacks remain a major challenge in today’s digital world, and detection techniques must keep pace with constantly changing tactics. In this paper, we propose a method for identifying phishing attempts using the extensive PhiUSIIL dataset. The dataset comprises 134,850 legitimate URLs and 100,945 phishing URLs, providing a robust foundation for analysis. We applied the t-SNE technique for feature extraction, condensing the original 51 features into only 2 while preserving high detection accuracy. We evaluated several machine learning algorithms on both the full and the reduced datasets: Logistic Regression, Naive Bayes, k-Nearest Neighbors (kNN), Decision Trees, and Random Forest. The Decision Tree algorithm performed best on the original dataset, achieving 99.7% accuracy. On the feature-extracted data, kNN achieved 99.2% accuracy. We also observed significant improvements in Logistic Regression and Random Forest performance when using the feature-extracted dataset. The proposed method offers substantial benefits in computational efficiency: the feature-extracted dataset requires less processing power and is therefore well suited to systems with limited resources. These findings pave the way for more powerful and flexible phishing detection systems that can identify and neutralize emerging threats in real-time scenarios.
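
The kNN classification step on two-dimensional, feature-extracted data can be illustrated with a minimal sketch. It is not the authors' implementation: the points are synthetic stand-ins for a 2-D embedding (the t-SNE step itself is omitted), and the names `knn_predict`, `make_toy_embedding`, and `holdout_accuracy`, the cluster centers, and the choice `k=5` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def make_toy_embedding(n_per_class=200, seed=0):
    """Synthetic 2-D points standing in for an embedding (assumption, not real data)."""
    rng = random.Random(seed)
    points, labels = [], []
    # Hypothetical cluster centers, chosen only so the two classes are separable.
    for label, (cx, cy) in (("legitimate", (-5.0, 0.0)), ("phishing", (5.0, 0.0))):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels


def holdout_accuracy(points, labels, k=5, test_fraction=0.25, seed=1):
    """Shuffle, hold out a fraction of the points, and score kNN on the held-out part."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_points = [points[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]
    correct = sum(
        knn_predict(train_points, train_labels, points[i], k) == labels[i]
        for i in test_idx
    )
    return correct / n_test


points, labels = make_toy_embedding()
accuracy = holdout_accuracy(points, labels)
print(f"hold-out accuracy on toy data: {accuracy:.3f}")
```

The toy clusters are deliberately well separated, so the printed accuracy only shows that the sketch runs; it says nothing about the figures reported in the abstract.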

Details

Primary Language

English

Subjects

Machine Learning (Other)

Section

Research Article

Publication Date

31 December 2024

Submission Date

24 July 2024

Acceptance Date

11 December 2024

Published in Issue

Year 2024, Volume: 8, Issue: 2

Cite

APA
Etem, T., & Teke, M. (2024). Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica, 8(2), 213-221. https://doi.org/10.26650/acin.1521835
AMA
1. Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. 2024;8(2):213-221. doi:10.26650/acin.1521835
Chicago
Etem, Taha, and Mustafa Teke. 2024. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica 8 (2): 213-21. https://doi.org/10.26650/acin.1521835.
EndNote
Etem T, Teke M (01 December 2024) Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica 8 2 213–221.
IEEE
[1] T. Etem and M. Teke, “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”, ACIN, vol. 8, no. 2, pp. 213–221, Dec. 2024, doi: 10.26650/acin.1521835.
ISNAD
Etem, Taha - Teke, Mustafa. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica 8/2 (01 December 2024): 213-221. https://doi.org/10.26650/acin.1521835.
JAMA
1. Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. 2024;8:213–221.
MLA
Etem, Taha, and Mustafa Teke. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica, vol. 8, no. 2, Dec. 2024, pp. 213-21, doi:10.26650/acin.1521835.
Vancouver
1. Taha Etem, Mustafa Teke. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. ACIN. 01 December 2024;8(2):213-21. doi:10.26650/acin.1521835