As internet use grows, anyone with an email account has become a potential target for fraudsters. Attackers send fake or misleading emails to steal sensitive information such as identity details and bank or social media credentials, a tactic known as phishing. This study proposes a machine learning-based system for detecting phishing attacks using the SeFACED dataset, which was adapted for binary classification and contains 12,498 normal and 5,142 fraudulent emails. The system was implemented in Python, with Google Colab and Jupyter Notebook as development platforms. The email data went through collection, cleaning, and stemming steps. Three feature extraction techniques were used: Bag of Words, TF-IDF, and Word2Vec. Six algorithms were employed for classification: Logistic Regression, Random Forest, Support Vector Machines, Naive Bayes, Convolutional Neural Network, and Long Short-Term Memory. Performance was evaluated using accuracy, precision, recall, and F1-score. To enhance detection, new attributes were proposed: CSS tags, HTML tags, blacklist words, link errors, and grammar and spelling errors. Adding these features generally improved the classification results.
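The proposed attributes can be illustrated with a minimal sketch using only the Python standard library. It is not the study's implementation: the `BLACKLIST` words, the function name `extract_features`, and the definition of a "link error" (an `href` that is not a well-formed http or https URL) are assumptions made for illustration. Grammar and spelling error counts are omitted because they require an external checker.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical blacklist; the word list used in the study is not given here.
BLACKLIST = {"verify", "urgent", "password", "suspended", "winner"}


class _EmailParser(HTMLParser):
    """Collects tag counts, link targets, and visible text from an email body."""

    def __init__(self):
        super().__init__()
        self.html_tags = 0
        self.css_tags = 0
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        self.html_tags += 1
        attr_map = dict(attrs)
        # Count <style> elements and inline style attributes as CSS usage.
        if tag == "style" or "style" in attr_map:
            self.css_tags += 1
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])

    def handle_data(self, data):
        self.text_parts.append(data)


def _is_broken(href):
    # Assumed definition of a "link error": not a well-formed http(s) URL.
    parts = urlparse(href)
    return parts.scheme not in ("http", "https") or not parts.netloc


def extract_features(email_body):
    """Return the handcrafted counts for one email body as a dict."""
    parser = _EmailParser()
    parser.feed(email_body)
    parser.close()
    words = re.findall(r"[a-z']+", " ".join(parser.text_parts).lower())
    return {
        "html_tags": parser.html_tags,
        "css_tags": parser.css_tags,
        "blacklist_words": sum(word in BLACKLIST for word in words),
        "broken_links": sum(_is_broken(href) for href in parser.links),
    }


sample = (
    '<p style="color:red">URGENT: verify your password</p>'
    '<a href="htp:/bad">click</a><a href="https://example.com">ok</a>'
)
features = extract_features(sample)
print(features)
```

Counts like these could be appended as extra columns to the Bag of Words, TF-IDF, or Word2Vec vectors before training a classifier.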
Phishing, Phishing e-mail, Phishing attacks, Machine learning, Deep learning, Classification, Phishing e-mail classification
Primary Language | English |
---|---|
Subjects | Computer Software |
Journal Section | Research Article |
Authors | |
Early Pub Date | July 11, 2025 |
Publication Date | |
Submission Date | May 27, 2024 |
Acceptance Date | January 10, 2025 |
Published in Issue | Year 2025 Volume: 13 Issue: 2 |
All articles published by BAJECE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.