This study evaluates machine learning algorithms for credit card fraud detection and compares them across several performance metrics. Seven supervised classification algorithms were used: Logistic Regression, Decision Trees, Random Forest, XGBoost, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine. Their performance was measured with Accuracy, Precision, Recall, F-Score, AUC, and AUPRC, supplemented by ROC curves and confusion matrices. Data preparation is critical in this study because the data are imbalanced: fraudulent and non-fraudulent transactions are unequally distributed. Addressing this imbalance is necessary for successful model training and reliable results. Several techniques, namely Scaling and Distribution, Random Under-Sampling, Dimensionality Reduction, and Clustering, are employed so that model performance and the ability to generalize can be evaluated accurately. The Random Forest and K-Nearest Neighbors algorithms achieved the highest performance in this research, each with an accuracy of 97%. The study contributes to the ongoing fight against financial fraud and offers guidance for future research.
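As a rough illustration of two ideas named in the abstract, the sketch below shows Random Under-Sampling and the Precision, Recall, and F-Score calculations using only the Python standard library. It is not the study's code: the function names `random_undersample` and `precision_recall_f1`, the toy dataset of 1,000 transactions with 10 frauds, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        # Sample without replacement so no row is duplicated.
        sampled.extend((row, label) for row in rng.sample(group, n_min))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1,000 transactions, the first 10 labelled as fraud (1).
rows = list(range(1000))
labels = [1 if i < 10 else 0 for i in rows]

balanced_rows, balanced_labels = random_undersample(rows, labels)
class_counts = Counter(balanced_labels)

# A classifier that always predicts "not fraud" looks accurate but finds no fraud.
all_legit = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, all_legit)) / len(labels)
naive_scores = precision_recall_f1(labels, all_legit)
```

On this toy data the always-negative predictor reaches 99% accuracy with zero recall, which is one reason accuracy is usually read alongside Precision, Recall, and F-Score on imbalanced data.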
Primary Language | English |
---|---|
Subjects | Communications Engineering (Other) |
Journal Section | Articles |
Authors | |
Early Pub Date | April 7, 2024 |
Publication Date | April 30, 2024 |
Submission Date | November 4, 2023 |
Acceptance Date | December 3, 2023 |
Published in Issue | Year 2024 |