Research Article

Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets

Year 2024, Volume 8, Issue 2, 196-208, 30.04.2024
https://doi.org/10.31127/tuje.1386127

Abstract

The main objective of this research is to evaluate the performance of machine learning algorithms for credit card fraud detection and to compare them across several performance metrics. Seven supervised classification algorithms were used: Logistic Regression, Decision Trees, Random Forest, XGBoost, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine. Their performance was measured using Accuracy, Precision, Recall, F-Score, AUC, and AUPRC values, supplemented by ROC curves and confusion matrices. The data preparation phase is critical in this study. The data imbalance problem arises from the unequal distribution of fraudulent and non-fraudulent transactions, and addressing it is imperative for successful model training and reliable results. Several techniques, namely Scaling and Distribution, Random Under-Sampling, Dimensionality Reduction, and Clustering, are employed to ensure an accurate evaluation of model performance and of each model's ability to generalize. The Random Forest and K-Nearest Neighbors algorithms achieve the highest performance in this study, both reaching 97% accuracy. This study contributes to the ongoing fight against financial fraud and provides guidance for future research.
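The two ideas the abstract leans on most, random under-sampling and evaluation with metrics beyond accuracy, can be sketched in a few lines. The following is a minimal, standard-library-only illustration and not the article's implementation: the function names `random_under_sample` and `binary_metrics` are hypothetical, and the toy data (10 fraudulent rows among 1000) is invented purely for demonstration.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has the
    same number of rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [row for row, _ in sampled], [label for _, label in sampled]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented toy data: 10 fraudulent (1) and 990 legitimate (0) transactions.
labels = [1] * 10 + [0] * 990
rows = [[float(i)] for i in range(len(labels))]

# Under-sampling keeps all minority rows and an equal-sized random
# subset of the majority rows.
balanced_rows, balanced_labels = random_under_sample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)

# A classifier that never flags fraud looks accurate on the imbalanced
# data but detects no fraud at all.
always_legit = [0] * len(labels)
naive = binary_metrics(labels, always_legit)
```

On this toy data the never-flag-fraud baseline scores high accuracy while its recall is zero, which is why the abstract reports Precision, Recall, F-Score, and AUPRC alongside Accuracy. Under-sampling discards majority-class rows, so it trades training data for class balance.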

References

  • Akers, D., Golter, J., Lamm, B., & Solt, M. (2005). Overview of recent developments in the credit card industry. FDIC Banking Review, 17, 23-35.
  • Heggestuen, J. (2020). Credit-card fraud surges 35% as coronavirus freezes the economy and wipes out jobs. Business Insider. https://markets.businessinsider.com/news/stocks/credit-card-account-fraud-skyrockets-coronavirus-pandemic-recession-economy-layoffs-2020-5-1029246107
  • Çalışkan, M. A. (2021). Credit card fraud in Turkey increased by 25% in 2020. Hürriyet. https://www.hurriyet.com.tr/haberleri/kredi-karti-dolandiriciligi
  • Bhatla, T. P., Prabhu, V., & Dua, A. (2003). Understanding credit card frauds. Cards Business Review, 1(6), 1-15.
  • Şenel, S. A., & Arslan, Ö. (2019). The role of forensic accounting profession in preventing the accounting scandals. Cumhuriyet University Journal of Economics and Administrative Sciences, 20(1), 293-308.
  • Tripathi, K. K., & Pavaskar, M. A. (2012). Survey on credit card fraud detection methods. International Journal of Emerging Technology and Advanced Engineering, 2(11), 721-726.
  • Sevli, O. (2022). Kredi kartı dolandırıcılığının yapay sinir ağları kullanılarak tespiti. 11th International Conference on Applied Sciences, 233-240. Academy Global Publishing House.
  • Joo, S. H., Grable, J. E., & Bagwell, D. C. (2003). Credit card attitudes and behaviors of college students. College Student Journal, 37(3), 405-420.
  • Fogarty, T. C., Ireson, N. S., & Battle, S. A. (1992). Developing rule-based systems for credit-card applications from data with the genetic algorithm. IMA Journal of Management Mathematics, 4(1), 53-59. https://doi.org/10.1093/imaman/4.1.53
  • Raj, S. B. E., & Portia, A. A. (2011). Analysis on credit card fraud detection methods. In 2011 International Conference on Computer, Communication and Electrical Technology (ICCCET), 152-156. https://doi.org/10.1109/ICCCET.2011.5762457
  • Dornadula, V. N., & Geetha, S. (2019). Credit card fraud detection using machine learning algorithms. Procedia Computer Science, 165, 631-641. https://doi.org/10.1016/j.procs.2020.01.057
  • Yee, O. S., Sagadevan, S., & Malim, N. H. A. H. (2018). Credit card fraud detection using machine learning as data mining technique. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 10(1-4), 23-27.
  • Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953
  • Jha, S., Guillen, M., & Westland, J. C. (2012). Employing transaction aggregation strategy to detect credit card fraud. Expert Systems with Applications, 39(16), 12650-12657. https://doi.org/10.1016/j.eswa.2012.05.018
  • Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., ... & He, Q. (2020). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43-76. https://doi.org/10.1109/JPROC.2020.3004555
  • Dal Pozzolo, A., Caelen, O., Le Borgne, Y. A., Waterschoot, S., & Bontempi, G. (2014). Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928. https://doi.org/10.1016/j.eswa.2014.02.026
  • Bhattacharyya, S., Jha, S., Tharakunnel, K., & Westland, J. C. (2011). Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3), 602-613. https://doi.org/10.1016/j.dss.2010.08.008
  • Pulat, M., & Deveci, I. (2021). Bibliometric Analysis of Theses Published on Machine Learning and Decision Trees in Turkey. Journal of Management and Economics, 28(2), 287-308.
  • Albayrak, A. S., & Yilmaz, S. K. (2009). Veri Madenciliği: Karar ağacı algoritmaları ve İMKB verileri üzerine bir uygulama. Suleyman Demirel University Journal of Faculty of Economics & Administrative Sciences, 14(1), 31-52.
  • Akça, M. F., & Sevli, O. (2022). Predicting acceptance of the bank loan offers by using support vector machines. International Advanced Researches and Engineering Journal, 6(2), 142-147. https://doi.org/10.35860/iarej.1058724
  • Bircan, H. (2004). Logistic regression analysis: An application on medical data. Kocaeli University Journal of Social Sciences, 8, 185-208.
  • Yavuz, A., & Çilengiroğlu, Ö. V. (2020). Lojistik regresyon ve CART yöntemlerinin tahmin edici performanslarının yaşam memnuniyeti verileri için karşılaştırılması. Avrupa Bilim ve Teknoloji Dergisi, (18), 719-727. https://doi.org/10.31590/ejosat.691215
  • Çalış, A., Kayapınar, S., & Çetinyokuş, T. (2014). An application on computer and internet security with decision tree algorithms in data mining. Journal of Industrial Engineering, 25(3), 2-19.
  • Türk, S. T., & Balçık, F. (2023). Rastgele orman algoritması ve Sentinel-2 MSI ile fındık ekili alanların belirlenmesi: Piraziz Örneği. Geomatik, 8(2), 91-98. https://doi.org/10.29128/geomatik.1127925
  • Akar, Ö., & Güngör, O. (2012). Rastgele orman algoritması kullanılarak çok bantlı görüntülerin sınıflandırılması. Jeodezi ve Jeoinformasyon Dergisi, 1(2), 139-146. https://doi.org/10.9733/jgg.241212.1t
  • Alshari, H., Saleh, A. Y., & Odabaş, A. (2021). Comparison of gradient boosting decision tree algorithms for CPU performance. Journal of Institue of Science and Technology, 37(1), 157-168.
  • Şahin, E. M., Sahin, S., & Tanağardıgil, İ. (2021). Battery State of Health and Charge Estimation Using Machine Learning Methods. Avrupa Bilim ve Teknoloji Dergisi, (26), 389-394. https://doi.org/10.31590/ejosat.959630
  • Zhang, H., & Li, D. (2007). Naïve Bayes text classifier. In 2007 IEEE international conference on granular computing (GRC 2007), 708-711. https://doi.org/10.1109/GrC.2007.40
  • Yong, Z., Youwen, L., & Shixiong, X. (2009). An improved KNN text classification algorithm based on clustering. Journal of Computers, 4(3), 230-237.
  • Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their applications, 13(4), 18-28. https://doi.org/10.1109/5254.708428
  • Polyzotis, N., Zinkevich, M., Roy, S., Breck, E., & Whang, S. (2019). Data validation for machine learning. Proceedings of Machine Learning and Systems, 1, 334-347.
  • Boyd, K., Eng, K. H., & Page, C. D. (2013). Area under the precision-recall curve: point estimates and confidence intervals. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part III 13, 451-466. https://doi.org/10.1007/978-3-642-40994-3_29
  • Zhang, Z. (2016). Introduction to machine learning: k-nearest neighbors. Annals of Translational Medicine, 4(11), 218-225. https://doi.org/10.21037/atm.2016.03.37
  • MLG-ULB. (2017). Credit Card Fraud Detection. Kaggle. https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
  • Mishra, A., & Ghorpade, C. (2018). Credit card fraud detection on the skewed data using various classification and ensemble techniques. In 2018 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), 1-5. https://doi.org/10.1109/SCEECS.2018.8546939
  • Navamani, C., & Krishnan, S. (2018). Credit card nearest neighbor based outlier detection techniques. International Journal of Computer Techniques, 5(2), 56-60.
  • Kazemi, Z., & Zarrabi, H. (2017). Using deep networks for fraud detection in the credit card transactions. In 2017 IEEE 4th International conference on knowledge-based engineering and innovation (KBEI), 630-633. https://doi.org/10.1109/KBEI.2017.8324876
  • Dhankhad, S., Mohammed, E., & Far, B. (2018). Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study. In 2018 IEEE international conference on information reuse and integration (IRI), 122-125. https://doi.org/10.1109/IRI.2018.00025
  • Wang, C., Wang, Y., Ye, Z., Yan, L., Cai, W., & Pan, S. (2018). Credit card fraud detection based on whale algorithm optimized BP neural network. In 2018 13th international Conference on Computer Science & Education (ICCSE), 1-4. https://doi.org/10.1109/ICCSE.2018.8468855
  • Pumsirirat, A., & Liu, Y. (2018). Credit card fraud detection using deep learning based on auto-encoder and restricted boltzmann machine. International Journal of Advanced Computer Science and Applications, 9(1), 18-25.
  • Sarızeybek, A. T., & Sevli, O. (2022). Makine Öğrenmesi Yöntemleri ile Banka Müşterilerinin Kredi Alma Eğiliminin Karşılaştırmalı Analizi. Journal of Intelligent Systems: Theory and Applications, 5(2), 137-144. https://doi.org/10.38016/jista.1036047
There are 41 citations in total.

Details

Primary Language: English
Subjects: Communications Engineering (Other)
Journal Section: Articles
Authors: Vahid Sinap (ORCID 0000-0002-8734-9509)
Early Pub Date: April 7, 2024
Publication Date: April 30, 2024
Submission Date: November 4, 2023
Acceptance Date: December 3, 2023
Published in Issue: Year 2024

Cite

APA Sinap, V. (2024). Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. Turkish Journal of Engineering, 8(2), 196-208. https://doi.org/10.31127/tuje.1386127
AMA Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. April 2024;8(2):196-208. doi:10.31127/tuje.1386127
Chicago Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering 8, no. 2 (April 2024): 196-208. https://doi.org/10.31127/tuje.1386127.
EndNote Sinap V (April 1, 2024) Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. Turkish Journal of Engineering 8 2 196–208.
IEEE V. Sinap, “Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets”, TUJE, vol. 8, no. 2, pp. 196–208, 2024, doi: 10.31127/tuje.1386127.
ISNAD Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering 8/2 (April 2024), 196-208. https://doi.org/10.31127/tuje.1386127.
JAMA Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. 2024;8:196–208.
MLA Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering, vol. 8, no. 2, 2024, pp. 196-208, doi:10.31127/tuje.1386127.
Vancouver Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. 2024;8(2):196-208.