Research Article

Comprehensive Analysis of Resampling Methods on Ensemble Learning for Credit Card Fraud Detection

Year 2022, Volume: 22 Issue: 5, 1005 - 1015, 27.10.2022
https://doi.org/10.35414/akufemubid.1066453

Abstract

Fast, convenient credit card purchases have led to a rise in fraudulent transactions. In recent years, machine learning methods have become an important part of fraud detection. A common problem in fraud detection is class imbalance in the datasets. The resampling methods used to address this imbalance are applied at different stages of the workflow from study to study. This study compares the effects of resampling methods according to the stage at which they are applied, using various machine learning methods, chiefly ensemble learning methods, together with a few other machine learning and deep learning methods. The comparison, carried out with cross-validation, shows that applying the resampling methods separately to the training and test datasets gives the most accurate result. In a further comparison of the metric scores of XGB, LGBM, RF, FNN, and the other methods used in this study, XGB and FNN give the highest values, with 99% recall, precision, and accuracy.
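The stage that the abstract reports as most accurate (resampling the training and test partitions separately, after the split) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy one-feature data, the helper names `random_oversample`, `stratified_split`, and `precision_recall_accuracy`, the fixed threshold standing in for a trained model, and plain random oversampling as the resampling method. The abstract does not specify which resampling methods or model settings the study used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, rng):
    """Duplicate minority-class rows at random until every class is equally frequent."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def stratified_split(rows, labels, test_fraction, rng):
    """Split each class separately so both partitions contain minority samples."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Compute the three metrics named in the abstract for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return precision, recall, accuracy


rng = random.Random(0)
# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) transactions.
rows = [[rng.gauss(0, 1)] for _ in range(990)] + [[rng.gauss(3, 1)] for _ in range(10)]
labels = [0] * 990 + [1] * 10

# Split first, then resample each partition on its own, so that no
# duplicated minority row can appear in both training and test data.
(train_X, train_y), (test_X, test_y) = stratified_split(rows, labels, 0.2, rng)
train_X, train_y = random_oversample(train_X, train_y, rng)
test_X, test_y = random_oversample(test_X, test_y, rng)

# Stand-in for a trained classifier: a fixed threshold on the single feature.
predictions = [1 if x[0] > 1.5 else 0 for x in test_X]
print(Counter(train_y), Counter(test_y))
print(precision_recall_accuracy(test_y, predictions))
```

Resampling after the split keeps the two partitions disjoint. Resampling before the split would let copies of the same minority row land on both sides, which is one reason the stage at which resampling is applied can change the reported scores.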

References

  • Alam, T.M., Shaukat, K., Hameed, I.A., Luo, S., Sarwar, M.U., Shabbir, S., Li, J. and Khushi, M., 2020. An investigation of credit card default prediction in the imbalanced datasets. IEEE Access, 8, 201173-201198.
  • Aung, M.H., Seluka, P.T., Fuata, J.T.R., Tikoisuva, M.J., Cabealawa, M.S. and Nand, R., 2020. Random Forest Classifier for Detecting Credit Card Fraud based on Performance Metrics. In 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) (1-6).
  • Bej, S., Davtyan, N., Wolfien, M., Nassar, M. and Wolkenhauer, O., 2021. LoRAS: an oversampling approach for imbalanced datasets. Machine Learning, 110(2), 279-301.
  • bin Alias, M.S.A., Ibrahim, N.B. and Zin, Z.B.M., 2021. Improved sampling data Workflow using Smtmk to increase the classification accuracy of imbalanced dataset. European Journal of Molecular & Clinical Medicine, 8(02), 2021.
  • Breiman, L., 2001. Random forests. Machine learning, 45(1), 5-32.
  • Chawla, N.V., Bowyer, K.W., Hall, L.O. and Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357.
  • Chen, T. and Guestrin, C., 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining (785-794).
  • Cover, T. and Hart, P., 1967. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1), 21-27.
  • Çinarer, G., Emiroğlu, B.G. and Yurttakal, A.H., 2021. Predicting 1p/19q chromosomal deletion of brain tumors using machine learning. Emerging Materials Research, 10(2), 238-244.
  • Efron, B., 1982. The jackknife, the bootstrap and other resampling plans. Society for industrial and applied mathematics.
  • Garg, R., Oh, E., Naidech, A., Kording, K. and Prabhakaran, S., 2019. Automating ischemic stroke subtype classification using machine learning and natural language processing. Journal of Stroke and Cerebrovascular Diseases, 28(7), 2045-2051.
  • Gulati, P., 2020. Hybrid resampling technique to tackle the imbalanced classification problem.
  • Itoo, F. and Singh, S., 2021. Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. International Journal of Information Technology, 13(4), 1503-1511.
  • Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q. and Liu, T.Y., 2017. Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30.
  • McCulloch, W.S. and Pitts, W., 1943. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133.
  • Mînăstireanu, E.A. and Meșniță, G., 2020. Methods of handling unbalanced datasets in credit card fraud detection. BRAIN. Broad Research in Artificial Intelligence and Neuroscience, 11(1), 131-143.
  • Mochida, K., Koda, S., Inoue, K., Hirayama, T., Tanaka, S., Nishii, R. and Melgani, F., 2019. Computer vision-based phenotyping for improvement of plant productivity: a machine learning perspective. GigaScience, 8(1), giy153.
  • Mrozek, P., Panneerselvam, J. and Bagdasar, O., 2020, December. Efficient resampling for fraud detection during anonymised credit card transactions with unbalanced datasets. In 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC) (426-433). IEEE.
  • Nguyen, T.T., Tahir, H., Abdelrazek, M. and Babar, A., 2020. Deep learning methods for credit card fraud detection. arXiv preprint arXiv:2012.03754.
  • Riffi, J., Mahraz, M.A., El Yahyaouy, A. and Tairi, H., 2020. Credit card fraud detection based on multilayer perceptron and extreme learning machine architectures. In 2020 International Conference on Intelligent Systems and Computer Vision (ISCV) (1-5). IEEE.
  • Rtayli, N. and Enneya, N., 2020. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. Journal of Information Security and Applications, 55, 102596.
  • Shah, H.B., 2020. Comparing Machine Learning Algorithms For Credit Card Fraud Detection.
  • Shamsudin, H., Yusof, U.K., Jayalakshmi, A. and Khalid, M.N.A., 2020. Combining oversampling and undersampling techniques for imbalanced classification: A comparative study using credit card fraudulent transaction dataset. In 2020 IEEE 16th International Conference on Control & Automation (ICCA) (803-808). IEEE.
  • Shivanna, A., Ray, S., Alshouiliy, K. and Agrawal, D.P., 2020. Detection of Fraudulence in Credit Card Transactions using Machine Learning on Azure ML. In 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (0268-0273). IEEE.
  • Tingfei, H., Guangquan, C. and Kuihua, H., 2020. Using variational auto encoding in credit card fraud detection. IEEE Access, 8, 149841-149853.
  • Tran, T.C. and Dang, T.K., 2021. Machine learning for prediction of imbalanced data: Credit fraud detection. In 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM) (1-7). IEEE.
  • Vapnik, V.N., 1995. The nature of statistical learning theory. New York: Springer-Verlag.
  • Wang, J., de Moraes, R.M. and Bari, A., 2020. A predictive analytics framework to anomaly detection. In 2020 IEEE Sixth International Conference on Big Data Computing Service and Applications (BigDataService) (104-108). IEEE.
  • Wibowo, P. and Fatichah, C., 2021. An in-depth performance analysis of the oversampling techniques for high-class imbalanced dataset. Register: Jurnal Ilmiah Teknologi Sistem Informasi, 7(1), 63-71.
  • Wright, R.E., 1995. Logistic regression. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics, 217–244. American Psychological Association.
  • Zhang, D., Bhandari, B. and Black, D., 2020. Credit Card Fraud Detection Using Weighted Support Vector Machine. Applied Mathematics, 11(12), 1275.
  • Internet sources: 1- https://nilsonreport.com/content_promo.php?id_promo=16, (28.01.2022)
  • 2- https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2019/consumer_sentinel_network_data_book_2019.pdf, (28.01.2022)
  • 3- https://www.kaggle.com/mlg-ulb/creditcardfraud, (28.01.2022)

Turkish title: Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi

There are 34 citations in total.

Details

Primary Language: Turkish
Subjects: Artificial Intelligence
Journal Section: Articles
Authors:

Ali Kemal Ay (ORCID: 0000-0002-4061-4395)

Esra Yolaçan (ORCID: 0000-0002-0008-1037)

Publication Date: October 27, 2022
Submission Date: February 1, 2022
Published in Issue: Year 2022, Volume: 22, Issue: 5

Cite

APA Ay, A. K., & Yolaçan, E. (2022). Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, 22(5), 1005-1015. https://doi.org/10.35414/akufemubid.1066453
AMA Ay AK, Yolaçan E. Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. October 2022;22(5):1005-1015. doi:10.35414/akufemubid.1066453
Chicago Ay, Ali Kemal, and Esra Yolaçan. “Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22, no. 5 (October 2022): 1005-15. https://doi.org/10.35414/akufemubid.1066453.
EndNote Ay AK, Yolaçan E (October 1, 2022) Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22 5 1005–1015.
IEEE A. K. Ay and E. Yolaçan, “Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi”, Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 22, no. 5, pp. 1005–1015, 2022, doi: 10.35414/akufemubid.1066453.
ISNAD Ay, Ali Kemal - Yolaçan, Esra. “Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi 22/5 (October 2022), 1005-1015. https://doi.org/10.35414/akufemubid.1066453.
JAMA Ay AK, Yolaçan E. Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2022;22:1005–1015.
MLA Ay, Ali Kemal and Esra Yolaçan. “Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi”. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi, vol. 22, no. 5, 2022, pp. 1005-15, doi:10.35414/akufemubid.1066453.
Vancouver Ay AK, Yolaçan E. Yeniden Örnekleme Metotlarının Kredi Kartı Sahtecilik Tespiti için Topluluk Öğrenmesine Kapsamlı Analizi. Afyon Kocatepe Üniversitesi Fen Ve Mühendislik Bilimleri Dergisi. 2022;22(5):1005-15.