Research Article

Investigating the Effect of Class Balancing Methods on the Performance of Machine Learning Techniques: Credit Risk Application

Volume: 5 Number: 1 July 4, 2024

Abstract

Credit risk arises when customers fail to meet their obligations on bank loans by the end of the specified term. Technological advances now allow machine learning methods to be used in many sectors. Adapted to banks' creditworthiness assessment processes, these methods aim to make risky customers easier to identify. To support the most appropriate evaluation in the lending process, this study applies resampling techniques to correct the class imbalance found in credit data sets and investigates the effect of this balancing on machine learning performance. The German, Australian and HMEQ credit data sets were used in the implementation phase. Risky customers were detected with a range of machine learning classification methods: Logistic Regression (LR), K-Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector Machines (SVM), Multilayer Perceptron (MLP), Decision Trees (DT), Random Forests (RF), Gradient Boosting Decision Trees (GBDT), Extremely Randomized Trees, and Hard and Soft Voting. Class imbalance was addressed with resampling and hybrid techniques: Random Oversampling (ROS), Random Undersampling (RUS), Balanced Bagging Classifier (BBC), SMOTE-Tomek Links and SMOTE-ENN. The performance on the three data sets was examined in four different scenarios. The results show that the hybrid approach, in which oversampling and undersampling are used together to balance the classes, gave the best classification performance across the machine learning techniques.
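
As an illustration of the two simplest resampling techniques named in the abstract, the following is a minimal sketch of Random Oversampling (ROS) and Random Undersampling (RUS) in plain Python. It is not the authors' implementation: the function names `random_oversample` and `random_undersample`, the toy feature rows, and the 7-to-3 class split are all assumptions made for the example. The hybrid methods (SMOTE-Tomek Links, SMOTE-ENN) generate synthetic minority samples and then clean the result, and are usually taken from a library such as imbalanced-learn rather than written by hand.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows of smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sampling with replacement
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class,
    sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):  # sampling without replacement
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 7 "good" (0) and 3 "bad" (1) applicants.
X = [[i, i * 2] for i in range(10)]
y = [0] * 7 + [1] * 3

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(Counter(y_ros))  # 7 rows in each class
print(Counter(y_rus))  # 3 rows in each class
```

In practice, resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution; whether the study follows this protocol is not stated in the abstract.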

Details

Primary Language

English

Subjects

Operation

Journal Section

Research Article

Early Pub Date

June 27, 2024

Publication Date

July 4, 2024

Submission Date

February 13, 2024

Acceptance Date

June 27, 2024

Published in Issue

Year 2024 Volume: 5 Number: 1

APA
Milli, M. E. F., Deveci Kocakoç, İ., & Aras, S. (2024). Investigating the Effect of Class Balancing Methods on the Performance of Machine Learning Techniques: Credit Risk Application. İzmir Yönetim Dergisi, 5(1), 55-70. https://doi.org/10.56203/iyd.1436742
Chicago
Milli, Migraç Enes Furkan, İpek Deveci Kocakoç, and Serkan Aras. 2024. “Investigating the Effect of Class Balancing Methods on the Performance of Machine Learning Techniques: Credit Risk Application”. İzmir Yönetim Dergisi 5 (1): 55-70. https://doi.org/10.56203/iyd.1436742.
