Research Article

Bankruptcy Prediction with Optuna-Enhanced Ensemble Machine Learning Methods: A Comparison of Oversampling and Undersampling Techniques

Year 2025, Volume: 16 Issue: 1, 97 - 113
https://doi.org/10.24012/dumf.1597564

Abstract

Bankruptcy prediction is a critical component of financial risk management; however, the process faces significant challenges such as class imbalance, feature selection, and overfitting. This study presents a comparative analysis of the data balancing techniques used to address class imbalance in datasets. The dataset used in the study was obtained from the Taiwan Economic Journal and contains companies' financial records. A range of machine learning models, including ensemble and boosting algorithms such as Stacking Classifier and XGBoost, were tested on datasets balanced with SMOTE and Tomek Links. In addition, dimensionality reduction was performed using Principal Component Analysis to increase computational efficiency and reduce the risk of overfitting. Hyperparameter optimization was carried out with Optuna to maximize model performance. The results show that SMOTE, by balancing the dataset, significantly improves classification accuracy and F1 scores, particularly for ensemble-based models. In contrast, the Tomek Links method negatively affected model performance in some cases because it removed potentially important data points. Among the models tested, the Stacking Classifier performed best on the SMOTE-balanced dataset, reaching 99% accuracy. These results support the integration of advanced predictive tools into financial decision-making processes. The Stacking Classifier's strong performance on SMOTE-balanced data strengthens risk management systems and enables proactive bankruptcy detection.
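The oversampling idea behind SMOTE can be illustrated with a minimal sketch: each synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, the toy data, and the pure-Python Euclidean neighbour search are assumptions made for illustration; this is not the implementation used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data): four points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating samples exactly.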


Bankruptcy Prediction with Optuna-Enhanced Ensemble Machine Learning Methods: A Comparison of Oversampling and Undersampling Techniques

Abstract

Bankruptcy prediction is an essential task in financial risk management, often hindered by challenges such as class imbalance, feature selection, and overfitting. This study investigates the comparative effectiveness of data balancing techniques, specifically focusing on oversampling with SMOTE (Synthetic Minority Over-sampling Technique) and undersampling with Tomek Links, in addressing class imbalance in bankruptcy datasets. A range of machine learning models, including ensemble and boosting algorithms such as Stacking Classifier and XGBoost, were applied to imbalanced, SMOTE-balanced, and Tomek Links-balanced datasets. Dimensionality reduction was performed using Principal Component Analysis (PCA) to enhance computational efficiency and reduce overfitting risks, while hyperparameter optimization was conducted using the Optuna framework to maximize model performance. The findings demonstrate that SMOTE significantly improved classification accuracy and F1 scores, particularly for ensemble-based models, by generating synthetic samples to balance the dataset. In contrast, Tomek Links often reduced model performance due to the removal of potentially informative data points. Among the models tested, the Stacking Classifier performed best on SMOTE-balanced data, achieving a prediction accuracy of 99%. These results support integrating advanced predictive tools into financial decision-making. The Stacking Classifier’s strong performance on SMOTE-balanced data enhances risk management systems, enabling proactive bankruptcy detection.
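To make the undersampling step concrete, the following is a minimal sketch of Tomek Links removal: two samples form a Tomek link when they are each other's nearest neighbour and carry different class labels, and the majority-class member of each link is dropped. The helper names `nearest` and `tomek_link_undersample`, the toy points, and the brute-force neighbour search are assumptions for illustration, not the code used in the study.

```python
import math


def nearest(i, points):
    """Index of the nearest other point to points[i] (Euclidean distance)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_link_undersample(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link.

    Hypothetical helper for illustration only; not the study's code.
    """
    nn = [nearest(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours with different labels
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority_label else j)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed): label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
labels = [0, 0, 0, 0, 1, 1]
kept_points, kept_labels = tomek_link_undersample(points, labels, majority_label=0)
print(kept_labels)
```

In this toy set only the majority point at (1.0, 1.0) is removed, because it and the minority point at (1.05, 1.0) are mutual nearest neighbours. The sketch also shows why the method can discard informative samples: the removed point sits exactly on the class boundary.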


Details

Primary Language: English
Subjects: Machine Learning (Other)
Section: Articles
Authors

Vahid Sinap 0000-0002-8734-9509

Early View Date: March 26, 2025
Publication Date
Submission Date: December 6, 2024
Acceptance Date: March 5, 2025
Published in Issue: Year 2025, Volume: 16, Issue: 1

Cite

IEEE V. Sinap, “Bankruptcy Prediction with Optuna-Enhanced Ensemble Machine Learning Methods: A Comparison of Oversampling and Undersampling Techniques”, DÜMF MD, vol. 16, no. 1, pp. 97–113, 2025, doi: 10.24012/dumf.1597564.
All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.