Research Article
Comparison of PCA and RFE-RF Algorithm in Bankruptcy Prediction

Year 2022, Volume: 13 Issue: 3, 1001 - 1008, 17.10.2022

Abstract

Machine learning prediction models are important for identifying companies at risk before they fall into financial distress, and they have recently become one of the most prominent research topics in empirical finance. When models are developed in this area, data preprocessing steps are applied to prepare the data for analysis. One of these steps is feature selection, which can be described as reducing the dimensionality of the financial ratios used as inputs in the data set. This stage is the process of choosing the best subset of features to be used in the research or, in other words, selecting the most important features that can represent the data. In this paper, two feature selection methods are compared: Principal Component Analysis (PCA) and Random Forest - Recursive Feature Elimination (RF-RFE). Commercial companies operating in Turkey were used in the experiments. The predictive accuracy obtained with the selected features was tested with AdaBoost and Stochastic Gradient Descent models. Our experimental results show that RF-RFE is a more effective feature selection method than PCA.
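The comparison described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the author's implementation: the data are synthetic (generated with make_classification as a stand-in for the financial ratios of Turkish firms), and the number of retained features (N_KEEP), the forest size, the elimination step, and the cross-validation setup are all assumptions chosen only to keep the example small. The helper names build_reducers, build_classifiers, and compare are hypothetical.

```python
# Illustrative sketch only: synthetic data replaces the firm data used in the paper,
# and every hyperparameter below is an assumption, not a value taken from the paper.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_KEEP = 6  # hypothetical number of features (or components) to retain

# Synthetic stand-in for a table of financial ratios: one row per firm,
# with an imbalanced binary label (1 = financially distressed).
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=5, n_redundant=4,
    weights=[0.8, 0.2], random_state=0,
)


def build_reducers():
    """Return fresh instances of the two reduction steps being compared."""
    return {
        # PCA is unsupervised: it projects all ratios onto N_KEEP new axes.
        "PCA": PCA(n_components=N_KEEP, random_state=0),
        # RF-RFE is supervised: it repeatedly fits a random forest and drops
        # the least important original features until N_KEEP remain.
        "RF-RFE": RFE(
            RandomForestClassifier(n_estimators=50, random_state=0),
            n_features_to_select=N_KEEP, step=2,
        ),
    }


def build_classifiers():
    """Return fresh instances of the two classifiers named in the abstract."""
    return {
        "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
        "SGD": SGDClassifier(max_iter=1000, tol=1e-3, random_state=0),
    }


def compare(X, y, cv=3):
    """Mean cross-validated accuracy for every reducer/classifier pair."""
    results = {}
    for r_name in build_reducers():
        for c_name in build_classifiers():
            # New estimators for each pair; the pipeline refits scaling and
            # reduction inside every fold, so test folds do not leak into them.
            pipe = make_pipeline(
                StandardScaler(),
                build_reducers()[r_name],
                build_classifiers()[c_name],
            )
            scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
            results[(r_name, c_name)] = scores.mean()
    return results


if __name__ == "__main__":
    for (r_name, c_name), acc in compare(X, y).items():
        print(f"{r_name:7s} + {c_name:8s}: mean CV accuracy = {acc:.3f}")
```

Two design points are worth noting. PCA returns linear combinations of all inputs, whereas RF-RFE keeps a subset of the original ratios, so only the latter tells the analyst which ratios matter. Scores on synthetic data say nothing about the ranking reported in the paper; the sketch only shows how such a comparison can be set up.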

There are 19 citations in total.

Details

Primary Language: English
Journal Section: Articles
Authors: Yusuf Aker (ORCID 0000-0002-6058-068X)

Publication Date: October 17, 2022
Submission Date: February 24, 2022
Published in Issue: Year 2022, Volume: 13 Issue: 3

Cite

APA Aker, Y. (2022). Comparison of PCA and RFE-RF Algorithm in Bankruptcy Prediction. Gümüşhane Üniversitesi Sosyal Bilimler Dergisi, 13(3), 1001-1008. https://doi.org/10.36362/gumus.1078348