Research Article

The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling

Volume: 12 Number: 3 September 25, 2024

Abstract

Background: Data collected in many fields inherently contain highly correlated measurements, and technological advances have caused the volume of data to be interpreted to grow exponentially. This problem, known as multicollinearity, affects the performance of both statistical and machine learning algorithms. Statistical models proposed as potential remedies have not been sufficiently evaluated in the literature. A comprehensive comparison of statistical and machine learning models is therefore required to address the multicollinearity problem.
Methods: Statistical models (Ridge, Liu, Lasso and Elastic Net regression) and eight prominent machine learning algorithms (CART, KNN, MLP, MARS, Cubist, SVM, Bagging and XGBoost) are compared on two healthcare datasets (Body Fat and Cancer) that exhibit multicollinearity. Model performance is assessed by cross-validation using root mean square error (RMSE), mean absolute error (MAE) and R-squared.
Results: Statistical models outperformed machine learning models in terms of RMSE, MAE and R-squared in both training and testing performance. In particular, Liu regression often achieved better relative performance, both among the regression methods and compared with the machine learning algorithms: on training performance, up to 7.60% to 46.08% for the Body Fat dataset and up to 1.55% to 21.53% for the Cancer dataset; on testing performance, up to 1.56% to 38.08% for the Body Fat dataset and up to 3.50% to 23.29% for the Cancer dataset.
Conclusions: Liu regression is largely overlooked in the machine learning literature, but because it outperformed the most powerful and widely used machine learning algorithms in this study, it appears to be a promising tool in almost all fields, especially for regression-based studies involving data with multicollinearity.
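
The comparison described in the abstract can be illustrated with a small sketch. It is not the study's code: the data are synthetic, the tuning values k = 1.0 and d = 0.5 are arbitrary, a single hold-out split stands in for cross-validation, and the function names are hypothetical. The Liu estimator is written in its standard closed form, (X'X + I)^{-1}(X'y + d * beta_OLS), next to the ridge estimator (X'X + kI)^{-1}X'y, and both are scored with RMSE, MAE and R-squared.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares via lstsq (no explicit inverse of X'X)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def ridge_fit(X, y, k):
    """Ridge estimator: (X'X + kI)^{-1} X'y, with k >= 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def liu_fit(X, y, d):
    """Liu estimator: (X'X + I)^{-1} (X'y + d * beta_ols), typically 0 < d < 1."""
    p = X.shape[1]
    beta_ols = ols_fit(X, y)
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y + d * beta_ols)


def metrics(y_true, y_pred):
    """RMSE, MAE and R-squared, the three criteria named in the abstract."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Synthetic data (an assumption, not the Body Fat or Cancer data):
# x2 is nearly a copy of x1, which makes the design multicollinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=n)

# Single hold-out split; the study itself uses cross-validation.
train, test = slice(0, 150), slice(150, n)
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd
y_mean = y[train].mean()
ytr, yte = y[train] - y_mean, y[test] - y_mean

# Arbitrary tuning values, chosen only for illustration.
fits = {
    "ols": ols_fit(Xtr, ytr),
    "ridge": ridge_fit(Xtr, ytr, k=1.0),
    "liu": liu_fit(Xtr, ytr, d=0.5),
}
for name, beta in fits.items():
    print(name, metrics(yte, Xte @ beta))
```

Setting d = 1 in liu_fit or k = 0 in ridge_fit recovers the OLS solution, which is a quick sanity check on either implementation. In practice k and d would be tuned, for example by cross-validation, rather than fixed in advance.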

Supporting Institution

All authors declare that the study was not supported by any institution or project.

Ethical Statement

All authors declare that the ethical principles stated by the journal have been complied with in the study.

Thanks

All authors would like to thank the journal staff and the reviewers for their contributions to the peer review process.

Details

Primary Language

English

Subjects

Supervised Learning, Machine Learning Algorithms, Machine Learning (Other)

Journal Section

Research Article

Early Pub Date

September 25, 2024

Publication Date

September 25, 2024

Submission Date

October 4, 2023

Acceptance Date

August 14, 2024

Published in Issue

Year 2024 Volume: 12 Number: 3

APA
Yıldırım, H. (2024). The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling. Academic Platform Journal of Engineering and Smart Systems, 12(3), 68-80. https://doi.org/10.21541/apjess.1371070
