Predicting COVID-19 Infection Using Machine Learning Methods Combined with Feature Selection
Abstract
COVID-19 is an infection that has affected the world since December 31, 2019, and was declared a pandemic by the WHO in March 2020. In this study, Multi-Layer Perceptron (MLP), Tree Boost (TB), Radial Basis Function Network (RBF), Support Vector Machine (SVM), and K-Means Clustering (kMC) are each combined with minimum redundancy maximum relevance (mRMR) and Relief-F to construct new feature selection-based COVID-19 prediction models and to identify the variables that most influence the prediction of COVID-19 infection. The dataset contains records for 20,000 patients (10,000 positive and 10,000 negative) and includes several personal, symptomatic, and non-symptomatic variables. Accuracy, recall, and F1-score are used to assess the models' performance, and the generalization errors of the models are estimated using 10-fold cross-validation. The results show that, on average, mRMR performs slightly better than Relief-F in predicting whether a patient is infected with COVID-19. In addition, mRMR is more successful than Relief-F in finding the relative relevance order of the COVID-19 predictors. The mRMR algorithm emphasizes symptomatic variables such as fever and cough, whereas the Relief-F algorithm highlights non-symptomatic variables such as age and race. In general, MLP outperforms all other classifiers in predicting COVID-19 infection.
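The evaluation protocol described above (feature selection followed by a classifier, scored with accuracy, recall, and F1 under 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes scikit-learn, uses synthetic balanced data in place of the study's dataset, and substitutes a univariate mutual-information ranking for mRMR (which additionally penalizes redundancy between features) because neither mRMR nor Relief-F ships with scikit-learn. The number of selected features and the MLP size are arbitrary choices made for the example.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, balanced stand-in data (assumption: NOT the study's dataset).
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=4, n_redundant=2,
    weights=[0.5, 0.5], random_state=0,
)

# Relevance-only ranking by mutual information: a simplified stand-in
# for mRMR, which would also discount redundant features.
relevance = partial(mutual_info_classif, random_state=0)

# Feature selection sits inside the pipeline so it is refit on each
# training fold and never sees the held-out fold.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=relevance, k=5),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)

# 10-fold stratified cross-validation with the three reported metrics.
metrics = ("accuracy", "recall", "f1")
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=metrics)

mean_scores = {m: float(np.mean(scores[f"test_{m}"])) for m in metrics}
for metric, value in mean_scores.items():
    print(f"{metric}: {value:.3f}")
```

Swapping `MLPClassifier` for another estimator, or `SelectKBest` for a dedicated mRMR or Relief-F implementation, leaves the rest of the protocol unchanged, which is what makes a per-classifier, per-selector comparison of this kind straightforward to set up.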
Keywords
Supporting Institution
Çukurova University Scientific Research Projects Center
Project Number
FYL-2021-14257
References
- Althnian, A., Elwafa, A. A., Aloboud, N., Alrasheed, H., & Kurdi, H. (2020). Prediction of COVID-19 Individual Susceptibility using Demographic Data: A Case Study on Saudi Arabia. In Procedia Computer Science (Vol. 177, pp. 379–386). https://doi.org/10.1016/j.procs.2020.10.051
- Ciotti, M., Ciccozzi, M., Terrinoni, A., Jiang, W.-C., Wang, C.-B., & Bernardini, S. (2020). The COVID-19 pandemic. In Critical Reviews in Clinical Laboratory Sciences (Vol. 57, Issue 6, pp. 365–388). Informa UK Limited. https://doi.org/10.1080/10408363.2020.1783198
- COVID Live. (2022, May 15). Worldometers. https://www.worldometers.info/coronavirus/
- Data on COVID-19 pandemic. (2021, May 24). Open Data from the State of Espirito Santo. https://dados.es.gov.br/dataset/dados-sobre-pandemia-covid-19/resource/38cc5066-020d-4c5a-b4c0-e9f690deb6d4
- Fayyoumi, E., Idwan, S., & AboShindi, H. (2020). Machine Learning and Statistical Modelling for Prediction of Novel COVID-19 Patients Case Study: Jordan. In International Journal of Advanced Computer Science and Applications (Vol. 11, Issue 5). The Science and Information Organization. https://doi.org/10.14569/ijacsa.2020.0110518
- Peng, H., Long, F., & Ding, C. (2005). Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. In IEEE Transactions on Pattern Analysis and Machine Intelligence (Vol. 27, Issue 8, pp. 1226–1238). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/tpami.2005.159
- Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector classification.
- Kulis, B., & Jordan, M. I. (2011). Revisiting k-means: New Algorithms via Bayesian Nonparametrics (Version 2). arXiv. https://doi.org/10.48550/ARXIV.1111.0352
- Natekin, A., & Knoll, A. (2013). Gradient boosting machines, a tutorial. In Frontiers in Neurorobotics (Vol. 7). Frontiers Media SA. https://doi.org/10.3389/fnbot.2013.00021
Details
Primary Language
English
Subjects
Engineering
Journal Section
Research Article
Publication Date
July 15, 2022
Submission Date
June 17, 2022
Acceptance Date
June 30, 2022
Published in Issue
Year: 2022, Number: 37