Bagging and Boosting Methods for Predicting Mortality of Patients with COVID-19

Hilal Arslan

doi:10.24012/dumf.1095858

Research Article

Year 2022, Volume: 13, Issue: 2, pp. 221-226, 28.06.2022

Abstract

The COVID-19 pandemic has continued for more than two years, and the number of deaths keeps rising. Ensemble learning techniques are used effectively to predict the outcomes of patients with COVID-19. Predicting the mortality of a COVID-19 patient is crucial both for reducing the risk of imminent death and for applying an effective clinical treatment strategy. In this study, we apply bagging and boosting methods to predict the mortality of patients with COVID-19. Six decision tree methods (C4.5, Random Tree, REPTree, Logistic Model Tree, Decision Stump, and Hoeffding Tree) are employed as base learners in bagging and boosting. The results are obtained on a real-world dataset containing information from 1085 patients. Experimental results show that bagging with REPTree as the base learner achieves an accuracy of 97.24%. Furthermore, when we compare our results with other classification algorithms, the proposed method achieves higher accuracy.
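To make the two ensemble strategies concrete, the following is a minimal, illustrative sketch and not the study's implementation. It assumes decision stumps (one of the six base learners named in the abstract) as the base learner, discrete AdaBoost as the boosting variant (the abstract does not say which boosting algorithm was used), and a small synthetic dataset whose labels follow a made-up threshold rule. All function names and the data are hypothetical.

```python
import math
import random


def stump_predict(stump, row):
    """A decision stump: one feature, one threshold, labels in {-1, +1}."""
    feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, weights=None):
    """Exhaustively pick the stump with the lowest weighted training error."""
    n = len(X)
    weights = weights if weights is not None else [1.0 / n] * n
    best_err, best_stump = float("inf"), (0, 0.0, 1)
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            for polarity in (1, -1):
                stump = (feature, (lo + hi) / 2, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_stump, best_err


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump, _ = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        models.append(stump)
    return models


def bagging_predict(models, row):
    """Unweighted majority vote over the bagged stumps."""
    return 1 if sum(stump_predict(m, row) for m in models) > 0 else -1


def adaboost_fit(X, y, n_rounds=10):
    """Boosting (discrete AdaBoost): upweight rows the current stump gets wrong."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Vote weighted by each stump's alpha."""
    return 1 if sum(a * stump_predict(s, row) for a, s in ensemble) > 0 else -1


def accuracy(predict, model, X, y):
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)


# Synthetic stand-in data (NOT the 1085-patient dataset): the label follows a
# made-up threshold rule on the first feature; the second feature is noise.
rows = [[age, (age * 7) % 10] for age in range(20, 90)]
labels = [1 if row[0] > 65 else -1 for row in rows]
X_train, y_train = rows[0::2], labels[0::2]  # even values of the first feature
X_test, y_test = rows[1::2], labels[1::2]    # odd values of the first feature

bagged = bagging_fit(X_train, y_train)
boosted = adaboost_fit(X_train, y_train)
bagging_acc = accuracy(bagging_predict, bagged, X_test, y_test)
boosting_acc = accuracy(adaboost_predict, boosted, X_test, y_test)
print(f"bagging accuracy:  {bagging_acc:.3f}")
print(f"boosting accuracy: {boosting_acc:.3f}")
```

The design difference is visible in the two fit functions: bagging trains every base learner independently on a resampled copy of the data and gives each an equal vote, while boosting trains the learners sequentially on reweighted data and weights their votes by training error. The accuracies printed here describe only the toy data and say nothing about the results reported in the abstract.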

Keywords

COVID-19, SARS-CoV-2, Ensemble Learning, Bagging, Boosting

Details

Primary Language	English
Section	Articles
Authors	Hilal Arslan 0000-0002-6449-6952
Early Pub Date	June 28, 2022
Publication Date	June 28, 2022
Submission Date	March 30, 2022
Published in Issue	Year 2022 Volume: 13 Issue: 2

Cite

IEEE	H. Arslan, “Bagging and Boosting Methods for Predicting Mortality of Patients with COVID-19”, DÜMF MD, vol. 13, no. 2, pp. 221–226, 2022, doi: 10.24012/dumf.1095858.

All articles published by DUJE are licensed under the Creative Commons Attribution 4.0 International License. This permits anyone to copy, redistribute, remix, transmit, and adapt the work, provided the original work and source are appropriately cited.