Investigation of Covid-19 Infection with Clinical Data Using Decision Trees

Fırat Orhanbulucu; Fatma Latifoğlu

doi:10.31590/ejosat.1171818

Research Article

Year 2022, Issue: 40, 29 - 33, 30.09.2022

Abstract

Coronavirus disease (Covid-19), which the World Health Organization (WHO) declared a pandemic in 2020, was first seen in Wuhan, China, in the last months of 2019 and has since affected the whole world. Early diagnosis of this rapidly spreading infection is important for protection against the disease. For this reason, methods such as image processing, deep learning, and machine learning have become important for early detection. In this study, several Decision Tree methods were used to classify individuals who tested positive or negative for Covid-19 on the basis of selected laboratory test results. Because the original data set has an imbalanced class distribution, it was balanced in a pre-processing step using the oversampling and undersampling methods commonly applied to such data sets. The balanced data sets and the original data set were analyzed with the Random Forest (RF), Random Tree (RT), J48, Alternating Decision Tree (ADTree), and Functional Trees (FT) classifiers, using 5-fold cross-validation (CV-5), 10-fold cross-validation (CV-10), and leave-one-out cross-validation (LOO-CV). The RF classifier gave the best results: a success rate of 87.5% with CV-5 on the original data set, 93.3% with CV-10 and LOO-CV on the oversampled data set, and 79% with CV-5 on the undersampled data set. In addition to success rates, the sensitivity and specificity metrics, which are important for identifying patients and healthy individuals, were examined for each classification algorithm and CV setting.

Keywords

Covid-19, Decision Tree, Random Forest, Oversampling
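
The evaluation protocol summarized in the abstract (class balancing, k-fold and leave-one-out splits, sensitivity and specificity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which resampling algorithms were used, so plain random oversampling and undersampling are assumed here, and all function names (`random_oversample`, `random_undersample`, `k_fold_indices`, `sensitivity_specificity`) are hypothetical helpers introduced only for illustration.

```python
import random


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest class.

    Assumption: simple random oversampling, not necessarily the study's method.
    """
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; k == n_samples gives leave-one-out."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(pairs) if pairs else float("nan")
    return sensitivity, specificity, accuracy
```

Calling `k_fold_indices(n, 5)` or `k_fold_indices(n, 10)` corresponds to CV-5 and CV-10, and `k_fold_indices(n, n)` to LOO-CV. Sensitivity is the fraction of Covid-19-positive cases that are correctly identified, and specificity is the fraction of negative cases that are correctly identified.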

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Fırat Orhanbulucu (ORCID: 0000-0003-4558-9667), Fatma Latifoğlu (ORCID: 0000-0003-2018-9616)
Early Pub Date	September 26, 2022
Publication Date	September 30, 2022
Published in Issue	Year 2022 Issue: 40

Cite

APA	Orhanbulucu, F., & Latifoğlu, F. (2022). Investigation of Covid-19 Infection with Clinical Data Using Decision Trees. Avrupa Bilim ve Teknoloji Dergisi, (40), 29-33. https://doi.org/10.31590/ejosat.1171818
