Research Article

A NOVEL COVID-19 CLASSIFICATION METHOD BASED ON CURE CLUSTERING

Year 2024, Volume 7, Issue 1, 25 - 35, 30.06.2024
https://doi.org/10.70030/sjmakeu.1460760

Abstract

COVID-19 is a serious disease that spreads rapidly and has affected the entire world. Machine-learning-based methods have been recommended as cheaper and faster alternatives for diagnosing COVID-19 positive and negative cases. However, as the data size increases, problems such as storage requirements and classification time may arise. KNN (K-nearest neighbor), a simple but effective machine learning method, is widely used in various fields, but its effectiveness decreases considerably when the sample size or the number of features is large. To address these problems, it is important to use datasets more effectively and to select the meaningful parts of the data. The current study proposes an improved neighborhood-based classification method called CURE-NN and compares its performance with the standard NN and KNN algorithms. CURE-NN applies clustering before classification to extract reduced structural information from the data, and this reduced structural information is then used as the training set in the classification process. The proposed method was applied to a COVID-19 dataset. Compared with the NN and KNN methods, it preserves classification success as much as possible while reducing the data used in the test phase by up to 96%. Experimental results show that the reduced data obtained from structural information can be used instead of the entire dataset. In addition, the method uses only one neighbor, which eliminates the need for the K parameter of the KNN algorithm.
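
The idea of reducing the data before a one-neighbor classification can be illustrated with a minimal Python sketch. It is not the authors' CURE-NN implementation: it skips the hierarchical merging of CURE, treats each class as a single cluster, and borrows only the CURE-style step of choosing well-scattered representative points and shrinking them toward the cluster centroid. The function names, the toy data, and the parameters n_reps and shrink are assumptions made for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def scattered_representatives(points, n_reps, shrink):
    """Pick up to n_reps well-scattered points, then shrink them toward the centroid."""
    c = centroid(points)
    # Start from the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, c))]
    while len(reps) < min(n_reps, len(points)):
        # Next: the point farthest from its nearest already-chosen representative.
        reps.append(max(points, key=lambda p: min(math.dist(p, r) for r in reps)))
    # Move each representative a fraction `shrink` of the way toward the centroid.
    return [[r[d] + shrink * (c[d] - r[d]) for d in range(len(c))] for r in reps]

def build_reduced_training_set(X, y, n_reps=3, shrink=0.3):
    """Assumption: one cluster per class; returns (representative, label) pairs."""
    reduced = []
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        for rep in scattered_representatives(members, n_reps, shrink):
            reduced.append((rep, label))
    return reduced

def predict_1nn(reduced, query):
    """Label of the single nearest representative, so no K parameter is needed."""
    _, label = min(reduced, key=lambda item: math.dist(item[0], query))
    return label

# Toy data (hypothetical): two well-separated classes in two dimensions.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4],
     [5.0, 5.0], [5.2, 5.1], [5.1, 4.8], [4.9, 5.3], [5.3, 4.9]]
y = [0] * 5 + [1] * 5

reduced = build_reduced_training_set(X, y, n_reps=2, shrink=0.3)
print(len(X), "training points reduced to", len(reduced), "representatives")
print(predict_1nn(reduced, [0.1, 0.1]), predict_1nn(reduced, [5.0, 5.1]))
```

In this sketch each query is compared against 4 representatives instead of 10 training points, which is where the saving in storage and classification time comes from. A fuller version would first cluster the data and extract representatives per cluster rather than per class.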

References

  • Wang, C., Horby, P.W., Hayden, F. G., & Gao, G.F. (2020). A novel coronavirus outbreak of global health concern. The lancet 395(10223), 470-473.
  • World Health Organization. Naming the coronavirus disease (COVID-19) and the virus that causes it from https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it, accessed on 2023-08-18.
  • Khakharia, A., Shah, V., Jain, S., Shah, J., Tiwari, A., ... & Mehendale, N. (2021). Outbreak prediction of COVID-19 for dense and populated countries using machine learning. Annals of Data Science 8(1), 1-19.
  • WHO Director-General's opening remarks at the media briefing on COVID-19 - 11 March 2020, from https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020, accessed on 2023-08-18.
  • WHO Coronavirus (COVID-19) Dashboard, from https://covid19.who.int/, accessed on 2023-08-09.
  • Chu, D.K., Akl, E.A., Duda, S., Solo, K., Yaacoub, S., Schünemann, H.J., ... & Reinap, M. (2020). Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. The lancet 395(10242), 1973-1987.
  • Mei, X., Lee, H.C., Diao, K.Y., Huang, M., Lin, B., Liu, C., ... & Yang, Y. (2020). Artificial intelligence–enabled rapid diagnosis of patients with COVID-19. Nature medicine 26(8), 1224-1228.
  • Madaan, V., Roy, A., Gupta, C., Agrawal, P., Sharma, A., Bologa, C., & Prodan, R. (2021). XCOVNet: Chest X-ray Image Classification for COVID-19 Early Detection Using Convolutional Neural Networks. New Generation Computing 1-15.
  • Brinati, D., Campagner, A., Ferrari, D., Locatelli, M., Banfi, G., Cabitza, F. (2020). Detection of COVID-19 infection from routine blood exams with machine learning: a feasibility study. Journal of medical systems 44(8), 1-12.
  • Li, L., Qin, L., Xu, Z., Yin, Y., Wang, X., Kong, B., ... & Xia, J. (2020). Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology 296(2), E65-E71.
  • Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., ... & Xu, B. (2021). A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19). European radiology 1-9.
  • Keidar, D., Yaron, D., Goldstein, E., Shachar, Y., Blass, A., Charbinsky, L., ... & Eldar, Y. C. (2021). COVID-19 classification of X-ray images using deep neural networks. European radiology 1-10.
  • Tuncer, T., Ozyurt, F., Dogan, S., & Subasi, A. (2021). A novel Covid-19 and pneumonia classification method based on F-transform. Chemometrics and Intelligent Laboratory Systems, 210, 104256.
  • Maniruzzaman, M., Kumar, N., Abedin, M.M., Islam, M.S., Suri, H.S., El-Baz, A.S., & Suri, J.S. (2017). Comparative approaches for classification of diabetes mellitus data: Machine learning paradigm. Computer methods and programs in biomedicine, 152, 23-34.
  • Cihan, Ş., Karabulut, B., Arslan, G., & Cihan, G. (2018). Koroner Arter Hastalığı Riskinin Veri Madenciliği Yöntemleri İle İncelenmesi. Uluslararası Mühendislik Araştırma Ve Geliştirme Dergisi, 10(1), 85-93.
  • Cihan, Ş., Karabulut, B., Kokoç, M., Arslan, G., Gürel, G. (2019). Analysis of Cryotherapy Treatment of Verruca by Machine Learning. International Scientific and Vocational Studies Journal, 3(2), 56-66.
  • Magna, A.A.R., Allende-Cid, H., Taramasco, C., Becerra, C., & Figueroa, R. L. (2020). Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis. IEEE Access 8, 106198-106213.
  • Nissim, N., Dudaie, M., Barnea, I., Shaked, N.T. (2021). Real‐Time Stain‐Free Classification of Cancer Cells and Blood Cells Using Interferometric Phase Microscopy and Machine Learning. Cytometry Part A 99(5).
  • Arpaci, I., Huang, S., Al-Emran, M., Al-Kabi, M. N., & Peng, M. (2021). Predicting the COVID-19 infection with fourteen clinical features using machine learning classification algorithms. Multimedia Tools and Applications 80(8), 11943-11957.
  • Xing, W., & Bei, Y. (2019). Medical health big data classification based on KNN classification algorithm. IEEE Access 8, 28808-28819.
  • Ahamad, M. M., Aktar, S., Rashed-Al-Mahfuz, M., Uddin, S., Liò, P., Xu, H., ... & Moni, M. A. (2020). A machine learning model to identify early stage symptoms of SARS-Cov-2 infected patients. Expert systems with applications 160, 113661.
  • Hamed, A., Sobhy, A., & Nassar, H. (2021). Accurate Classification of COVID-19 Based on Incomplete Heterogeneous Data using a KNN Variant Algorithm. Arabian Journal for Science and Engineering 1-12.
  • Sun, C., Bai, Y., Chen, D., He, L., Zhu, J., Ding, X., ... & Chen, G. (2021). Accurate classification of COVID‐19 patients with different severity via machine learning. Clinical and Translational Medicine 11(3).
  • Zoabi, Y., Deri-Rozov, S., & Shomron, N. (2021). Machine learning-based prediction of COVID-19 diagnosis based on symptoms. NPJ Digital Medicine 4(1), 1-5.
  • Viana Dos Santos Santana, Í., C.M. da Silveira, A., Sobrinho, Á., et al. (2021). Classification Models for COVID-19 Test Prioritization in Brazil: Machine Learning Approach. Journal of Medical Internet Research 23(4), e27293. DOI: 10.2196/27293.
  • Prasath, VB, Alfeilat, HAA, Lasassmeh, O, Hassanat, A. (2017). Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbors Classifier-A Review. arXiv preprint arXiv:1708.04321.
  • Viswanath, P., & Sarma, T. H. (2011). An improvement to k-nearest neighbor classifier. In 2011 IEEE Recent Advances in Intelligent Computational Systems 227-231.
  • Wang, J., Wu, X., Zhang, C. (2005). Support vector machines based on K-means clustering for real-time business intelligence systems. International Journal of Business Intelligence and Data Mining 1, 54-64.
  • Kayaalp, N., Arslan, G. (2014). Fuzzy Bayesian Classifier with Learned Mahalanobis Distance. International Journal of Intelligent Systems 29, 713-726.
  • Arslan, G., Karabulut, B., Ünver, H.M. (2020). On Using Structural Patterns in Data for Classification. Advances and Applications in Statistics 65, 33-56.
  • Karabulut, B., Arslan, G., Unver, H.M. (2021). Classification Based on Structural Information in Data. Arabian Journal for Science and Engineering 1-15.
  • Viana dos Santos Santana, Í.; C. M. da Silveira, A.; Sobrinho, A.; Chaves e Silva, L.; Dias da Silva, L.; Freire de Souza Santos, D.; Candeia, E.; Perkusich, A. (2021), “A Brazilian dataset of symptomatic patients for screening the risk of COVID-19”, Mendeley Data, V5, doi: 10.17632/b7zcgmmwx4.5
  • Guha, S., Rastogi, R., & Shim, K. (1998). CURE: an efficient clustering algorithm for large databases. In ACM Sigmod Record 27(2), 73-84, ACM.
  • Karypis, G., Han, E. H. S., & Kumar, V. (1999). Chameleon: Hierarchical clustering using dynamic modeling. Computer 8, 68-75.
  • Soofi, AA, Awan, A. (2017). Classification Techniques in Machine Learning: Applications and Issues. Journal of Basic and Applied Sciences 13, 459-465.
  • Aggarwal, CC. (2014). Instance-Based Learning: A Survey. Data Classification: Algorithms and Applications 157.
  • Angiulli, F, Narvaez, E. (2018). Pruning strategies for nearest neighbors competence preservation learners. Neurocomputing 308, 8-20.
  • Chicco, D., & Jurman, G. (2023). The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Mining, 16(1), 4.
  • Parvandeh, S., Yeh, H. W., Paulus, M. P., & McKinney, B. A. (2020). Consensus features nested cross-validation. Bioinformatics 36(10), 3093-3098.
  • Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC bioinformatics 7(1), 1-8.
  • Wang, J., Wu, X., Zhang, C. (2005). Support vector machines based on K-means clustering for real-time business intelligence systems. International Journal of Business Intelligence and Data Mining 1, 54–64.
  • Lee, S.J., Park, C., Jhun, M., Koo, J.Y. (2007). Support vector machine using K-means clustering. Journal of the Korean Statistical Society 36, 175–182.
  • Chen, J., Pan, F. (2010). Clustering-based geometric support vector machines. In Proceedings of the Life System Modeling and Intelligent Computing, Springer, Berlin, Heidelberg, 207–217.
  • Yao, Y., Liu, Y., Yu, Y., et al. (2013). K-SVM: An Effective SVM Algorithm Based on K-means Clustering. Journal of Computers 8, 2632–2639.
  • Bang, S., Jhun, M. (2014). Weighted support vector machine using k-means clustering. Communications in Statistics-Simulation and Computation 43, 2307–2324.
  • Yu, H., Yang, J., Han, J. (2003). Classifying large datasets using SVMs with hierarchical clusters. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 306–315.
  • Horng, S.J., Su, M.Y., Chen, Y.H., et al. (2011). A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert systems with Applications 38, 306–313.
  • Bang, S., Koo, J.Y., Jhun, M. (2010). Support vector machine using k-spatial medians clustering and recovery process. Communications in Statistics-Simulation and Computation 39, 1422–1434.
There are 47 citations in total.

Details

Primary Language English
Subjects Knowledge Representation and Reasoning, Artificial Intelligence (Other)
Journal Section Original Research Articles
Authors

Bergen Karabulut 0000-0003-0755-1289

Güvenç Arslan 0000-0002-4770-2689

Halil Murat Ünver 0000-0001-9959-8425

Publication Date June 30, 2024
Submission Date March 29, 2024
Acceptance Date June 26, 2024
Published in Issue Year 2024, Volume 7, Issue 1

Cite

APA Karabulut, B., Arslan, G., & Ünver, H. M. (2024). A NOVEL COVID-19 CLASSIFICATION METHOD BASED ON CURE CLUSTERING. Scientific Journal of Mehmet Akif Ersoy University, 7(1), 25-35. https://doi.org/10.70030/sjmakeu.1460760