Research Article

Classification of colorectal cancer based on gene sequencing data with XGBoost model: An application of public health informatics

Volume: 47, Issue: 3, 30 September 2022

Abstract

Purpose: This study aims to classify open-access colorectal cancer gene data and to identify important genes using XGBoost, a machine learning method.

Materials and Methods: An open-access colorectal cancer gene dataset was used. The dataset comprised gene sequencing results from the mucosae of 10 healthy controls and the colonic mucosae of 12 patients with colorectal cancer. XGBoost was used to classify the disease. Model performance was evaluated with accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score.

Results: The variable selection method selected 17 genes, and the model was built with these genes as input variables. The accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score of the model were 95.5%, 95.8%, 91.7%, 100%, 100%, 90.9%, and 95.7%, respectively. According to the variable importance obtained from the XGBoost results, the CYR61, NR4A, FOSB, and NR4A2 genes can be employed as biomarkers for colorectal cancer.

Conclusion: This study identified genes that may be linked to colorectal cancer and candidate genetic biomarkers for the disease. In future work, the reliability of the detected genes can be verified, therapeutic procedures can be developed based on these genes, and their usefulness in clinical practice can be documented.
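The performance metrics reported in the abstract can be cross-checked against one another. The sketch below is illustrative only: the confusion-matrix counts (11 true positives, 1 false negative, 10 true negatives, 0 false positives) are an assumption inferred from the reported percentages and the 12-patient, 10-control sample sizes; they are not stated in the abstract. The function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Assumed counts (inferred, not taken from the paper):
# 12 patients -> 11 detected, 1 missed; 10 controls -> all correctly classified.
metrics = classification_metrics(tp=11, fn=1, tn=10, fp=0)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed counts the seven values round to 95.5%, 95.8%, 91.7%, 100.0%, 100.0%, 90.9%, and 95.7%, which suggests that a single misclassified patient out of 22 samples would account for the reported figures.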

Keywords

Colorectal cancer, Genomics, Machine learning, XGBoost model

Cite

MLA
Akbulut, Sami, et al. “Classification of colorectal cancer based on gene sequencing data with XGBoost model: An application of public health informatics”. Cukurova Medical Journal, vol. 47, no. 3, Sept. 2022, pp. 1179-86, doi:10.17826/cumj.1128653.