Year 2024, 771 - 778, 30.09.2024
Hüseyin Koray Mısırlıoğlu, Asım Leblebici, Gizem Çalıbaşı Koçal, Hülya Ellidokuz, Yasemin Başbınar
AI-Assisted Survival Prediction in Colorectal Cancer: A Clinical Decision Support Tool
Abstract
Purpose: Colorectal cancer (CRC) is a leading cause of cancer-related mortality worldwide. Accurate survival prediction is crucial for advanced-stage patients to optimize treatment strategies and improve clinical outcomes. This study aimed to develop an artificial intelligence-assisted clinical decision support system (CDSS) for survival prediction in CRC patients using clinical and genomic data from The Cancer Genome Atlas Colon Adenocarcinoma Collection (TCGA-COAD) dataset.
Methods: Machine learning algorithms, including C4.5 Decision Tree, Support Vector Machines (SVM), Random Forest, and Naive Bayes, were employed to create survival prediction models. Clinical parameters and genomic data from key pathways, such as glycolysis/gluconeogenesis and mTORC1, were integrated into the models. The models were evaluated based on accuracy and performance.
Results: The Random Forest algorithm achieved the highest accuracy (82.3%) when only clinical parameters were used. When clinical data were combined with gene expression data, the model’s accuracy increased further. The resulting models were incorporated into a user-friendly web interface, SurvCOCA, for clinical use.
Conclusions: This study demonstrates the potential of AI-based tools to improve prognosis predictions in CRC patients. Further research is needed, with larger datasets and additional machine learning algorithms, to enhance clinical decision-making and optimize treatment strategies.
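As an illustration of the kind of workflow the Methods summary describes (fit a classifier, then score it by accuracy on held-out cases), the following is a minimal sketch only. It is not the authors' pipeline: it uses plain Python rather than their tooling, covers only one of the four algorithm families named above (a Gaussian Naive Bayes), and runs on synthetic two-feature data instead of TCGA-COAD. All function names (`fit_gaussian_nb`, `predict`, `accuracy`, `make_synthetic`) and the 70/30 split are assumptions made for this example.

```python
import math
import random


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature (mean, variance) for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature row."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def make_synthetic(n_per_class, seed=0):
    """Hypothetical stand-in data: two well-separated classes, two features."""
    rng = random.Random(seed)
    rows, labels = [], []
    for cls, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(cls)
    order = list(range(len(rows)))
    rng.shuffle(order)
    return [rows[i] for i in order], [labels[i] for i in order]


rows, labels = make_synthetic(100)
split = int(0.7 * len(rows))  # assumed 70/30 train/test split
model = fit_gaussian_nb(rows[:split], labels[:split])
test_accuracy = accuracy(model, rows[split:], labels[split:])
print(f"hold-out accuracy: {test_accuracy:.3f}")
```

Comparing several algorithms, as the study does, would amount to repeating the fit-and-score step with each model on the same split and reporting the accuracies side by side. The synthetic classes here are deliberately easy to separate, so the printed accuracy says nothing about performance on real clinical or gene expression data.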
Ethical Statement
This study was approved by the Non-Interventional Research Ethics Committee of Dokuz Eylul University (Date: 23.11.2020, Number: 2020/28-29).
Supporting Institution
No funding was used for this study.
Acknowledgements
The results shown here are in whole or part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.