Research Article

Artificial Intelligence-based Colon Cancer Prediction by Identifying Genomic Biomarkers

Year 2022, Volume 4, Issue 2, 196-202, 01.05.2022
https://doi.org/10.37990/medr.1077024

Abstract

Aim: Colon cancer is the third most common type of cancer worldwide. Because of its poor prognosis and unclear preoperative staging, genetic biomarkers have become increasingly important in the diagnosis and treatment of the disease. In this study, we aimed to identify candidate biomarker genes for colon cancer and to develop a model that predicts colon cancer from these genes.
Material and Methods: We used a dataset containing the expression levels of 2000 genes in 62 samples (22 healthy and 40 tumor tissues), generated by the Princeton University Gene Expression Project and shared in the figshare database. Data were summarized as mean ± standard deviation, and the independent samples t-test was used for statistical comparisons. To correct the class imbalance in the dataset, the SMOTE method was applied before feature selection. The LASSO feature selection method was then used to select the 13 genes most likely to be associated with colon cancer. Random Forest (RF), Decision Tree (DT), and Gaussian Naive Bayes models were built in the modeling phase.
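The Methods name SMOTE but do not report its settings. As a rough illustration of the idea, the sketch below implements the core SMOTE step in plain Python: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbours. Euclidean distance, the default k = 5, the function name `smote`, and the toy data are assumptions for illustration, not the authors' configuration. With the study's 22 healthy and 40 tumor samples, 40 - 22 = 18 synthetic healthy samples would equalize the classes.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours
    (Euclidean distance). Requires at least two minority samples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base`, excluding the sample itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy data: 4 "healthy" samples with 2 genes each (illustrative values only).
normal = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2], [1.2, 2.1]]
new_samples = smote(normal, n_new=3, k=2, seed=42)
print(len(new_samples))  # 3
```

Because every synthetic point is derived from existing samples, a full pipeline usually fits the oversampler on training folds only, so that evaluation metrics are computed on real samples.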
Results: All 13 genes selected by LASSO showed a statistically significant difference between normal and tumor samples. For the RF model, accuracy, specificity, F1-score, sensitivity, and the negative and positive predictive values were all equal to 1. RF achieved the highest performance of the three methods, outperforming DT and Gaussian Naive Bayes.
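The Results report six confusion-matrix metrics. For reference, a minimal sketch of how they are computed from binary predictions is shown below. Treating tumor as the positive class (label 1) is an assumption, since the abstract does not say which class was positive; the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary classifier.

    `positive` marks the class treated as positive (assumed here: tumor = 1).
    Ratios with a zero denominator are returned as NaN.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),      # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),      # true negative rate
        "ppv": ratio(tp, tp + fp),              # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),              # negative predictive value
        "f1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# A perfect classifier scores 1.0 on every metric.
print(binary_metrics([1, 1, 0, 0], [1, 1, 0, 0]))
# One missed tumor and one false alarm:
print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

A value of 1 for all six metrics, as reported for RF, means the model made no false positives and no false negatives on the evaluated samples.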
Conclusion: In this study, we identified genomic biomarkers of colon cancer and classified the disease with a high-performing model. Based on our results, the LASSO+RF approach can be recommended for modeling high-dimensional microarray data.
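LASSO is the step that reduced 2000 genes to 13. To illustrate how it performs selection, the sketch below fits a squared-error LASSO by cyclic coordinate descent in plain Python and keeps the features whose coefficients are not shrunk exactly to zero. Several choices here are assumptions rather than details from the abstract: the 0/1 class label is treated as a numeric response, the loss uses the common 1/(2n) scaling, features are only mean-centred (not standardized), and the penalty `lam` and the function names are hypothetical. In practice `lam` is usually tuned by cross-validation; the abstract does not report the value that yielded 13 genes.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features. Features and y are mean-centred
    internally, so no intercept is fitted; features are NOT rescaled.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals y - X b while b = 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature: coefficient stays 0
            # Correlation of feature j with the partial residual (b_j's own term added back)
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                b[j] = new_bj
    return b


def select_features(X, y, lam):
    """Indices of features that keep a non-zero LASSO coefficient."""
    coefs = lasso_coordinate_descent(X, y, lam)
    return [j for j, c in enumerate(coefs) if c != 0.0]


# Toy data (illustrative): feature 0 tracks the 0/1 label, feature 1 alternates.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(select_features(X, y, lam=0.1))  # [0]
print(select_features(X, y, lam=1.0))  # []
```

Raising `lam` drives more coefficients to exactly zero, which is what makes LASSO usable as a feature selector on data with far more genes than samples; the selected genes can then be passed to a classifier such as RF.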

References

  • 1. Globocan W. Estimated cancer incidence, mortality and prevalence worldwide in 2012. Int Agency Res Cancer. 2012.
  • 2. Labianca R, Beretta G, Gatta G, De Braud F, Wils J. Colon cancer. Critical reviews in oncology/hematology. 2004;51(2):145-70.
  • 3. Loboda A, Nebozhyn MV, Watters JW, Buser CA, Shaw PM, Huang PS, et al. EMT is the dominant program in human colon cancer. BMC medical genomics. 2011;4(1):1-10.
  • 4. Xu C, Meng LB, Duan YC, Cheng YJ, Zhang CM, Zhou X, et al. Screening and identification of biomarkers for systemic sclerosis via microarray technology. International Journal of Molecular Medicine. 2019;44(5):1753-70.
  • 5. Ahmad MA, Eckert C, Teredesai A. Interpretable machine learning in healthcare. Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics; 2018.
  • 6. Yağın FH, Yağın B, Arslan AK, Çolak C. Comparison of Performances of Associative Classification Methods for Cervical Cancer Prediction: Observational Study. Turkiye Klinikleri Journal of Biostatistics. 2021;13(3).
  • 7. Khaire UM, Dhanalakshmi R. High-dimensional microarray dataset classification using an improved adam optimizer (iAdam). Journal of Ambient Intelligence and Humanized Computing. 2020;11(11):5187-204.
  • 8. Hameed SS, Hassan R, Hassan WH, Muhammadsharif FF, Latiff LA. HDG-select: A novel GUI based application for gene selection and classification in high dimensional datasets. PloS one. 2021;16(1):e0246039.
  • 9. Mulla GA, Demir Y, Hassan M. Combination of PCA with SMOTE Oversampling for Classification of High-Dimensional Imbalanced Data. Bitlis Eren University Journal of Science. 10(3):858-69.
  • 10. Güçkıran K, Cantürk İ, Özyılmaz L. DNA microarray gene expression data classification using SVM, MLP, and RF with feature selection methods relief and LASSO. Journal of Suleyman Demirel University Institute of Science and Technology. 2019;23(1):126-32.
  • 11. Akyol K, Bayır Ş, Şen B. Importance of Attribute Selection for Parkinson Disease. Academic Platform Journal of Engineering and Science. 2020;8(1):175-80.
  • 12. Yılmaz R, Yağın FH. Early Detection of Coronary Heart Disease Based on Machine Learning Methods. Medical Records. 4(1):1-6.
  • 13. Doğan Ş, Türkoğlu İ. Hypothyroidi and Hyperthyroidi Detection from Thyroid Hormone Parameters by Using Decision Trees. Fırat University Journal of Oriental Studies. 2007;5(2):163-9.
  • 14. Pulat M, Kocakoç İD. Bibliometric Analysis of Published Theses in the Field of Machine Learning and Decision Trees in Turkey. Journal of Management and Economics. 2021;28(2):287-308.
  • 15. Kamel H, Abdulah D, Al-Tuwaijari JM. Cancer classification using gaussian naive bayes algorithm. 2019 International Engineering Conference (IEC); 2019: IEEE.
  • 16. Yan W, Bai Z, Wang J, Li X, Chi B, Chen X. ANP32A modulates cell growth by regulating p38 and Akt activity in colorectal cancer. Oncology Reports. 2017;38(3):1605-12.
  • 17. Velmurugan BK, Yeh K-T, Lee C-H, Lin S-H, Chin M-C, Chiang S-L, et al. Acidic leucine-rich nuclear phosphoprotein-32A (ANP32A) association with lymph node metastasis predicts poor survival in oral squamous cell carcinoma patients. Oncotarget. 2016;7(10):10879.
  • 18. Liu Q, Tan Y, Huang T, Ding G, Tu Z, Liu L, et al. TF-centered downstream gene set enrichment analysis: Inference of causal regulators by integrating TF-DNA interactions and protein post-translational modifications information. BMC bioinformatics. 2010;11(11):1-17.
  • 19. Mora JAM, Ordoñez FM, Bonilla DA. Improvement Of K-Means Clustering Algorithm Performance in Gene Expression Data Analysis Through Pre-Processing With Principal Component Analysis And Boosting. 2017.
  • 20. Arentz G, Chataway T, Price TJ, Izwan Z, Hardi G, Cummins AG, et al. Desmin expression in colorectal cancer stroma correlates with advanced stage disease and marks angiogenic microvessels. Clinical proteomics. 2011;8(1):1-13.
  • 21. Bhunia S, Barbhuiya MA, Gupta S, Shrivastava BR, Tiwari PK. Epigenetic downregulation of desmin in gall bladder cancer reveals its potential role in disease progression. The Indian journal of medical research. 2020;151(4):311.
  • 22. Chen H, Xu C, Jin Q, Liu Z. S100 protein family in human cancer. American journal of cancer research. 2014;4(2):89.
  • 23. Twal WO, Czirok A, Hegedus B, Knaak C, Chintalapudi MR, Okagawa H, et al. Fibulin-1 suppression of fibronectin-regulated cell adhesion and motility. Journal of cell science. 2001;114(24):4587-98.
  • 24. Xu Z, Chen H, Liu D, Huo J. Fibulin-1 is downregulated through promoter hypermethylation in colorectal cancer: a CONSORT study. Medicine. 2015;94(13).
  • 25. Tong X, Mirzoeva S, Veliceasa D, Bridgeman BB, Fitchev P, Cornwell ML, et al. Chemopreventive apigenin controls UVB-induced cutaneous proliferation and angiogenesis through HuR and thrombospondin-1. Oncotarget. 2014;5(22):11413.
  • 26. Ono C, Sato M, Taka H, Asano S-i, Matsuura Y, Bando H. Tightly regulated expression of Autographa californica multicapsid nucleopolyhedrovirus immediate early genes emerges from their interactions and possible collective behaviors. Plos one. 2015;10(3):e0119580.
  • 27. Strassburg CP, Kasai Y, Seng BA, Miniou P, Zaloudik J, Herlyn D, et al. Baculovirus recombinant expressing a secreted form of a transmembrane carcinoma-associated antigen. Cancer Research. 1992;52(4):815-21.
  • 28. Loging WT, Reisman D. Elevated expression of ribosomal protein genes L37, RPP-1, and S2 in the presence of mutant p53. Cancer Epidemiology and Prevention Biomarkers. 1999;8(11):1011-6.
  • 29. Golob-Schwarzl N, Schweiger C, Koller C, Krassnig S, Gogg-Kamerer M, Gantenbein N, et al. Separation of low and high grade colon and rectum carcinoma by eukaryotic translation initiation factors 1, 5 and 6. Oncotarget. 2017;8(60):101224.
  • 30. Oliveira P, Sanges R, Huntsman D, Stupka E, Oliveira C. Characterization of the intronic portion of cadherin superfamily members, common cancer orchestrators. European journal of human genetics. 2012;20(8):878-83.
  • 31. Van Marck V, Stove C, Jacobs K, Van den Eynden G, Bracke M. P‐cadherin in adhesion and invasion: Opposite roles in colon and bladder carcinoma. International journal of cancer. 2011;128(5):1031-44.
  • 32. Takahashi K, Sasano H, Fukushima K, Hirasawa G, Miura H, Sasaki I, et al. 11 beta-hydroxysteroid dehydrogenase type II in human colon: a new marker of fetal development and differentiation in neoplasms. Anticancer research. 1998;18(5A):3381-8.
  • 33. Baba Y, Nosho K, Shima K, Meyerhardt J, Chan A, Engelman J, et al. Prognostic significance of AMP-activated protein kinase expression and modifying effect of MAPK3/1 in colorectal cancer. British journal of cancer. 2010;103(7):1025-33.
  • 34. Esteve-Puig R, Canals F, Colome N, Merlino G, Recio JÁ. Uncoupling of the LKB1-AMPKα energy sensor pathway by growth factors and oncogenic BRAFV600E. PloS one. 2009;4(3):e4771.
  • 35. Zheng B, Jeong JH, Asara JM, Yuan Y-Y, Granter SR, Chin L, et al. Oncogenic B-RAF negatively regulates the tumor suppressor LKB1 to promote melanoma cell proliferation. Molecular cell. 2009;33(2):237-47.
  • 36. Kim M-J, Park I-J, Yun H, Kang I, Choe W, Kim S-S, et al. AMP-activated protein kinase antagonizes pro-apoptotic extracellular signal-regulated kinase activation by inducing dual-specificity protein phosphatases in response to glucose deprivation in HCT116 carcinoma. Journal of Biological Chemistry. 2010;285(19):14617-27.
  • 37. Arowolo MO, Isiaka RM, Abdulsalam SO, Saheed Y, Gbolagade KA. A comparative analysis of feature extraction methods for classifying colon cancer microarray data. EAI endorsed transactions on scalable information systems. 2017;4(14).
  • 38. Al-Rajab M, Lu J, Xu Q. Examining applying high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. Computer methods and programs in biomedicine. 2017;146:11-24.

There are 38 citations in total.

Details

Primary Language: English
Subjects: Health Care Administration
Journal Section: Original Articles
Authors:

Nur Paksoy (ORCID: 0000-0002-2518-8148)

Fatma Hilal Yağın (ORCID: 0000-0002-9848-7958)

Publication Date: May 1, 2022
Acceptance Date: March 28, 2022
Published in Issue: Year 2022

Cite

AMA: Paksoy N, Yağın FH. Artificial Intelligence-based Colon Cancer Prediction by Identifying Genomic Biomarkers. Med Records. May 2022;4(2):196-202. doi:10.37990/medr.1077024

 Chief Editors

Assoc. Prof. Zülal Öner
Address: İzmir Bakırçay University, Department of Anatomy, İzmir, Turkey

Assoc. Prof. Deniz Şenol
Address: Düzce University, Department of Anatomy, Düzce, Turkey

Editors
Assoc. Prof. Serkan Öner
İzmir Bakırçay University, Department of Radiology, İzmir, Türkiye

E-mail: medrecsjournal@gmail.com

Publisher:
Medical Records Association (Tıbbi Kayıtlar Derneği)
Address: Orhangazi Neighborhood, 440th Street,
Green Life Complex, Block B, Floor 3, No. 69
Düzce, Türkiye
Web: www.tibbikayitlar.org.tr

Publication Support: 

Effect Publishing & Agency
Phone: + 90 (540) 035 44 35
E-mail: info@effectpublishing.com
Address: Akdeniz Neighborhood, Şehit Fethi Bey Street,
No: 66/B, Ground floor, 35210 Konak/İzmir, Türkiye
web: www.effectpublishing.com