Research Article

Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods

Year 2021, Volume: 16 Issue: 2, 251 - 260, 15.09.2021

Abstract

According to data for 2020, the three most common cancers in women are breast, lung, and colorectal cancer. Together, these account for 50% of all cancers seen in women, and breast cancer alone accounts for 30%. Early diagnosis and treatment are important for breast cancer patients, and carrying out this process correctly increases patient survival rates. Artificial intelligence can support the observational performance of radiologists in breast cancer screening, and artificial intelligence-based approaches can also be used to increase the accuracy of digital mammography. The dataset used in this study consists of mutated RNA-type breast cancer data and includes the clinical and genetic characteristics of the patients. The study proposes using several machine learning methods together. In the analyses performed, the Support Vector Machines method gave the best performance, at 97.55%. The proposed approach was observed to give successful results in the diagnosis of breast cancer.
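The abstract reports a single performance figure (97.55%) without naming the metric. As a minimal sketch, assuming the figure is classification accuracy on a binary task, the following shows how accuracy is computed from predictions, alongside the Matthews correlation coefficient (MCC) as a complementary measure. The labels, predictions, and function names here are hypothetical illustrations and are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", round(mcc(*counts), 4))
```

Reporting MCC next to accuracy is a common choice when classes are imbalanced, because accuracy alone can look high even if the minority class is often misclassified.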

References

  • [1] R.L. Siegel, K.D. Miller, H.E. Fuchs, A. Jemal, Cancer Statistics, 2021, CA: A Cancer Journal for Clinicians. 71 (2021) 7–33. https://doi.org/10.3322/caac.21654.
  • [2] A. Aloraini, Different Machine Learning Algorithms for Breast Cancer Diagnosis, International Journal of Artificial Intelligence & Applications. 3 (2012) 21–30. https://doi.org/10.5121/ijaia.2012.3603.
  • [3] S. Pacilè, J. Lopez, P. Chone, T. Bertinotti, J.M. Grouin, P. Fillard, Improving Breast Cancer Detection Accuracy of Mammography with the Concurrent Use of an Artificial Intelligence Tool, Radiology: Artificial Intelligence. 2 (2020) e190208. https://doi.org/10.1148/ryai.2020190208.
  • [4] S.M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, et al., International evaluation of an AI system for breast cancer screening, Nature. 577 (2020) 89–94. https://doi.org/10.1038/s41586-019-1799-6.
  • [5] I. Sechopoulos, J. Teuwen, R. Mann, Artificial intelligence for breast cancer detection in mammography and digital breast tomosynthesis: State of the art, Seminars in Cancer Biology. (2020). https://doi.org/10.1016/j.semcancer.2020.06.002.
  • [6] K. Dembrower, E. Wåhlin, Y. Liu, M. Salim, K. Smith, P. Lindholm, M. Eklund, F. Strand, Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study, The Lancet Digital Health. 2 (2020) e468–e474. https://doi.org/10.1016/S2589-7500(20)30185-0.
  • [7] B. Pereira, S.F. Chin, O.M. Rueda, H.K.M. Vollan, et al., The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes, Nature Communications. 7 (2016) 1–16. https://doi.org/10.1038/ncomms11479.
  • [8] M. Marjanović, M. Kovačević, B. Bajat, V. Voženílek, Landslide susceptibility assessment using SVM machine learning algorithm, Engineering Geology. 123 (2011) 225–234. https://doi.org/10.1016/j.enggeo.2011.09.006.
  • [9] T. Shon, Y. Kim, C. Lee, J. Moon, A machine learning framework for network anomaly detection using SVM and GA, in: Proceedings from the 6th Annual IEEE System, Man and Cybernetics Information Assurance Workshop, SMC 2005, 2005: pp. 176–183. https://doi.org/10.1109/IAW.2005.1495950.
  • [10] Support Vector Machine Machine learning algorithm with example and code - Codershood, (n.d.). https://www.codershood.info/2019/01/10/support-vector-machine-machine-learning-algorithm-with-example-and-code/ (accessed February 17, 2021).
  • [11] A. Ibrahem Ahmed Osman, A. Najah Ahmed, M.F. Chow, Y. Feng Huang, A. El-Shafie, Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia, Ain Shams Engineering Journal. (2021). https://doi.org/10.1016/j.asej.2020.11.011.
  • [12] A. Natekin, A. Knoll, Gradient boosting machines, a tutorial, Frontiers in Neurorobotics. 7 (2013). https://doi.org/10.3389/fnbot.2013.00021.
  • [13] Y.Y. Song, Y. Lu, Decision tree methods: applications for classification and prediction, Shanghai Archives of Psychiatry. 27 (2015) 130–135. https://doi.org/10.11919/j.issn.1002-0829.215044.
  • [14] An Introduction to Machine Learning, (n.d.). https://bioinformatics-training.github.io/intro-machine-learning-2017/decision-trees.html (accessed February 18, 2021).
  • [15] K. Fawagreh, M.M. Gaber, E. Elyan, Random forests: from early developments to recent advancements, Systems Science & Control Engineering. 2 (2014) 602–609. https://doi.org/10.1080/21642583.2014.956265.
  • [16] Random Forest Regression. Random Forest Regression is a… | by Chaya Bakshi | Level Up Coding, (n.d.). https://levelup.gitconnected.com/random-forest-regression-209c0f354c84 (accessed February 18, 2021).
  • [17] T. Carneiro, R.V.M. da Nobrega, T. Nepomuceno, G. bin Bian, V.H.C. de Albuquerque, P.P.R. Filho, Performance Analysis of Google Colaboratory as a Tool for Accelerating Deep Learning Applications, IEEE Access. 6 (2018) 61677–61685. https://doi.org/10.1109/ACCESS.2018.2874767.
  • [18] S. Walker, W. Khan, K. Katic, W. Maassen, W. Zeiler, Accuracy of different machine learning algorithms and added-value of predicting aggregated-level energy performance of commercial buildings, Energy and Buildings. 209 (2020) 109705. https://doi.org/10.1016/j.enbuild.2019.109705.
  • [19] GitHub - scikit-learn-contrib/sklearn-pandas: Pandas integration with sklearn, (n.d.). https://github.com/scikit-learn-contrib/sklearn-pandas (accessed February 21, 2021).
  • [20] D. Chicco, G. Jurman, The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation, BMC Genomics. 21 (2020) 1–13. https://doi.org/10.1186/s12864-019-6413-7.
  • [21] G. Dubourg-Felonneau, T. Cannings, F. Cotter, H. Thompson, N. Patel, J.W. Cassidy, H.W. Clifford, A Framework for Implementing Machine Learning on Omics Data, ArXiv. (2018) 1–5.

Turkish title: Mutasyona Uğramış RNA tipli Göğüs Kanseri Verilerinin Makine Öğrenme Yöntemleri ile Analizi


There are 21 citations in total.

Details

Primary Language: English
Subjects: Engineering
Journal Section: TJST
Authors: Rumeysa Hanife Kars (ORCID: 0000-0002-2865-0414)

Publication Date: September 15, 2021
Submission Date: July 13, 2021
Published in Issue: Year 2021, Volume: 16, Issue: 2

Cite

APA Kars, R. H. (2021). Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods. Turkish Journal of Science and Technology, 16(2), 251-260.
AMA Kars RH. Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods. TJST. September 2021;16(2):251-260.
Chicago Kars, Rumeysa Hanife. “Analysis of Mutated RNA-Type Breast Cancer Data With Machine Learning Methods”. Turkish Journal of Science and Technology 16, no. 2 (September 2021): 251-60.
EndNote Kars RH (September 1, 2021) Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods. Turkish Journal of Science and Technology 16 2 251–260.
IEEE R. H. Kars, “Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods”, TJST, vol. 16, no. 2, pp. 251–260, 2021.
ISNAD Kars, Rumeysa Hanife. “Analysis of Mutated RNA-Type Breast Cancer Data With Machine Learning Methods”. Turkish Journal of Science and Technology 16/2 (September 2021), 251-260.
JAMA Kars RH. Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods. TJST. 2021;16:251–260.
MLA Kars, Rumeysa Hanife. “Analysis of Mutated RNA-Type Breast Cancer Data With Machine Learning Methods”. Turkish Journal of Science and Technology, vol. 16, no. 2, 2021, pp. 251-60.
Vancouver Kars RH. Analysis of Mutated RNA-Type Breast Cancer Data with Machine Learning Methods. TJST. 2021;16(2):251-60.