Research Article

Evaluation of Missing Data Imputation Methods and PCA Techniques for Machine Learning Models in Breast Cancer Diagnosis Using WBCD

Volume: 13 Number: 3 September 26, 2024

Abstract

Cancer is one of the leading causes of death, and breast cancer deaths are particularly common among women. Early diagnosis is considered a key means of reducing these deaths. Expert systems, artificial intelligence and machine learning techniques are applied in medicine to help physicians detect disease early; a central objective of these technologies is to diagnose life-threatening diseases such as breast cancer earlier and more accurately. In this study, analyses of the Wisconsin Breast Cancer Dataset (WBCD) evaluated how different missing data imputation methods and a PCA-based dimensionality reduction technique affect the performance of supervised machine learning models. In the first stage, missing values in the dataset were detected and handled, and imputing them with the median performed better than the other methods tested. The dataset was then reduced with PCA, and the performance of the algorithms was measured for different numbers of principal components. The results indicate that handling missing data effectively and applying PCA-based dimensionality reduction significantly improve model performance; the best performance was achieved by combining median imputation with PCA. The study emphasizes the importance of pairing machine learning approaches for breast cancer diagnosis with sound missing data management strategies, and it examines in detail how the choice of imputation method and the use of PCA affect model performance.
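The two preprocessing steps described above, filling missing values with a column statistic and projecting the data onto a chosen number of principal components, can be sketched in dependency-free Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names (`impute_columns`, `covariance`, `top_eigenpairs`, `pca_fit_transform`) are hypothetical, missing entries are assumed to be already parsed to `None` (the UCI copy of WBCD marks them with `?`), the eigenvectors are found by power iteration with deflation rather than the SVD most libraries use, no classifier is included, and the toy rows are not WBCD data.

```python
import math
import statistics
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def impute_columns(rows: List[List[Optional[float]]], strategy: str = "median") -> Matrix:
    """Replace every None with the median (or mean) of the observed values in its column."""
    agg = {"median": statistics.median, "mean": statistics.mean}[strategy]
    n_cols = len(rows[0])
    fill = [agg([r[j] for r in rows if r[j] is not None]) for j in range(n_cols)]
    return [[float(fill[j]) if r[j] is None else float(r[j]) for j in range(n_cols)] for r in rows]


def covariance(X: Matrix) -> Tuple[List[float], Matrix]:
    """Column means and sample covariance matrix (divisor n - 1) of X."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for r in X:
        c = [r[j] - means[j] for j in range(d)]  # centred row
        for a in range(d):
            for b in range(d):
                C[a][b] += c[a] * c[b]
    return means, [[v / (n - 1) for v in row] for row in C]


def top_eigenpairs(C: Matrix, k: int, iters: int = 500) -> Tuple[Matrix, List[float]]:
    """Leading k eigenvectors/eigenvalues of a symmetric matrix via power iteration + deflation."""
    d = len(C)
    A = [row[:] for row in C]  # work on a copy so C is left unchanged
    vecs, vals = [], []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # fixed, non-uniform start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is zero; no further direction to find
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue estimate
        lam = sum(v[a] * sum(A[a][b] * v[b] for b in range(d)) for a in range(d))
        vecs.append(v)
        vals.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return vecs, vals


def pca_fit_transform(X: Matrix, k: int) -> Tuple[Matrix, List[float]]:
    """Project centred X onto its top-k principal components; also return explained-variance ratios."""
    means, C = covariance(X)
    vecs, vals = top_eigenpairs(C, k)
    total_var = sum(C[i][i] for i in range(len(C)))  # trace of C = total variance
    scores = [[sum((r[j] - means[j]) * v[j] for j in range(len(r))) for v in vecs] for r in X]
    return scores, [lam / total_var for lam in vals]


if __name__ == "__main__":
    # Toy rows standing in for patient records; None marks a missing value.
    rows = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]
    X = impute_columns(rows, strategy="median")
    print(X)  # [[1.0, 2.0], [3.0, 4.0], [3.0, 4.0], [5.0, 6.0]]
    scores, ratio = pca_fit_transform(X, k=1)
    print(ratio)  # close to 1.0: the imputed toy points lie on a single line
```

Calling `pca_fit_transform` with different values of `k` mirrors the experiments with different numbers of components. In a real evaluation the fill values, column means and components should be computed on the training split only and then applied unchanged to the test split, so that no test information leaks into preprocessing; scikit-learn's `SimpleImputer(strategy="median")` followed by `PCA(n_components=k)` inside a `Pipeline` handles this when the pipeline is fitted on training data.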

Details

Primary Language: English
Subjects: Clinical Sciences (Other), Biomedical Sciences and Technology, Biomedical Engineering (Other)
Journal Section: Research Article
Publication Date: September 26, 2024
Submission Date: March 28, 2024
Acceptance Date: August 25, 2024
Published in Issue: Year 2024, Volume 13, Number 3

APA
Koca, Y. B., & Aktepe, E. (2024). Evaluation of Missing Data Imputation Methods and PCA Techniques for Machine Learning Models in Breast Cancer Diagnosis Using WBCD. Türk Doğa ve Fen Dergisi, 13(3), 109-116. https://doi.org/10.46810/tdfd.1460871
Chicago
Koca, Yavuz Bahadir, and Elif Aktepe. 2024. “Evaluation of Missing Data Imputation Methods and PCA Techniques for Machine Learning Models in Breast Cancer Diagnosis Using WBCD.” Türk Doğa ve Fen Dergisi 13 (3): 109-16. https://doi.org/10.46810/tdfd.1460871.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.