An Empirical Comparison of Machine Learning Algorithms for Predicting Breast Cancer
Abstract
According to recent statistics, breast cancer is one of the most prevalent cancers among women worldwide. It accounts for a large share of new cancer cases and cancer-related deaths. Early diagnosis is very important, because the disease can become fatal unless it is detected and treated in its early stages. With recent advances in artificial intelligence and machine learning (ML), there is great potential to diagnose breast cancer using structured data. In this paper, we conduct an empirical comparison of 10 popular machine learning models for the prediction of breast cancer. We used the well-known Wisconsin Breast Cancer Dataset (WBCD) to train the models and employed advanced accuracy metrics for comparison. Experimental results show that all models achieved high accuracy, while Support Vector Machines (SVM) performed slightly better than the other methods. Logistic Regression, K-Nearest Neighbors, and Neural Networks also proved to be strong classifiers for predicting breast cancer.
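The abstract does not name the accuracy metrics used for the comparison. As a minimal sketch, assuming the usual confusion-matrix measures for a binary benign/malignant task (accuracy, precision, recall, specificity, and F1), they could be computed as follows. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical illustrations and are not taken from the paper or from the WBCD.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float       # also called sensitivity
    specificity: float
    f1: float


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for a binary classifier.

    `positive` is the label treated as the positive class
    (assumed here to stand for "malignant").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(y_true)),
        precision=precision,
        recall=recall,
        specificity=_ratio(tn, tn + fp),
        f1=_ratio(2 * precision * recall, precision + recall),
    )


# Hypothetical toy labels for illustration only (1 = malignant, 0 = benign).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Reporting recall and specificity alongside accuracy is a common choice in diagnostic settings, since a missed malignant case and a false alarm carry different costs; whether the paper weighs them this way is not stated in the abstract.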
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Publication Date
December 31, 2019
Submission Date
November 11, 2019
Acceptance Date
December 26, 2019
Published in Issue
Year 2019 Volume: 3 Issue: 0
Cited By
The Investigation of the Success of Different Machine Learning Methods in Breast Cancer Diagnosis
Konuralp Tıp Dergisi
https://doi.org/10.18521/ktd.912462