Year 2019, Volume 3, Issue 0, Pages 9-20, 2019-12-31

An Empirical Comparison of Machine Learning Algorithms for Predicting Breast Cancer

Fatih BASCİFTCİ, Hamit Taner ÜNAL


According to recent statistics, breast cancer is one of the most prevalent cancers among women worldwide, accounting for a large share of new cancer cases and cancer-related deaths. Early diagnosis is critical, because the disease can become fatal unless it is detected and treated in its early stages. Recent advances in artificial intelligence and machine learning (ML) offer great potential for diagnosing breast cancer from structured data. In this paper, we conduct an empirical comparison of 10 popular machine learning models for the prediction of breast cancer. We used the well-known Wisconsin Breast Cancer Dataset (WBCD) to train the models and employed advanced accuracy metrics for comparison. Experimental results show that all models achieved high accuracy, with Support Vector Machines (SVM) performing slightly better than the other methods. Logistic Regression, K-Nearest Neighbors, and Neural Networks also proved to be strong classifiers for predicting breast cancer.
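The abstract does not name the metrics used for the comparison. As a minimal sketch, assuming the common confusion-matrix metrics (accuracy, precision, recall, F1), the following shows how two classifiers could be compared on the same labels. The labels, the model names `model_a` and `model_b`, and the helper `binary_metrics` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for one classifier's predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical labels (1 = malignant, 0 = benign); not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],  # misses one malignant case
    "model_b": [1, 1, 1, 1, 0, 0, 1, 0],  # raises one false alarm
}

results = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
for name, m in results.items():
    print(f"{name}: acc={m.accuracy:.3f} prec={m.precision:.3f} "
          f"rec={m.recall:.3f} f1={m.f1:.3f}")
```

In this toy example both models reach the same accuracy, yet they differ in precision and recall, which is one reason a comparison of diagnostic classifiers typically reports more than accuracy alone.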

Breast cancer, artificial intelligence, machine learning, medical decision support systems
Primary Language en
Subjects Engineering
Journal Section Research Articles
Authors

Author: Fatih BASCİFTCİ (Primary Author)
Institution: FACULTY OF TECHNOLOGY
Country: Turkey


Author: Hamit Taner ÜNAL

Dates

Publication Date: December 31, 2019

APA: Basciftci, F., & Ünal, H. (2019). An Empirical Comparison of Machine Learning Algorithms for Predicting Breast Cancer. Bilge International Journal of Science and Technology Research, ICONST 2019, 9-20. DOI: 10.30516/bilgesci.645067