Classification and Diagnostic Prediction of Breast Cancers via Different Classifiers
Abstract
Cancer is one of the leading causes of death worldwide and was responsible for approximately 9.6 million deaths in 2018. Breast cancer is the leading cause of cancer deaths in women, yet it can be treated when diagnosed early. The aim of this study is to support the early diagnosis of breast cancer using machine learning methods. The features of the cases in the Wisconsin Diagnostic Breast Cancer (WDBC) data set were classified with support vector machines (SVM), k-nearest neighbors, Naive Bayes, J48, and random forest methods. A preprocessing step was applied to the data set prior to classification, after which the five classifiers were evaluated using 10-fold cross-validation. Accuracy, sensitivity, and specificity values and confusion matrices were used to measure the success of the methods. SVM with a linear kernel was the most successful method, with a success rate of 98.24%. Despite being a very simple method, k-nearest neighbors was the second most successful, with a success rate of 97.72%. The results also show that feature selection and the other preprocessing methods increase the success of the system. The success achieved compares favorably with that reported in previous studies.
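The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the WDBC class labels "M" (malignant) and "B" (benign) and treats malignant as the positive class. The function names and the ten toy predictions are hypothetical and chosen only for illustration; they are not the study's data or results.

```python
def confusion_counts(y_true, y_pred, positive="M"):
    """Return (tp, fn, fp, tn), treating `positive` as the malignant class."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant case correctly flagged
            else:
                fn += 1  # malignant case missed
        else:
            if pred == positive:
                fp += 1  # benign case wrongly flagged
            else:
                tn += 1  # benign case correctly cleared
    return tp, fn, fp, tn


def summarize(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, and specificity from the four counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),    # share of malignant cases detected
        "specificity": tn / (tn + fp),    # share of benign cases cleared
    }


# Toy labels for illustration only (not taken from the WDBC data set).
y_true = ["M", "M", "M", "M", "B", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "M", "B", "B", "B", "B", "B", "B", "M"]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(*counts)
print(counts)
print(metrics)
```

Under 10-fold cross-validation, one common choice is to sum the four counts over the ten test folds before computing the three measures. Whether the study pooled counts or averaged per-fold values is not stated in the abstract.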
Details
Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Authors: Ahmet Saygılı * (ORCID: 0000-0001-8625-4842), Türkiye
Publication Date: December 31, 2018
Submission Date: October 25, 2018
Acceptance Date: December 7, 2018
Published in Issue: Year 2018 Volume: 2 Number: 2