Classification and Diagnostic Prediction of Breast Cancers via Different Classifiers
Abstract
Cancer is one of the leading causes of death worldwide and accounted for approximately 9.6 million deaths in 2018. Breast cancer is the leading cause of cancer death among women; however, it can be treated when diagnosed early. The aim of this study is to detect breast cancer at an early stage. To this end, machine learning methods were used to support early diagnosis. The samples in the Wisconsin Diagnostic Breast Cancer (WDBC) data set were classified with support vector machines (SVM), k-nearest neighbors (k-NN), Naive Bayes, the J48 decision tree and random forests. A preprocessing step was applied to the data set before classification, after which the five classifiers were evaluated using 10-fold cross-validation. Accuracy, sensitivity, specificity and confusion matrices were used to measure the performance of the methods. SVM with a linear kernel was the most successful method, with an accuracy of 98.24%. Despite its simplicity, k-NN was the second most successful method, with an accuracy of 97.72%. The feature-selection results indicate that feature selection and the other preprocessing steps improved the performance of the system. The results achieved compare well with those of previous studies.
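The evaluation measures named in the abstract (accuracy, sensitivity and specificity) are all derived from a binary confusion matrix. The following Python sketch is not the study's code; it assumes that malignant ("M") is the positive class, following the WDBC diagnosis labels ("M"/"B"), and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # true positive rate: TP / (TP + FN)
    specificity: float  # true negative rate: TN / (TN + FP)


def confusion_counts(y_true, y_pred, positive="M"):
    """Return (tp, fn, fp, tn), treating `positive` as the positive class.

    Assumption: labels follow the WDBC convention, "M" = malignant, "B" = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # malignant correctly flagged
        elif t == positive:
            fn += 1  # malignant missed
        elif p == positive:
            fp += 1  # benign wrongly flagged
        else:
            tn += 1  # benign correctly cleared
    return tp, fn, fp, tn


def compute_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return Metrics(
        accuracy=(tp + tn) / (tp + fn + fp + tn),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    y_true = ["M", "M", "B", "B", "B"]
    y_pred = ["M", "B", "B", "B", "M"]
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # (1, 1, 1, 2)
    print(compute_metrics(*counts))
```

In a cross-validated setting, the counts would typically be summed over all test folds before the three measures are computed.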
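The evaluation protocol described in the abstract (preprocessing followed by 10-fold cross-validation) can be illustrated with the simplest of the five classifiers, k-NN. This Python sketch is an assumption-laden illustration rather than the study's implementation: min-max scaling as the preprocessing step, k = 5 neighbors, Euclidean distance, stratified folds and all helper names are choices made here for illustration. The scaler is fitted on each training fold only, so no information from the test fold leaks into preprocessing.

```python
import math
import random
from collections import Counter


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)  # deal each class round-robin
    return folds


def minmax_fit(rows):
    """Per-feature minimum and maximum (assumed preprocessing step)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_apply(rows, lo, hi):
    """Scale features with the training-fold min/max; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate_knn(X, y, n_folds=10, k=5, seed=0):
    """Estimate k-NN accuracy with stratified n-fold cross-validation."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        # Fit the scaler on the training fold only.
        lo, hi = minmax_fit([X[i] for i in train_idx])
        train_X = minmax_apply([X[i] for i in train_idx], lo, hi)
        train_y = [y[i] for i in train_idx]
        test_X = minmax_apply([X[i] for i in test_idx], lo, hi)
        for x, i in zip(test_X, test_idx):
            if knn_predict(train_X, train_y, x, k) == y[i]:
                correct += 1
    return correct / len(X)
```

With WDBC, `X` would hold the 30 numeric features of each of the 569 samples and `y` the "M"/"B" diagnosis labels. Stratifying the folds keeps the malignant/benign ratio roughly constant across folds, which matters for a data set whose two classes are not balanced.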
Keywords
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Authors
Ahmet Saygılı
ORCID: 0000-0001-8625-4842
Türkiye
Publication Date
31 December 2018
Submission Date
25 October 2018
Acceptance Date
7 December 2018
Issue
Year 2018, Volume: 2, Issue: 2