Cancer is one of the leading causes of death worldwide and caused
approximately 9.6 million deaths in 2018. Breast cancer is the leading cause
of cancer death among women. However, breast cancer can be treated when it is
diagnosed early. The aim of this study is to detect breast cancer at an early
stage. To that end, machine learning methods were applied to support early
diagnosis and, in turn, treatment. The records in the Wisconsin Diagnostic
Breast Cancer (WDBC) data set were classified with support vector machines
(SVM), k-nearest neighbors (k-NN), Naive Bayes, J48, and random forest
methods. A preprocessing step was applied to the data set prior to
classification. After preprocessing, the five classifiers were applied to the
data using 10-fold cross-validation. Accuracy, sensitivity, and specificity
values, together with confusion matrices, were used to measure the success of
the methods. SVM with a linear kernel was the most successful method, with a
success rate of 98.24%. Although it is a very simple method, k-NN was the
second most successful, with a success rate of 97.72%. The results indicate
that feature selection and the other preprocessing steps increase the success
of the system. Compared with previous studies, the success rates achieved are
at a good level.
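
The evaluation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: it uses the copy of the WDBC data bundled with scikit-learn, substitutes feature standardization for the unspecified preprocessing step, performs no feature selection, and uses scikit-learn's CART decision tree as a stand-in for J48 (Weka's C4.5 implementation). The value of k, the random seeds, and all other hyperparameters are assumed defaults, so the resulting figures should not be expected to match the reported ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# scikit-learn ships a copy of WDBC (569 samples, 30 features).
X, target = load_breast_cancer(return_X_y=True)
# In this copy 0 = malignant and 1 = benign; recode so malignant is the positive class.
y = (target == 0).astype(int)

# Assumed hyperparameters; the abstract only names the methods.
classifiers = {
    "SVM (linear kernel)": SVC(kernel="linear"),
    "k-NN (k=5, assumed)": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes (Gaussian)": GaussianNB(),
    "Decision tree (CART, stand-in for J48)": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 10-fold cross-validation; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)


def evaluate(model):
    """Return accuracy, sensitivity and specificity from pooled 10-fold predictions."""
    # Standardization is an assumed stand-in for the study's preprocessing step.
    # Fitting it inside the pipeline keeps each test fold out of the scaler's fit.
    pipeline = make_pipeline(StandardScaler(), model)
    y_pred = cross_val_predict(pipeline, X, y, cv=cv)
    tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # malignant cases correctly identified
        "specificity": tn / (tn + fp),  # benign cases correctly identified
    }


results = {name: evaluate(model) for name, model in classifiers.items()}

for name, metrics in results.items():
    print(
        f"{name}: accuracy={metrics['accuracy']:.4f}, "
        f"sensitivity={metrics['sensitivity']:.4f}, "
        f"specificity={metrics['specificity']:.4f}"
    )
```

Pooling the out-of-fold predictions yields one confusion matrix per classifier, from which all three measures follow directly. Averaging per-fold scores instead is an equally common choice and can give slightly different values.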
Primary Language | English |
---|---|
Subjects | Engineering |
Journal Section | Articles |
Authors | |
Publication Date | December 31, 2018 |
Acceptance Date | December 7, 2018 |
Published in Issue | Year 2018 Volume: 2 Issue: 2 |