Classification and Diagnostic Prediction of Breast Cancers via Different Classifiers

Ahmet Saygılı

Research Article

Year 2018, Volume: 2 Issue: 2, 48 - 56, 31.12.2018

Abstract

Cancer is one of the leading causes of death worldwide and caused approximately 9.6 million deaths in 2018. Breast cancer is the leading cause of cancer deaths in women, yet it can be treated when diagnosed early. The aim of this study is to detect breast cancer at an early stage. To support early diagnosis, machine learning methods were applied to the Wisconsin Diagnostic Breast Cancer (WDBC) data set: the features of the samples in the data set were classified with support vector machines (SVM), k-nearest neighbors (k-NN), Naive Bayes, J48, and random forests. A preprocessing step was applied to the data set prior to classification, after which the five classifiers were evaluated using 10-fold cross-validation. Accuracy, sensitivity, specificity, and confusion matrices were used to measure the performance of the methods. SVM with a linear kernel was the most successful method, with a success rate of 98.24%. Despite its simplicity, the k-nearest neighbors method was the second most successful, with a success rate of 97.72%. The feature selection results show that feature selection and the other preprocessing methods improve the performance of the system. Compared with previous studies, the results obtained are at a good level.
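The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy, sensitivity, and specificity from a confusion matrix) can be illustrated with a minimal sketch. This is not the author's implementation; the function names, the `"M"`/`"B"` class labels, and the unshuffled contiguous folds are assumptions made for illustration only.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "M") -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the malignant class.

    The "M"/"B" labels are an assumed encoding, used here for illustration.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all samples that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def k_fold_indices(n_samples: int, k: int = 10
                   ) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one. No shuffling or stratification is
    done here; a real experiment would normally shuffle the data first.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # Toy labels, not taken from the WDBC data set.
    truth = ["M", "M", "B", "B", "M", "B"]
    preds = ["M", "B", "B", "B", "M", "M"]
    print(evaluate(*confusion_counts(truth, preds)))
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), len(test_idx))
```

In a full experiment, each classifier would be trained on `train_idx` and scored on `test_idx` in every fold, and the confusion counts would be summed over the ten folds before the three measures are computed.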

Keywords

Classification, Breast Cancer, WDBC, Support Vector Machines, Gain Ratio, k-NN, Random Forest

References

[1] World Health Organization (WHO). (2018, 10.01.2018). Cancer. Available: http://www.who.int/en/news-room/fact-sheets/detail/cancer
[2] C. Fitzmaurice, C. Allen, and R. Barber, "A systematic analysis for the Global Burden of Disease Study," JAMA Oncol, vol. 3, pp. 524-548, 2017.
[3] A. Asuncion and D. Newman, "UCI machine learning repository," ed, 2007.
[4] UCI Machine Learning Repository. (10.01.2018). Breast Cancer Wisconsin (Diagnostic) Data Set. Available: http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
[5] J. R. Quinlan, "Improved use of continuous attributes in C4.5," Journal of artificial intelligence research, vol. 4, pp. 77-90, 1996.
[6] C. A. Pena-Reyes and M. Sipper, "A fuzzy-genetic approach to breast cancer diagnosis," Artificial intelligence in medicine, vol. 17, pp. 131-155, 1999.
[7] D. Nauck and R. Kruse, "Obtaining interpretable fuzzy classification rules from medical data," Artificial intelligence in medicine, vol. 16, pp. 149-169, 1999.
[8] R. Setiono, "Generating concise and accurate classification rules for breast cancer diagnosis," Artificial Intelligence in medicine, vol. 18, pp. 205-219, 2000.
[9] A. A. Albrecht, G. Lappas, S. A. Vinterbo, C. Wong, and L. Ohno-Machado, "Two applications of the LSA machine," in Neural Information Processing, 2002. ICONIP'02. Proceedings of the 9th International Conference on, 2002, pp. 184-189.
[10] J. Abonyi and F. Szeifert, "Supervised fuzzy clustering for the identification of fuzzy classifiers," Pattern Recognition Letters, vol. 24, pp. 2195-2207, 2003.
[11] T. Kiyan and T. Yildirim, "Breast cancer diagnosis using statistical neural networks," IU-Journal of Electrical & Electronics Engineering, vol. 4, pp. 1149-1153, 2004.
[12] K. Polat and S. Güneş, "Breast cancer diagnosis using least square support vector machine," Digital signal processing, vol. 17, pp. 694-701, 2007.
[13] E. D. Übeyli, "Implementing automated diagnostic systems for breast cancer detection," Expert systems with Applications, vol. 33, pp. 1054-1062, 2007.
[14] M. F. Akay, "Support vector machines combined with feature selection for breast cancer diagnosis," Expert systems with applications, vol. 36, pp. 3240-3247, 2009.
[15] Y. Peng, Z. Wu, and J. Jiang, "A novel feature selection approach for biomedical data classification," Journal of Biomedical Informatics, vol. 43, pp. 15-23, 2010.
[16] G. I. Salama, M. Abdelhalim, and M. A.-e. Zeid, "Breast cancer diagnosis on three different datasets using multi-classifiers," Breast Cancer (WDBC), vol. 32, p. 2, 2012.
[17] W. H. Wolberg, W. N. Street, and O. L. Mangasarian, "Breast cancer Wisconsin (diagnostic) data set," UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/], 1992.
[18] W. Wolberg. (1993). Cancer Images. Available: ftp://ftp.cs.wisc.edu/math-prog/cpo-dataset/machine-learn/cancer/cancer_images/
[19] W. H. Wolberg, W. N. Street, and O. L. Mangasarian, "Breast cytology diagnosis via digital image analysis," Analytical and Quantitative Cytology and Histology, vol. 15, pp. 396-404, 1993.
[20] J. R. Quinlan, "Induction of decision trees," Machine learning, vol. 1, pp. 81-106, 1986.
[21] R. E. Blahut, Principles and practice of information theory: Addison-Wesley Longman Publishing Co., Inc., 1987.
[22] A. G. Karegowda, A. Manjunath, and M. Jayaram, "Comparative study of attribute selection using gain ratio and correlation based feature selection," International Journal of Information Technology and Knowledge Management, vol. 2, pp. 271-277, 2010.
[23] J. R. Quinlan, "Bagging, boosting, and C4.5," in AAAI/IAAI, Vol. 1, 1996, pp. 725-730.
[24] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," ACM SIGKDD explorations newsletter, vol. 11, pp. 10-18, 2009.
[25] J. Han and M. Kamber, Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann, 2001.

There are 25 citations in total.

Details

Primary Language	English
Subjects	Engineering
Journal Section	Articles
Authors	Ahmet Saygılı 0000-0001-8625-4842
Publication Date	December 31, 2018
Acceptance Date	December 7, 2018
Published in Issue	Year 2018 Volume: 2 Issue: 2

Cite

APA	Saygılı, A. (2018). Classification and Diagnostic Prediction of Breast Cancers via Different Classifiers. International Scientific and Vocational Studies Journal, 2(2), 48-56.

This work is licensed under a Creative Commons Attribution 4.0 International License.