A New Classification Method Based on Machine Learning Techniques for Cancer Diagnosis
(Kanser Teşhisi için Makine Öğrenmesi Tekniklerine Dayalı Yeni Bir Sınıflandırma Metodu)

Can Eyüpoğlu; Erdem Yavuz

Research Article

Year 2020, Volume: 7 Issue: 2, 1106 - 1123, 30.12.2020

https://doi.org/10.35193/bseufbd.742456

Cited By: 1

Abstract

Cancer is one of the major causes of human death, and breast cancer is the leading cause of cancer death among women. Early diagnosis is the way to reduce these deaths: among cancer types, breast cancer in particular is one where early diagnosis can greatly reduce the risk of death. One of the main objectives of using expert systems, artificial intelligence and machine learning techniques in medicine is to assist doctors in the early diagnosis of diseases. In this study, a new cancer diagnosis method based on Principal Component Analysis (PCA) and a Feed Forward Neural Network (FFNN) is proposed. The performance of the proposed method is tested on the Breast Cancer Coimbra Dataset (BCCD) using the classification accuracy, precision, recall and F-measure metrics. In addition, the proposed method is compared with conventional machine learning techniques and with studies in the literature. Experimental results show that the proposed method is effective and can be utilized by doctors for early diagnosis.
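The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard binary-classification definitions based on true/false positives and negatives; the abstract does not state how the authors computed or averaged them, so this is an assumption. The function name `binary_metrics`, the `BinaryMetrics` container, the label encoding and the sample labels are all hypothetical and are not taken from the paper or from BCCD.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Convention chosen for this sketch: a ratio with a zero
        # denominator is reported as 0.0 instead of raising an error.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, len(pairs)),
        precision=precision,
        recall=recall,
        # F-measure here is the harmonic mean of precision and recall (F1).
        f_measure=ratio(2 * precision * recall, precision + recall),
    )


# Made-up labels for illustration only (hypothetical encoding: 1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

With these made-up labels there are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 0.8 and precision, recall and F-measure are each 0.75. For a diagnosis task, recall is usually the metric to watch most closely, since a false negative corresponds to a missed case.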

Keywords

Cancer Diagnosis, Breast Cancer, Machine Learning, Principal Component Analysis, Feed Forward Neural Network

References

International Agency for Research on Cancer. (2020). https://www.iarc.fr/, (25.05.2020).
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., & Jemal, A. (2018). Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68(6), 394-424.
World Health Organization. (2020). https://www.who.int/, (25.05.2020).
New Global Cancer Data: GLOBOCAN 2018. (2020). https://www.uicc.org/new-global-cancer-data-globocan-2018, (25.05.2020).
Eyupoglu, C. (2018). Breast cancer classification using k-nearest neighbors algorithm. The Online Journal of Science and Technology, 8(3), 29-34.
Jeleń, Ł., Krzyżak, A., Fevens, T., & Jeleń, M. (2016). Influence of feature set reduction on breast cancer malignancy classification of fine needle aspiration biopsies. Computers in Biology and Medicine, 79, 80-91.
Patrício, M., Pereira, J., Crisóstomo, J., Matafome, P., Gomes, M., Seiça, R., & Caramelo, F. (2018). Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer, 18(1), 29.
Li, Y., & Chen, Z. (2018). Performance evaluation of machine learning methods for breast cancer prediction. Applied and Computational Mathematics, 7(4), 212-216.
Livieris, I., Pintelas, E., Kanavos, A., & Pintelas, P. (2018). An improved self-labeled algorithm for cancer prediction. Advances in Experimental Medicine and Biology.
Aslan, M. F., Celik, Y., Sabanci, K., & Durdu, A. (2018). Breast cancer diagnosis by different machine learning methods using blood analysis data. International Journal of Intelligent Systems and Applications in Engineering, 6(4), 289-293.
Patrício, M., Pereira, J., Crisóstomo, J., Matafome, P., Gomes, M., Seiça, R., & Caramelo, F. (2018). Breast Cancer Coimbra Data Set. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra, (25.05.2020).
Salo, F., Nassif, A. B., & Essex, A. (2019). Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection. Computer Networks, 148, 164-175.
Jackson, J. E. (2005). A user’s guide to principal components. John Wiley & Sons.
MathWorks. (2018). Statistics and Machine Learning Toolbox. The MathWorks Inc.
Yavuz, E., & Eyüpoğlu, C. Meme Kanseri Teşhisi İçin Yeni Bir Skor Füzyon Yaklaşımı [A new score fusion approach for breast cancer diagnosis]. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 7(3), 1045-1060.
Yavuz, E., Eyupoglu, C., Sanver, U., & Yazici, R. (2017). An ensemble of neural networks for breast cancer diagnosis. 2017 International Conference on Computer Science and Engineering (UBMK), pp. 538-543, 5-8 October, Antalya, Turkey.
Yavuz, E., Kasapbaşı, M. C., Eyüpoğlu, C., & Yazıcı, R. (2018). An epileptic seizure detection system based on cepstral analysis and generalized regression neural network. Biocybernetics and Biomedical Engineering, 38(2), 201-216.
Du, K. L., & Swamy, M. N. S. (2006). Neural Networks in a Softcomputing Framework. Springer Science & Business Media.
Schalkoff, R. J. (1997). Artificial Neural Networks. McGraw-Hill.
Genkin, A., Lewis, D. D., & Madigan, D. (2007). Large-scale Bayesian logistic regression for text categorization. Technometrics, 49(3), 291-304.
John, G. H., & Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. 10th Conference on Uncertainty in Artificial Intelligence (UAI’95), pp. 338-345, 18-20 August, Montréal, Qué, Canada.
Le Cessie, S., & Van Houwelingen, J. C. (1992). Ridge estimators in logistic regression. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(1), 191-201.
Frank, E. (2014). Fully supervised training of Gaussian radial basis function networks in WEKA. Department of Computer Science, University of Waikato, Hamilton, New Zealand.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400-407.
Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation, 13(3), 637-649.
Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277-296.
Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6(1), 37-66.
Cleary, J. G., & Trigg, L. E. (1995). K*: An instance-based learner using an entropic distance measure. 12th International Conference on Machine Learning, pp. 108-114, 9-12 July, Tahoe City, California.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. 13th International Conference on Machine Learning, pp. 148-156, 3-6 July, Bari, Italy.
Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1), 63-90.
Iba, W., & Langley, P. (1992). Induction of one-level decision trees. 9th International Conference on Machine Learning, pp. 233-240, 1-3 July, Aberdeen, Scotland.
Hulten, G., Spencer, L., & Domingos, P. (2001). Mining time-changing data streams. 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97-106, 26-29 August, San Francisco, California.
Quinlan, R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1-2), 161-205.
Kohavi, R. (1996). Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. 2nd International Conference on Knowledge Discovery and Data Mining, pp. 202-207, 2-4 August, Portland, Oregon.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
Eyüpoğlu, C. (2018). Büyük veride etkin gizlilik koruması için yazılım tasarımı [Software design for effective privacy protection in big data]. PhD thesis, İstanbul Üniversitesi, Institute of Science, Department of Computer Engineering, İstanbul.
Yavuz, E., & Eyupoglu, C. (2019). A cepstrum analysis-based classification method for hand movement surface EMG signals. Medical & Biological Engineering & Computing, 57(10), 2179-2201.
Eyupoglu, C., Aydin, M. A., Zaim, A. H., & Sertbas, A. (2018). An efficient big data anonymization algorithm based on chaos and perturbation techniques. Entropy, 20(5), 373.
Yavuz, E., & Eyupoglu, C. (2020). An effective approach for breast cancer diagnosis based on routine blood analysis features. Medical & Biological Engineering & Computing. https://doi.org/10.1007/s11517-020-02187-9
Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437.

There are 41 references in total.

Details

Primary Language	Turkish
Subjects	Engineering
Section	Articles
Authors	Can Eyüpoğlu 0000-0002-6133-8617 Erdem Yavuz 0000-0002-3159-2497
Publication Date	December 30, 2020
Submission Date	May 25, 2020
Acceptance Date	June 29, 2020
Published in Issue	Year 2020 Volume: 7 Issue: 2

Cite

APA	Eyüpoğlu, C., & Yavuz, E. (2020). Kanser Teşhisi için Makine Öğrenmesi Tekniklerine Dayalı Yeni Bir Sınıflandırma Metodu. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 7(2), 1106-1123. https://doi.org/10.35193/bseufbd.742456

Cited By

A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: a case study on sleep disorder diagnosis

Signal, Image and Video Processing

https://doi.org/10.1007/s11760-024-03097-1
