Research Article

A New Cervical Cancer Diagnosis Method Based on Correlation-based Feature Selection, Genetic Search and Random Forests Techniques

Year 2020, Issue: 19, 263 - 271, 31.08.2020
https://doi.org/10.31590/ejosat.725305

Abstract

Cervical cancer is one of the most common types of cancer in women. The way to reduce the number of deaths due to this type of cancer is early diagnosis. Machine learning and data mining techniques are used to assist doctors in diagnosing the disease early. In this study, a new method that combines correlation-based feature selection (CFS), a genetic algorithm (GA), and random forests (RF) is proposed for the diagnosis of cervical cancer. The method consists of three stages: data preprocessing, feature selection, and classification. Its performance has been tested using the classification accuracy, precision, recall, and F-measure metrics. The results are then compared with those of conventional machine learning techniques and of existing studies in the literature. The experimental results show that the proposed method is effective and can be used by doctors as an auxiliary tool for the early diagnosis of cervical cancer.
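The feature selection stage and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation. The merit formula follows Hall (1999), but the correlation measure (absolute Pearson correlation), the GA settings (population size, number of generations, mutation rate, tournament selection, single-point crossover, elitism), and all function names are assumptions chosen for illustration. The classification stage is omitted; in a full pipeline a random forest would be trained on the selected columns.

```python
import math
import random
from itertools import combinations


def abs_pearson(x, y):
    """Absolute Pearson correlation; 0.0 when either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def cfs_merit(subset, columns, target):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation (Pearson here is an assumption).
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs_pearson(columns[i], target) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs_pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def genetic_search(columns, target, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bit masks over features, using the CFS merit as fitness.

    Needs at least two feature columns (single-point crossover).
    """
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit([i for i, bit in enumerate(mask) if bit], columns, target)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randint(1, n - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per position.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return [i for i, bit in enumerate(best) if bit]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F-measure for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Here `columns` is a list of numeric feature columns and `target` is the list of class labels; `genetic_search(columns, target)` returns the indices of the selected features. The merit rewards subsets whose features correlate with the class and penalizes subsets whose features correlate with each other, so a redundant copy of a feature does not raise the score.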

References

  • Abdoh, S. F., Rizka, M. A., & Maghraby, F. A. (2018). Cervical cancer diagnosis using random forest classifier with SMOTE and feature reduction techniques. IEEE Access, 6, 59475-59485.
  • Adem, K., Kiliçarslan, S., & Cömert, O. (2019). Classification and diagnosis of cervical cancer with stacked autoencoder and softmax classification. Expert Systems with Applications, 115, 557-564.
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
  • Bremermann, H. J. (1958). The evolution of intelligence: The nervous system as a model of its environment. University of Washington, Department of Mathematics.
  • Cleary, J. G., & Trigg, L. E. (1995, July). K*: An instance-based learner using an entropic distance measure. In 12th International Conference on Machine Learning (pp. 108-114).
  • Deng, X., Luo, Y., & Wang, C. (2018). Analysis of Risk Factors for Cervical Cancer Based on Machine Learning Methods. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS) (pp. 631-635). IEEE.
  • Eyupoglu, C., Aydin, M. A., Zaim, A. H., & Sertbas, A. (2018). An efficient big data anonymization algorithm based on chaos and perturbation techniques. Entropy, 20(5), 373.
  • Fernandes, K., Cardoso, J. S., & Fernandes, J. (2017a). Cervical cancer (Risk Factors) Data Set [Data file]. Available from http://archive.ics.uci.edu/ml/datasets/Cervical+cancer+%28Risk+Factors%29
  • Fernandes, K., Cardoso, J. S., & Fernandes, J. (2017b). Transfer learning with partial observability applied to cervical cancer screening. In Iberian Conference on Pattern Recognition and Image Analysis (pp. 243-250). Springer, Cham.
  • Frank, E. (2014). Fully supervised training of Gaussian radial basis function networks in WEKA. Department of Computer Science, University of Waikato, Hamilton, New Zealand.
  • Fraser, A. S. (1957). Simulation of Genetic Systems by Automatic Digital Computers II. Effects of Linkage on Rates of Advance Under Selection. Australian Journal of Biological Sciences, 10(4), 492-500.
  • Freund, Y., & Schapire, R. E. (1996, July). Experiments with a new boosting algorithm. In 13th International Conference on Machine Learning (pp. 148-156).
  • Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277-296.
  • Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.
  • Hall, M. A. (1999). Correlation-based Feature Selection for Machine Learning. PhD thesis. University of Waikato, Hamilton, New Zealand.
  • Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
  • Holte, R. C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1), 63-90.
  • John, G. H., & Langley, P. (1995, August). Estimating continuous distributions in Bayesian classifiers. In 10th Conference on Uncertainty in Artificial Intelligence (UAI’95) (pp. 338-345).
  • Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Computation, 13(3), 637-649.
  • Landwehr, N., Hall, M., & Frank, E. (2005). Logistic model trees. Machine Learning, 59(1-2), 161-205.
  • Le Cessie, S., & Van Houwelingen, J. C. (1992). Ridge estimators in logistic regression. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(1), 191-201.
  • Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
  • Rayavarapu, K., & Krishna, K. K. (2018, March). Prediction of Cervical Cancer using Voting and DNN Classifiers. In 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT) (pp. 1-5). IEEE.
  • Sastry, K., Goldberg, D., & Kendall, G. (2005). Genetic algorithms. In Search methodologies (pp. 97-125). Springer, Boston, MA.
  • Sawhney, R., Mathur, P., & Shankar, R. (2018, May). A firefly algorithm based wrapper-penalty feature selection method for cancer diagnosis. In International Conference on Computational Science and Its Applications (pp. 438-449). Springer, Cham.
  • World Health Organization. (2019). Human papillomavirus (HPV) and cervical cancer. Retrieved from https://www.who.int/en/news-room/fact-sheets/detail/human-papillomavirus-(hpv)-and-cervical-cancer
  • World Health Organization. (2020). Cervical cancer. Retrieved from https://www.who.int/cancer/prevention/diagnosis-screening/cervical-cancer/en/
  • Wu, W., & Zhou, H. (2017). Data-driven diagnosis of cervical cancer with support vector machine-based approaches. IEEE Access, 5, 25189-25195.
  • Yavuz, E., & Eyüpoğlu, C. (2019). Meme Kanseri Teşhisi İçin Yeni Bir Skor Füzyon Yaklaşımı. Düzce Üniversitesi Bilim ve Teknoloji Dergisi, 7(3), 1045-1060.
  • Yavuz, E., Eyupoglu, C., Sanver, U., & Yazici, R. (2017). An ensemble of neural networks for breast cancer diagnosis. In 2017 International Conference on Computer Science and Engineering (UBMK) (pp. 538-543). IEEE.

Korelasyon Temelli Özellik Seçimi, Genetik Arama ve Rastgele Ormanlar Tekniklerine Dayanan Yeni Bir Rahim Ağzı Kanseri Teşhis Yöntemi

Abstract

Cervical cancer is one of the most common types of cancer in women. The way to reduce the number of deaths caused by this type of cancer is early diagnosis. Machine learning and data mining techniques are used to help doctors diagnose the disease early. In this study, a new method that makes use of correlation-based feature selection (CFS), genetic algorithm (GA), and random forests (RF) techniques is proposed for the diagnosis of cervical cancer. The performance of the method, which consists of three stages (data preprocessing, feature selection, and classification), was tested using the classification accuracy, precision, recall, and F-measure metrics. The performance results were then compared with classical machine learning techniques and with existing studies in the literature. The experimental results show that the proposed method is effective and can be used by doctors as an auxiliary tool in the early diagnosis of cervical cancer.

There are 30 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors:

Can Eyüpoğlu (ORCID: 0000-0002-6133-8617)

Publication Date: August 31, 2020
Published in Issue: Year 2020, Issue: 19

Cite

APA: Eyüpoğlu, C. (2020). Korelasyon Temelli Özellik Seçimi, Genetik Arama ve Rastgele Ormanlar Tekniklerine Dayanan Yeni Bir Rahim Ağzı Kanseri Teşhis Yöntemi. Avrupa Bilim ve Teknoloji Dergisi, (19), 263-271. https://doi.org/10.31590/ejosat.725305