Research Article

Comparison of discretization methods for classifier decision trees and decision rules on medical data sets

Issue: 35, 7 May 2022

Abstract

Real-life data sets stored in databases often contain continuous, real-valued attributes. However, many data mining methods, such as association rule mining and rule induction, require discrete attributes. It is therefore necessary to convert data sets with continuous attributes into data sets with discrete attributes. Discretization reduces the number of values of a continuous attribute by dividing the attribute's range into intervals. In this paper, eight discretization methods are evaluated with the rule-based classifiers JRip, OneR and PART and the decision-tree classifier J48. The experiments use ten-fold cross-validation on real-life data sets from the UCI repository. We show that discretization is an important step that can significantly improve the classification performance of these algorithms. Finally, the combinations MDL with J48, CAIM with JRip, and Extended Chi with J48 gave the highest accuracy on the PIMA, WBC and DERMA data sets, respectively.
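As a minimal illustration of the discretization step described in the abstract, the Python sketch below turns a continuous attribute into intervals in two ways. `equal_width_cuts` is a simple unsupervised baseline that splits the attribute's range into k intervals of equal width; it is included only for contrast and is not claimed to be one of the eight methods compared in the paper. `mdlp_cuts` sketches the supervised entropy-based method with the MDL stopping criterion of Fayyad and Irani (1993), which is what "MDL" usually refers to in this context. The paper's own implementation and settings are not given here, so all function names and the toy data are hypothetical.

```python
import math
from bisect import bisect_left
from collections import Counter


def equal_width_cuts(values, k):
    """Unsupervised baseline: k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def assign_bins(values, cuts):
    """Map each value to its interval index; cuts must be sorted ascending.

    A value equal to a cut point falls into the lower interval.
    """
    return [bisect_left(cuts, v) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cuts(values, labels):
    """Hypothetical helper: entropy-based recursive splitting with the
    MDL stopping rule (Fayyad & Irani style). Returns sorted cut points."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []
    _mdlp_split(pairs, cuts)
    return sorted(cuts)


def _mdlp_split(pairs, cuts):
    n = len(pairs)
    ys = [y for _, y in pairs]
    ent_s = entropy(ys)
    best = None  # (weighted class entropy after the split, split index)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut can only fall between two distinct values
        e = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:
        return  # all values identical: nothing to split
    e, i = best
    left, right = ys[:i], ys[i:]
    gain = ent_s - e
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * ent_s - k1 * entropy(left) - k2 * entropy(right)
    )
    # Accept the cut only if the information gain pays for the extra
    # description length; otherwise stop splitting this interval.
    if gain > (math.log2(n - 1) + delta) / n:
        cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
        _mdlp_split(pairs[:i], cuts)
        _mdlp_split(pairs[i:], cuts)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = ["a"] * 5 + ["b"] * 5
    print(equal_width_cuts(x, 3))
    cuts = mdlp_cuts(x, y)
    print(cuts, assign_bins(x, cuts))
```

On the toy attribute 1 to 10 with label `a` for 1 to 5 and `b` for 6 to 10, `mdlp_cuts` accepts the single cut 5.5, whereas for values 1 to 4 with alternating labels a, b, a, b it returns no cut, because the gain does not exceed the MDL threshold. The sketch recomputes entropies for every candidate cut, which is quadratic per split but keeps the code short. In a ten-fold cross-validation, cut points would normally be learned on the training folds only and then applied to the held-out fold with `assign_bins`, so that test labels do not leak into the discretization.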

Details

Primary Language: English
Subjects: Engineering
Section: Research Article
Publication Date: 7 May 2022
Submission Date: 28 February 2022
Acceptance Date: 12 March 2022
Published in: Year 2022, Issue 35

How to Cite

APA
Kaya, Y., & Tekin, R. (2022). Comparison of discretization methods for classifier decision trees and decision rules on medical data sets. Avrupa Bilim ve Teknoloji Dergisi, 35, 275-281. https://doi.org/10.31590/ejosat.1080098