Comparison of discretization methods for classifier decision trees and decision rules on medical data sets
Abstract
Real-life data sets are often stored in databases as real-valued (continuous) attributes. However, many data mining methods, such as association rules and rule induction, require discrete attributes. Data sets with continuous attributes must therefore be converted into data sets with discrete attributes. Discretization reduces the number of values of a continuous attribute by dividing the attribute's range into intervals. In this paper, eight discretization methods are evaluated with the rule- and tree-based classifiers JRip, OneR, J48, and PART. The experiments use ten-fold cross-validation on real-life data sets from the UCI repository. We show that discretization is an important step that can significantly improve the classification results of these algorithms. The highest accuracies were obtained by MDL with J48 on the PIMA data set, CAIM with JRip on the WBC data set, and Extended Chi with J48 on the DERMA data set.
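The abstract describes discretization as splitting an attribute's range into intervals. The sketch below shows one supervised way to choose those intervals. It assumes that "MDL" refers to Fayyad and Irani's entropy-based minimum description length criterion (MDLP); it is a minimal illustration, not the authors' implementation, and the helper names `mdlp_cut_points` and `discretize` are hypothetical.

```python
import bisect
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def _split(xs: List[float], ys: List[Hashable], cuts: List[float]) -> None:
    """Recursively add accepted cut points for sorted values xs with labels ys."""
    n = len(ys)
    best = None  # (weighted class entropy after the cut, split index)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut must fall between two distinct attribute values
        w = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or w < best[0]:
            best = (w, i)
    if best is None:
        return  # fewer than two distinct values: nothing to cut
    w, i = best
    left, right = ys[:i], ys[i:]
    ent_s = entropy(ys)
    gain = ent_s - w
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
    # MDL stopping rule: keep the cut only if the gain pays for its encoding cost.
    if gain <= (math.log2(n - 1) + delta) / n:
        return
    cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint between neighbouring values
    _split(xs[:i], left, cuts)
    _split(xs[i:], right, cuts)


def mdlp_cut_points(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Hypothetical helper: MDLP cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts: List[float] = []
    _split(xs, ys, cuts)
    return sorted(cuts)


def discretize(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Hypothetical helper: map each value to the index of its interval."""
    return [bisect.bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
    cuts = mdlp_cut_points(x, y)
    print(cuts)                 # [3.5, 6.5]
    print(discretize(x, cuts))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because a cut is accepted only when its information gain exceeds the MDL cost of describing it, the recursion stops on its own, so unlike equal-width binning the number of intervals is not fixed in advance. In a cross-validated experiment, cut points would normally be computed on the training folds only, so that test labels do not influence the intervals.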
Details
Primary Language: English
Subjects: Engineering
Section: Research Article
Publication Date: May 7, 2022
Submission Date: February 28, 2022
Acceptance Date: March 12, 2022
Published in Issue: Year 2022, Issue 35