Most machine learning and data mining algorithms use discrete
data for classification. However, most data in practice include
continuous features. Therefore, a discretization pre-processing step is applied
to these datasets before classification. Discretization converts
continuous values into discrete values. Many discretization methods have been
proposed in the literature. These methods are grouped as supervised or
unsupervised according to whether class information is used.
In this paper, we use two unsupervised methods, Equal Width Interval (EW) and
Equal Frequency (EF), and one supervised method, Entropy Based (EB)
discretization. In the experiments, 10 well-known datasets from the UCI Machine
Learning Repository are used to compare the effect of the
discretization methods on classification. The results show that the Naive
Bayes (NB), C4.5, and ID3 classification algorithms obtain higher accuracy with
the EB discretization method.
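
To make the three discretization families concrete, the following is a minimal Python sketch, not the implementation used in the paper. All function names are hypothetical. The entropy-based part is an assumption about the general idea only: it selects a single cut point that minimizes the weighted class entropy, whereas a full EB method would typically apply such cuts recursively with a stopping criterion that the abstract does not specify.

```python
import math
from collections import Counter


def equal_width_edges(values, k):
    """Return k-1 cut points splitting [min, max] into k equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Return k-1 cut points so each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Return one cut point minimizing the weighted class entropy of both sides.

    Assumption: a single binary split only; no recursion or stopping rule.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        # A cut is only possible between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


def assign_bins(values, edges):
    """Map each value to its interval index (0 .. len(edges))."""
    return [sum(v >= e for e in edges) for v in values]
```

On skewed data such as `[1, 2, 3, 4, 10, 20, 30, 100]`, equal-width binning tends to place most values in the first interval, while equal-frequency binning spreads them evenly. Only the entropy-based cut uses the class labels, which is what makes it a supervised method.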
Keywords: Discretization, Supervised and Unsupervised Discretization, Continuous Features, Discrete Feature
Subjects | Engineering |
---|---|
Journal Section | Research Article |
Authors | |
Publication Date | December 26, 2016 |
Published in Issue | Year 2016 Volume: 4 Issue: Special Issue-1 |