Comparison of the effect of unsupervised and supervised discretization methods on classification process
Abstract
Most machine learning and data mining algorithms use discrete data for
classification, but most data in practice include continuous features.
A discretization pre-processing step, which converts continuous values to
discrete values, is therefore applied to these datasets before classification.
Many discretization methods have been proposed in the literature. They are
grouped as supervised or unsupervised according to whether class information
is used. In this paper, we use two unsupervised methods, Equal Width Interval
(EW) and Equal Frequency (EF), and one supervised method, Entropy Based (EB)
discretization. In the experiments, ten well-known datasets from the UCI
Machine Learning Repository are used to compare the effect of the
discretization methods on classification. The results show that the Naive
Bayes (NB), C4.5 and ID3 classification algorithms obtain higher accuracy with
the EB discretization method.
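The three discretization schemes named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the parameter `k` (number of bins), and the toy data are assumptions made for illustration. The entropy-based function finds only a single best cut point and omits the recursive splitting and stopping criterion that a full entropy-based discretizer would normally use.

```python
from collections import Counter
from math import log2


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width (EW)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of roughly equal size (EF).

    Simplification: bins are assigned by rank, so tied values may be
    split across neighbouring bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Return the cut point that minimises the weighted class entropy (EB).

    Only one binary split is computed here; class labels are used,
    which is what makes the method supervised.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary exists between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


# Hypothetical toy data: one continuous feature and its class labels.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0]
labels = ["a", "a", "a", "a", "b", "b"]

ew = equal_width_bins(values, 2)
ef = equal_frequency_bins(values, 2)
cut = best_entropy_cut(values, labels)
```

On this toy data, equal width yields `[0, 0, 0, 0, 1, 1]`, equal frequency yields `[0, 0, 0, 1, 1, 1]`, and the entropy-based cut falls at `7.0`, between the two classes. The unsupervised methods ignore the labels entirely, while the supervised one places its boundary where the classes separate.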
Details
Primary Language: English
Subjects: Engineering
Journal Section: Research Article
Publication Date: December 26, 2016
Submission Date: November 22, 2016
Acceptance Date: December 1, 2016
Published in Issue: Year 2016, Volume 4, Number: Special Issue-1
Cited By
EF_Unique: An Improved Version of Unsupervised Equal Frequency Discretization Method
Arabian Journal for Science and Engineering
https://doi.org/10.1007/s13369-018-3144-z
Predicting building damages in mega-disasters under uncertainty: An improved Bayesian network learning approach
Sustainable Cities and Society
https://doi.org/10.1016/j.scs.2020.102689
A Machine Learning Security Framework for Iot Systems
IEEE Access
https://doi.org/10.1109/ACCESS.2020.2996214
Supervised discretization of continuous-valued attributes for classification using RACER algorithm
Expert Systems with Applications
https://doi.org/10.1016/j.eswa.2023.121203
Combining data discretization and missing value imputation for incomplete medical datasets
PLOS ONE
https://doi.org/10.1371/journal.pone.0295032
On Combining Instance Selection and Discretisation: A Comparative Study of Two Combination Orders
Journal of Information & Knowledge Management
https://doi.org/10.1142/S0219649224500813