TY  - JOUR
TT  - Comparison of the effect of unsupervised and supervised discretization methods on classification process
AU  - Hacıbeyoğlu, Mehmet
AU  - Ibrahım, Mohammed H.
PY  - 2016
DA  - December
DO  - 10.18201/ijisae.267490
JF  - International Journal of Intelligent Systems and Applications in Engineering
PB  - İsmail SARITAŞ
WT  - DergiPark
SN  - 2147-6799
SP  - 105
EP  - 108
VL  - 4
IS  - Special Issue-1
KW  - Discretization
KW  - Supervised and Unsupervised Discretization
KW  - Continuous Features
KW  - Discrete Feature
N2  - Most machine learning and data mining algorithms use discrete data for the classification process. However, most data in practice include continuous features. Therefore, a discretization pre-processing step is applied to these datasets before classification. The discretization process converts continuous values to discrete values. In the literature, there are many methods used for the discretization process. These methods are grouped as supervised and unsupervised according to whether class information is used or not. In this paper, we used two unsupervised methods, Equal Width Interval (EW) and Equal Frequency (EF), and one supervised method, Entropy Based (EB) discretization. In the experiments, 10 well-known datasets from the UCI Machine Learning Repository are used in order to compare the effect of the discretization methods on classification. The results show that the Naive Bayes (NB), C4.5 and ID3 classification algorithms obtain higher accuracy with the EB discretization method.
CR  - [1] Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
CR  - [2] Dougherty, J., Kohavi, R., & Sahami, M. (1995, July). Supervised and unsupervised discretization of continuous features. In Machine learning: proceedings of the twelfth international conference (Vol. 12, pp. 194-202).
CR  - [3] Hacibeyoglu, M., Arslan, A., & Kahramanli, S. (2011). Improving Classification Accuracy with Discretization on Data Sets Including Continuous Valued Features. Ionosphere, 34(351), 2.
CR  - [4] Gupta, A., Mehrotra, K. G., & Mohan, C. (2010). A clustering-based discretization for supervised learning. Statistics & probability letters, 80(9), 816-824.
CR  - [5] Joiţa, D. (2010). Unsupervised static discretization methods in data mining. Titu Maiorescu University, Bucharest, Romania.
CR  - [6] Gama, J., & Pinto, C. (2006, April). Discretization from data streams: applications to histograms and data mining. In Proceedings of the 2006 ACM symposium on Applied computing (pp. 662-667). ACM.
CR  - [7] Jiang, S. Y., Li, X., Zheng, Q., & Wang, L. X. (2009, May). Approximate equal frequency discretization method. In 2009 WRI Global Congress on Intelligent Systems (Vol. 3, pp. 514-518). IEEE.
CR  - [8] Agre, G., & Peev, S. (2002). On supervised and unsupervised discretization. Cybernetics and information technologies, 2(2), 43-57.
CR  - [9] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... & Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and information systems, 14(1), 1-37.
CR  - [10] Hssina, B., Merbouha, A., Ezzikouri, H., & Erritali, M. (2014). A comparative study of decision tree ID3 and C4.5. Int. J. Adv. Comput. Sci. Appl, 4(2).
UR  - https://doi.org/10.18201/ijisae.267490
L1  - https://dergipark.org.tr/en/download/article-file/234095
ER  -