Research Article

Comparison of the effect of unsupervised and supervised discretization methods on classification process

Year 2016, Volume 4, Special Issue-1, 105 - 108, 26.12.2016
https://doi.org/10.18201/ijisae.267490

Abstract

Most machine learning and data mining algorithms use discrete data for the
classification process, but most data in practice include continuous features.
A discretization pre-processing step is therefore applied to these datasets
before classification. Discretization converts continuous values into discrete
values. Many discretization methods have been proposed in the literature; they
are grouped as supervised or unsupervised according to whether class
information is used. In this paper, we use two unsupervised methods, Equal
Width Interval (EW) and Equal Frequency (EF), and one supervised method,
Entropy Based (EB) discretization. In the experiments, ten well-known datasets
from the UCI Machine Learning Repository are used to compare the effect of the
discretization methods on classification. The results show that the Naive
Bayes (NB), C4.5 and ID3 classification algorithms obtain higher accuracy with
the EB discretization method.
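
The three discretization strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the choice of `k` are assumptions made for illustration, and the entropy-based function finds only a single binary cut, whereas a full EB discretizer would typically apply such a cut recursively with a stopping rule that the abstract does not specify.

```python
import math
from collections import Counter


def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k bins of equal width (EW)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points chosen so each bin holds roughly len(values)/k points (EF)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Single cut point minimising the weighted class entropy of the two sides.

    Assumption: one binary split only; this is a simplification of EB discretization.
    """
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary can be placed between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


def assign_bin(value, edges):
    """Index of the bin a value falls into, given sorted interior cut points."""
    return sum(value >= e for e in edges)


# Hypothetical toy feature with one outlier and two classes.
values = [1, 2, 3, 4, 10, 11, 12, 40]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

ew = equal_width_edges(values, 4)
ef = equal_frequency_edges(values, 4)
eb = best_entropy_cut(values, labels)
```

On this toy feature the outlier stretches the equal-width bins so that most values land in the first bin, equal-frequency binning balances the bin counts but ignores the labels, and the entropy-based cut falls exactly between the two classes.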

References

  • [1] Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
  • [2] Dougherty, J., Kohavi, R., & Sahami, M. (1995, July). Supervised and unsupervised discretization of continuous features. In Machine learning: proceedings of the twelfth international conference (Vol. 12, pp. 194-202).
  • [3] Hacibeyoglu, M., Arslan, A., & Kahramanli, S. (2011). Improving Classification Accuracy with Discretization on Data Sets Including Continuous Valued Features. Ionosphere, 34(351), 2.
  • [4] Gupta, A., Mehrotra, K. G., & Mohan, C. (2010). A clustering-based discretization for supervised learning. Statistics & probability letters, 80(9), 816-824.
  • [5] Joiţa, D. (2010). Unsupervised static discretization methods in data mining. Titu Maiorescu University, Bucharest, Romania.
  • [6] Gama, J., & Pinto, C. (2006, April). Discretization from data streams: applications to histograms and data mining. In Proceedings of the 2006 ACM symposium on Applied computing (pp. 662-667). ACM.
  • [7] Jiang, S. Y., Li, X., Zheng, Q., & Wang, L. X. (2009, May). Approximate equal frequency discretization method. In 2009 WRI Global Congress on Intelligent Systems (Vol. 3, pp. 514-518). IEEE.
  • [8] Agre, G., & Peev, S. (2002). On supervised and unsupervised discretization. Cybernetics and information technologies, 2(2), 43-57.
  • [9] Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., ... & Zhou, Z. H. (2008). Top 10 algorithms in data mining. Knowledge and information systems, 14(1), 1-37.
  • [10] Hssina, B., Merbouha, A., Ezzikouri, H., & Erritali, M. (2014). A comparative study of decision tree ID3 and C4.5. Int. J. Adv. Comput. Sci. Appl, 4(2).
There are 10 citations in total.

Details

Subjects Engineering
Journal Section Research Article
Authors

Mehmet Hacıbeyoğlu

Mohammed H. Ibrahım

Publication Date December 26, 2016
Published in Issue Year 2016

Cite

APA Hacıbeyoğlu, M., & Ibrahım, M. H. (2016). Comparison of the effect of unsupervised and supervised discretization methods on classification process. International Journal of Intelligent Systems and Applications in Engineering, 4(Special Issue-1), 105-108. https://doi.org/10.18201/ijisae.267490
AMA Hacıbeyoğlu M, Ibrahım MH. Comparison of the effect of unsupervised and supervised discretization methods on classification process. International Journal of Intelligent Systems and Applications in Engineering. December 2016;4(Special Issue-1):105-108. doi:10.18201/ijisae.267490
Chicago Hacıbeyoğlu, Mehmet, and Mohammed H. Ibrahım. “Comparison of the Effect of Unsupervised and Supervised Discretization Methods on Classification Process”. International Journal of Intelligent Systems and Applications in Engineering 4, no. Special Issue-1 (December 2016): 105-8. https://doi.org/10.18201/ijisae.267490.
EndNote Hacıbeyoğlu M, Ibrahım MH (December 1, 2016) Comparison of the effect of unsupervised and supervised discretization methods on classification process. International Journal of Intelligent Systems and Applications in Engineering 4 Special Issue-1 105–108.
IEEE M. Hacıbeyoğlu and M. H. Ibrahım, “Comparison of the effect of unsupervised and supervised discretization methods on classification process”, International Journal of Intelligent Systems and Applications in Engineering, vol. 4, no. Special Issue-1, pp. 105–108, 2016, doi: 10.18201/ijisae.267490.
ISNAD Hacıbeyoğlu, Mehmet - Ibrahım, Mohammed H. “Comparison of the Effect of Unsupervised and Supervised Discretization Methods on Classification Process”. International Journal of Intelligent Systems and Applications in Engineering 4/Special Issue-1 (December 2016), 105-108. https://doi.org/10.18201/ijisae.267490.
JAMA Hacıbeyoğlu M, Ibrahım MH. Comparison of the effect of unsupervised and supervised discretization methods on classification process. International Journal of Intelligent Systems and Applications in Engineering. 2016;4:105–108.
MLA Hacıbeyoğlu, Mehmet and Mohammed H. Ibrahım. “Comparison of the Effect of Unsupervised and Supervised Discretization Methods on Classification Process”. International Journal of Intelligent Systems and Applications in Engineering, vol. 4, no. Special Issue-1, 2016, pp. 105-8, doi:10.18201/ijisae.267490.
Vancouver Hacıbeyoğlu M, Ibrahım MH. Comparison of the effect of unsupervised and supervised discretization methods on classification process. International Journal of Intelligent Systems and Applications in Engineering. 2016;4(Special Issue-1):105-8.