Research Article

A Comparison of Five Methods for Missing Value Imputation in Data Sets

Volume: 2 Number: 2 December 31, 2018

Abstract

Missing values in data sets prevent accurate analysis. The correct imputation of missing values has therefore become a focus of research in recent years. This paper compares the most reliable and up-to-date estimation methods for imputing missing values. Imputation has a very high priority because of its impact on subsequent pre-processing, data analysis, classification, clustering, etc. Root mean square error (RMSE), classification accuracy, and execution time are used to evaluate the performance of the five most popular methods (mean, k-nearest neighbors, singular value decomposition, Bayesian principal component analysis, and missForest). When the RMSE and classification accuracy values of the methods were compared, it was observed that missForest outperformed the other methods on all data sets.
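
The RMSE criterion named in the abstract can be illustrated with a minimal sketch. It assumes a common evaluation protocol: hide known values in a complete data set, impute them, and measure the error on the hidden cells. The abstract does not state the paper's exact protocol, so this is an assumption. The synthetic data, the 10% missing rate, the distance measure, and all function names are hypothetical, and only two of the five methods (mean and a simple k-nearest neighbors imputer) are shown.

```python
import math
import random


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=3):
    """Replace each None with the column mean over the k nearest donor rows."""

    def distance(a, b):
        # Root mean squared difference over columns observed in both rows
        # (an illustrative choice, not taken from the paper).
        sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

    fallback = mean_impute(rows)  # used when no comparable donor row exists
    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are rows that have an observed value in column j.
            donors = sorted(
                ((distance(row, other), other[j]) for other in rows if other[j] is not None),
                key=lambda d: d[0],
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            filled[j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
        result.append(filled)
    return result


def rmse(truth, imputed, cells):
    """Root mean square error over the (row, column) cells that were hidden."""
    return math.sqrt(sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in cells) / len(cells))


# Synthetic, correlated three-column data (illustrative only).
random.seed(0)
complete = []
for _ in range(60):
    x = random.gauss(0.0, 1.0)
    complete.append([x, 2.0 * x + random.gauss(0.0, 0.1), -x + random.gauss(0.0, 0.1)])

# Hide 10% of the cells completely at random.
all_cells = [(i, j) for i in range(60) for j in range(3)]
hidden = random.sample(all_cells, 18)
incomplete = [list(r) for r in complete]
for i, j in hidden:
    incomplete[i][j] = None

rmse_mean = rmse(complete, mean_impute(incomplete), hidden)
rmse_knn = rmse(complete, knn_impute(incomplete), hidden)
print(f"mean imputation RMSE: {rmse_mean:.3f}")
print(f"kNN imputation RMSE:  {rmse_knn:.3f}")
```

A lower RMSE means the imputed values lie closer to the hidden true values. The numbers printed for this toy data say nothing about the results reported in the paper.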

Details

Primary Language: Turkish
Subjects: Computer Software
Journal Section: Research Article
Publication Date: December 31, 2018
Submission Date: November 28, 2018
Acceptance Date: December 12, 2018
Published in Issue: Year 2018, Volume: 2, Number: 2

APA
Cihan, P. (2018). A Comparison of Five Methods for Missing Value Imputation in Data Sets. International Scientific and Vocational Studies Journal, 2(2), 80-85. https://izlik.org/JA22BM54LC

INTERNATIONAL SCIENTIFIC AND VOCATIONAL STUDIES JOURNAL publishes its content under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, which permits copying and redistributing the material in any medium or format for non-commercial purposes, as well as remixing, transforming, and building upon the material, provided that appropriate credit is given to the original work.