
IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY

Year 2016, Volume: 29 Issue: 4, 799 - 809, 19.12.2016

Abstract

Missing data and imputation methods are studied in many disciplines. However, these methods differ in their properties and are subject to constraints that depend on the missingness mechanism. In this paper, we examine the behavior of several deletion and imputation methods in the presence of outliers. We estimate the mean vector and covariance matrix from missing and contaminated data and compare the imputation methods using mean square errors. In a second application, we use regression data and examine the effect of missingness on the parameters of the regression model. We compare the imputed values with the real values and discuss the results of the classical and robust imputation methods.
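The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure: the toy data, the function names, and the choice of median imputation as a stand-in for a "robust" method are all assumptions made here for illustration. The sketch estimates a mean vector from data containing missing values and one outlier, using listwise deletion, mean imputation, and median imputation, and scores each estimate by its mean square error against the mean vector of the clean data.

```python
from statistics import mean, median

def listwise_delete(rows):
    """Keep only rows that have no missing (None) entries."""
    return [r for r in rows if all(v is not None for v in r)]

def impute(rows, center):
    """Fill each None with the column's center (mean or median) of observed values."""
    columns = list(zip(*rows))
    fills = [center([v for v in col if v is not None]) for col in columns]
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def mean_vector(rows):
    """Column-wise sample mean."""
    return [mean(col) for col in zip(*rows)]

def mse(estimate, truth):
    """Mean square error between an estimated and a reference mean vector."""
    return mean([(e - t) ** 2 for e, t in zip(estimate, truth)])

# Hypothetical clean data; its mean vector serves as the reference.
complete = [[3.0, 8.0], [4.0, 9.0], [5.0, 10.0], [6.0, 11.0], [7.0, 12.0]]
truth = mean_vector(complete)

# Same data with two missing entries (None) and one outlier (120.0).
observed = [[3.0, 8.0], [4.0, None], [5.0, 10.0], [None, 11.0], [7.0, 120.0]]

errors = {
    "listwise deletion": mse(mean_vector(listwise_delete(observed)), truth),
    "mean imputation": mse(mean_vector(impute(observed, mean)), truth),
    "median imputation": mse(mean_vector(impute(observed, median)), truth),
}

for method, err in errors.items():
    print(f"{method}: MSE = {err:.4f}")
```

On this toy data the median-based fill yields the smallest error and listwise deletion the largest, because the outlier inflates both the column mean used for filling and the mean of the few complete rows. A single hand-built data set proves nothing general; a real comparison would repeat this over many simulated samples and missingness mechanisms.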

There are 25 citations in total.

Details

Journal Section Statistics
Authors

Onur Toka

Meral Çetin

Publication Date December 19, 2016
Published in Issue Year 2016 Volume: 29 Issue: 4

Cite

APA Toka, O., & Çetin, M. (2016). IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science, 29(4), 799-809.
AMA Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. December 2016;29(4):799-809.
Chicago Toka, Onur, and Meral Çetin. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science 29, no. 4 (December 2016): 799-809.
EndNote Toka O, Çetin M (December 1, 2016) IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science 29 4 799–809.
IEEE O. Toka and M. Çetin, “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”, Gazi University Journal of Science, vol. 29, no. 4, pp. 799–809, 2016.
ISNAD Toka, Onur - Çetin, Meral. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science 29/4 (December 2016), 799-809.
JAMA Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. 2016;29:799–809.
MLA Toka, Onur and Meral Çetin. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science, vol. 29, no. 4, 2016, pp. 799-809.
Vancouver Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. 2016;29(4):799-809.