
IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY

Year 2016, Volume: 29 Issue: 4, 799 - 809, 19.12.2016

Abstract

Missing data and imputation methods are studied in many disciplines. However, these methods differ in their properties and are subject to constraints that depend on the missingness mechanism. In this paper, we examine how several deletion and imputation methods behave in the presence of outliers. We estimate the mean vector and covariance matrix from missing and contaminated data and compare the imputation methods using mean squared error. In a second application, we use regression data to examine the effect of missingness on the parameters of the regression model. We compare the imputed values with the real values and discuss the results of classical and robust imputation methods.
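
The comparison of imputed values with real values can be illustrated with a small simulation. The abstract does not name the specific methods used in the paper, so the sketch below is only an assumption-laden stand-in: a single variable, values missing completely at random, mean imputation as the classical method, and median imputation as a simple robust alternative. The function name `simulate` and all rates, seeds, and distribution parameters are hypothetical.

```python
import random
import statistics


def simulate(n=1000, missing_rate=0.2, outlier_rate=0.1, seed=0):
    """Return the MSE of mean- and median-imputed values against the true values.

    Illustrative only: the univariate setup and all parameters are assumptions,
    not the design used in the paper.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Missing completely at random: hide a fixed share of the cells.
    indices = list(range(n))
    rng.shuffle(indices)
    n_missing = int(n * missing_rate)
    missing_idx = indices[:n_missing]
    observed_idx = indices[n_missing:]

    # Contaminate a share of the observed cells with shifted outliers.
    observed = [true_values[i] for i in observed_idx]
    n_outliers = int(len(observed) * outlier_rate)
    for k in range(n_outliers):
        observed[k] = rng.gauss(10.0, 1.0)

    # One fill value per method, estimated from the contaminated observed data.
    fills = {
        "mean": statistics.fmean(observed),
        "median": statistics.median(observed),
    }

    # Compare each fill value with the real (hidden) values via mean squared error.
    hidden = [true_values[i] for i in missing_idx]
    return {
        name: statistics.fmean((t - fill) ** 2 for t in hidden)
        for name, fill in fills.items()
    }


mse = simulate()
print(mse)
```

In this setup the outliers pull the mean away from the bulk of the data, so the mean-imputed values should land farther from the real values than the median-imputed ones. This says nothing about the multivariate or regression settings studied in the paper.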


Details

Section Statistics
Authors

Onur Toka

Meral Çetin

Publication Date December 19, 2016
Published in Issue Year 2016 Volume: 29 Issue: 4

Cite

APA Toka, O., & Çetin, M. (2016). IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science, 29(4), 799-809.
AMA Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. December 2016;29(4):799-809.
Chicago Toka, Onur, and Meral Çetin. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science 29, no. 4 (December 2016): 799-809.
EndNote Toka O, Çetin M (01 December 2016) IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science 29 4 799–809.
IEEE O. Toka and M. Çetin, “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”, Gazi University Journal of Science, vol. 29, no. 4, pp. 799–809, 2016.
ISNAD Toka, Onur - Çetin, Meral. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science 29/4 (December 2016), 799-809.
JAMA Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. 2016;29:799–809.
MLA Toka, Onur and Meral Çetin. “IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY”. Gazi University Journal of Science, vol. 29, no. 4, 2016, pp. 799-809.
Vancouver Toka O, Çetin M. IMPUTATION AND DELETION METHODS UNDER THE PRESENCE OF MISSING VALUES AND OUTLIERS: A COMPARATIVE STUDY. Gazi University Journal of Science. 2016;29(4):799-809.