Closed-form estimates for missing counts in multidimensional incomplete tables
Year 2024, 803 - 822, 27.06.2024
Sayan Ghosh, Palaniappan Vellaisamy
Abstract
A useful technique for analyzing incomplete tables is to model the missing data mechanisms of the variables using log-linear models. In this paper, we use a log-linear parametrization and propose estimation methods for arbitrary three-way and $n$-dimensional incomplete tables. We consider all possible cases in which data on one or more of the variables may be missing. We provide simple closed-form estimates of the expected cell counts and parameters for the various missing data models. We also obtain explicit boundary estimates under nonignorable nonresponse models. Finally, we analyze a real-life dataset to illustrate our results for modelling and estimation in multidimensional incomplete tables.
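To give a sense of what a closed-form estimate of missing counts looks like, the sketch below treats the simplest textbook special case rather than the three-way or $n$-dimensional models of the paper. It assumes a two-way table in which the row variable is always observed, the column variable is sometimes missing, and missingness depends only on the row variable (an ignorable mechanism). Under that assumption, the supplemental count of each row is split across columns in proportion to the fully observed counts of that row. The function name `allocate_missing_counts` and the data are hypothetical and chosen only for illustration.

```python
def allocate_missing_counts(observed, supplemental_rows):
    """Estimate the expected counts of units whose column value is missing.

    observed[i][j]       : count of fully classified units in cell (i, j)
    supplemental_rows[i] : count of units with row i observed, column missing

    Assumption (not taken from the paper): missingness of the column
    variable depends only on the row variable, so within each row the
    missing units follow the same column distribution as the observed ones.
    """
    estimates = []
    for row, extra in zip(observed, supplemental_rows):
        row_total = sum(row)
        if row_total == 0:
            # No fully observed units in this row: the split is not estimable.
            raise ValueError("row has no fully observed units")
        # Split the supplemental count in proportion to the observed cells.
        estimates.append([extra * count / row_total for count in row])
    return estimates


# Hypothetical 2 x 2 table, for illustration only.
observed = [[30, 10], [20, 40]]
supplemental_rows = [8, 12]
missing = allocate_missing_counts(observed, supplemental_rows)
print(missing)
```

Each row of the result sums to the corresponding supplemental count, so the estimates reproduce the observed margins. Nonignorable mechanisms, where missingness depends on the unobserved variable itself, do not reduce to this proportional split and can yield boundary estimates.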