Bayesian and frequentist approaches on estimation and testing for a zero-inflated binomial distribution
Year 2022, pp. 834-856, 01.06.2022
Seung Ji Nam, Seong Kim, Hon Keung Tony Ng
Abstract
To analyze discrete count data with excessive zeros, various zero-inflated statistical models that accommodate frequent zero-valued observations have been developed. When the underlying process generating the non-zero values is based on the number of successes in a sequence of independent Bernoulli trials, the zero-inflated binomial distribution may be an appropriate model. In this paper, we discuss statistical inference for a zero-inflated binomial distribution using the objective Bayesian and frequentist approaches. We develop point and interval estimation of the model parameters and hypothesis tests for excessive zeros in a zero-inflated binomial distribution. A Monte Carlo simulation study is used to assess the performance of the estimation and hypothesis testing procedures, and the objective Bayesian and frequentist approaches are compared. For illustration, the proposed inferential methods are applied to an earthquake dataset and a baseball dataset.
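The following Python sketch illustrates the frequentist side of the abstract: point estimation, and a test for excessive zeros. It is not the authors' method. It assumes a common zero-inflated binomial (ZIB) parameterization. Under that parameterization, an observation is a structural zero with probability `omega` and otherwise a Binomial(`n`, `p`) draw, with a known number of trials `n` shared by all observations. The EM updates and the boundary-corrected likelihood ratio test are illustrative choices and may differ from the estimators and test statistics developed in the paper. The test uses a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, in the spirit of Self and Liang (1987). The objective Bayesian analysis is not shown. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def log_binom_pmf(y: int, n: int, p: float) -> float:
    """log[C(n, y) p^y (1 - p)^(n - y)], skipping 0 * log(0) terms."""
    out = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    if y > 0:
        out += y * math.log(p)
    if n - y > 0:
        out += (n - y) * math.log1p(-p)
    return out


def zib_pmf(y: int, n: int, omega: float, p: float) -> float:
    """Assumed ZIB pmf: structural zero w.p. omega, else Binomial(n, p)."""
    base = math.exp(log_binom_pmf(y, n, p))
    return omega + (1 - omega) * base if y == 0 else (1 - omega) * base


def zib_loglik(ys: List[int], n: int, omega: float, p: float) -> float:
    return sum(math.log(zib_pmf(y, n, omega, p)) for y in ys)


def zib_mle_em(ys: List[int], n: int, tol: float = 1e-10,
               max_iter: int = 10_000) -> Tuple[float, float]:
    """EM for (omega, p) with a known, common number of trials n."""
    omega = 0.5
    p = min(max(sum(ys) / (n * len(ys)), 1e-6), 1 - 1e-6)
    for _ in range(max_iter):
        # E-step: posterior probability that each observed zero is structural.
        p0 = (1 - p) ** n
        z = [omega / (omega + (1 - omega) * p0) if y == 0 else 0.0 for y in ys]
        # M-step: closed-form updates from the complete-data likelihood.
        new_omega = sum(z) / len(ys)
        w = [1.0 - zi for zi in z]
        new_p = sum(wi * y for wi, y in zip(w, ys)) / (n * sum(w))
        done = abs(new_omega - omega) + abs(new_p - p) < tol
        omega, p = new_omega, new_p
        if done:
            break
    return omega, p


def zero_inflation_lrt(ys: List[int], n: int) -> Tuple[float, float]:
    """LRT of H0: omega = 0 (plain binomial) vs H1: omega > 0."""
    p_null = sum(ys) / (n * len(ys))  # binomial MLE under H0
    ll0 = sum(log_binom_pmf(y, n, p_null) for y in ys)
    ll1 = zib_loglik(ys, n, *zib_mle_em(ys, n))
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # omega = 0 lies on the boundary, so the reference distribution is taken to be
    # 0.5 * chi2_0 + 0.5 * chi2_1; note P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = 1.0 if stat == 0.0 else 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def simulate_zib(size: int, n: int, omega: float, p: float,
                 seed: int = 0) -> List[int]:
    """Draw ZIB data: a structural zero w.p. omega, else a Binomial(n, p) count."""
    rng = random.Random(seed)
    return [0 if rng.random() < omega
            else sum(rng.random() < p for _ in range(n))
            for _ in range(size)]


if __name__ == "__main__":
    ys = simulate_zib(5000, n=10, omega=0.2, p=0.3, seed=1)
    omega_hat, p_hat = zib_mle_em(ys, n=10)
    stat, p_value = zero_inflation_lrt(ys, n=10)
    print(f"omega_hat={omega_hat:.3f} p_hat={p_hat:.3f} "
          f"LRT={stat:.1f} p-value={p_value:.2e}")
```

The null value `omega = 0` sits on the boundary of the parameter space. Using a plain chi-square(1) reference there would make the test conservative. Halving the chi-square(1) tail probability is the usual large-sample correction under standard regularity conditions.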
Supporting Institution
National Research Foundation of Korea
Project Number
NRF-2021R1A2C1005271