Research Article
Year 2020, Volume: 69 Issue: 2, 1083 - 1103, 31.12.2020
https://doi.org/10.31801/cfsuasmas.614492

A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression

Abstract

Separation is one of the most commonly encountered estimation problems in logistic regression and often occurs with small and medium sample sizes. Under separation, the method of maximum likelihood (MLE; Fisher) yields spuriously large parameter estimates and standard errors. Many researchers in the social sciences resort to simple but ad hoc solutions to this issue, such as the "do nothing" strategy, removing variable(s) from the model, or combining the levels of the categorical variable that causes separation. The limitations of these basic solutions have motivated researchers to use more appropriate and innovative estimation techniques to deal with the problem. However, the performance of these techniques has not yet been fully investigated or compared. The main goal of this paper is to close this research gap by comparing the performance of frequentist and Bayesian estimation methods for coping with separation. A simulation study is performed to investigate the performance of asymptotic, bootstrap-based, and Bayesian estimation techniques with respect to bias, precision, and accuracy measures under separation. In line with the simulation study, a real-data example is used to illustrate how these methods can be used to handle separation in logistic regression.
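To make the effect of separation on maximum likelihood concrete, the following is a minimal sketch that is not taken from the paper. It assumes a one-predictor logistic model without an intercept, and the toy data sets, the grid of slope values, and the function name `log_likelihood` are all hypothetical choices made for illustration. With completely separated data the log-likelihood keeps rising as the slope grows, so no finite maximum exists; with overlapping data it peaks at a finite slope and then falls.

```python
import math

def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood of the model P(y = 1 | x) = 1 / (1 + exp(-beta * x)).

    Illustrative assumption: a single predictor and no intercept.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        # s is the signed linear predictor, so the contribution is log(sigmoid(s)).
        s = beta * x if y == 1 else -beta * x
        # Numerically stable log(sigmoid(s)) = -(max(-s, 0) + log(1 + exp(-|s|))).
        total -= max(-s, 0.0) + math.log1p(math.exp(-abs(s)))
    return total

# Hypothetical toy data (not from the paper).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys_separated = [0, 0, 0, 1, 1, 1]  # x < 0 always gives y = 0: complete separation
ys_overlap = [0, 0, 1, 0, 1, 1]    # the two outcome groups overlap in x

betas = [1.0, 5.0, 10.0, 50.0]
ll_separated = [log_likelihood(b, xs, ys_separated) for b in betas]
ll_overlap = [log_likelihood(b, xs, ys_overlap) for b in betas]

for b, sep, ovl in zip(betas, ll_separated, ll_overlap):
    print(f"beta={b:5.1f}  separated={sep:.6f}  overlap={ovl:.6f}")
```

In the separated case the log-likelihood approaches zero from below as the slope increases, which is why an iterative fitting routine tends to report very large estimates and standard errors instead of converging to a meaningful value.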

References

  • Albert, A. and Anderson, J. A. On the existence of maximum likelihood estimates in logistic regression models. Biometrika (1984), 71, 1-10.
  • Betancourt, M. J., Byrne, S., Livingstone, S. and Girolami, M. The Geometric Foundations of Hamiltonian Monte Carlo. ArXiv e-prints 1410.5110, 2014.
  • Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B. and Weidman, L. Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association (1991), 86, 68-78.
  • Denwood, M. J. runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software (2016), 71, 1-25.
  • Duane, S., Kennedy, A. D., Pendleton, B. J. and Roweth, D. Hybrid Monte Carlo. Physics Letters B (1987), 195, 216-222.
  • Efron, B. and Tibshirani, R. J. An introduction to the bootstrap. New York: Chapman & Hall, 1993.
  • Firth, D. Bias reduction of maximum likelihood estimates. Biometrika (1993), 80, 27-38.
  • Fisher, R. A. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society (1922), 222, 309-368.
  • Gelman, A., Jakulin, A., Pittau, M. G. and Su, Y. A weakly informative prior distribution for logistic and other regression models. Annals of Applied Statistics (2008), 2, 1360-1383.
  • Geman, S. and Geman, D. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (1984), 6, 721-741.
  • Ghosh, J., Li, Y. and Mitra, R. On the use of Cauchy prior distributions for Bayesian logistic regression. Bayesian Analysis (2018), 13, 359-383.
  • Goodrich, B., Gabry, J., Ali, I. and Brilleman, S. rstanarm: Bayesian applied regression modeling via Stan. R package version 2.17.4 (2018).
  • Hastings, W. Monte Carlo sampling methods using Markov chains and their applications. Biometrika (1970), 57, 97-109.
  • Heinze, G. and Schemper, M. A solution to the problem of separation in logistic regression. Statistics in Medicine (2002), 21, 2409-2419.
  • Hox, J. J. Multilevel analysis: Techniques and applications (2nd ed.). New York, NY: Routledge, 2010.
  • Jeffreys, H. An invariant form of the prior probability in estimation problems. Proceedings of the Royal Society of London A (1946), 186, 453-461.
  • Lokhorst, J. The lasso and generalised linear models. Honors Project. University of Adelaide, Adelaide (1999).
  • Maciejewski, R. Data representations, transformations, and statistics for visual reasoning. Synthesis Lectures on Visualization (2011), 2, 1-85.
  • Makalic, E. and Schmidt, D. High-dimensional Bayesian regularised regression with the BayesReg package (2016). arXiv:1611.06649.
  • Mansournia, M. A., Geroldinger, A., Greenland, S., and Heinze, G. Separation in logistic regression: Causes, consequences, and control. American Journal of Epidemiology (2018), 187, 864-870.
  • McCullagh, P. and Nelder, J. Generalized linear models (2nd ed.). Boca Raton, FL: Chapman & Hall / CRC, 1989.
  • Mehta, C. R. and Patel, N. R. Exact logistic regression: theory and examples. Statistics in Medicine (1995), 14, 2143-2160.
  • Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. Equation of state calculations by fast computing machines. The Journal of Chemical Physics (1953), 21, 1087-1092.
  • Mood, C. Logistic regression: Why we cannot do what we think we can do, and what we can do about it. European Sociological Review (2010), 26, 67-82.
  • Muth, C., Oravecz, Z. and Gabry, J. User-friendly Bayesian regression modeling: A tutorial with rstanarm and shinystan. The Quantitative Methods for Psychology (2018), 14, 99-119.
  • Neal, R. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones and X.-L. Meng, eds.). CRC Press, New York, 2013.
  • Newton, I. Philosophiae naturalis principia mathematica. Colonia Allobrogum: sumptibus Cl. et Ant. Philibert (1760).
  • Ohkura, M. and Kamakura, T. Test for a regression parameter in a logistic regression model under the small sample size and the high event occurrence probability. Japanese Applied Statistics (in Japanese) (2011), 40, 41-51.
  • Rainey, C. Dealing with separation in logistic regression models. Political Analysis (2016), 24, 339-355.
  • Roth, V. The generalized lasso. IEEE Transactions on Neural Networks (2004), 15, 16-28.
  • Schaefer, R. L., Roi, L. D. and Wolfe, R. A. A ridge logistic estimator. Communications in Statistics - Theory and Methods (1984), 13, 99-113.
  • Walther, B. A. and Moore, J. L. The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography (2005), 28, 815-829.
  • Webb, M. C., Wilson, J. R. and Chong, J. An analysis of quasi-complete binary data with logistic models: Applications to alcohol abuse data. Journal of Data Science (2004), 2, 273-285.
  • Yuan, K. H. and Hayashi, K. Standard errors in covariance structure models: Asymptotic versus bootstrap. British Journal of Mathematical and Statistical Psychology (2006), 59, 397-417.
  • Zorn, C. A solution to separation in binary response models. Political Analysis (2005), 13, 157-170.
There are 34 citations in total.

Details

Primary Language English
Subjects Applied Mathematics
Journal Section Research Articles
Authors

Yasin Altinisik 0000-0001-9375-2276

Publication Date December 31, 2020
Submission Date September 2, 2019
Acceptance Date May 29, 2020
Published in Issue Year 2020 Volume: 69 Issue: 2

Cite

APA Altinisik, Y. (2020). A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics, 69(2), 1083-1103. https://doi.org/10.31801/cfsuasmas.614492
AMA Altinisik Y. A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. December 2020;69(2):1083-1103. doi:10.31801/cfsuasmas.614492
Chicago Altinisik, Yasin. “A Comparative Study on the Performance of Frequentist and Bayesian Estimation Methods under Separation in Logistic Regression”. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics 69, no. 2 (December 2020): 1083-1103. https://doi.org/10.31801/cfsuasmas.614492.
EndNote Altinisik Y (December 1, 2020) A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics 69 2 1083–1103.
IEEE Y. Altinisik, “A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression”, Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat., vol. 69, no. 2, pp. 1083–1103, 2020, doi: 10.31801/cfsuasmas.614492.
ISNAD Altinisik, Yasin. “A Comparative Study on the Performance of Frequentist and Bayesian Estimation Methods under Separation in Logistic Regression”. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics 69/2 (December 2020), 1083-1103. https://doi.org/10.31801/cfsuasmas.614492.
JAMA Altinisik Y. A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2020;69:1083–1103.
MLA Altinisik, Yasin. “A Comparative Study on the Performance of Frequentist and Bayesian Estimation Methods under Separation in Logistic Regression”. Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics, vol. 69, no. 2, 2020, pp. 1083-1103, doi:10.31801/cfsuasmas.614492.
Vancouver Altinisik Y. A comparative study on the performance of frequentist and Bayesian estimation methods under separation in logistic regression. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2020;69(2):1083-103.

Communications Faculty of Sciences University of Ankara Series A1 Mathematics and Statistics.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.