Research Article
Year 2021, Volume: 21 Issue: 3, 263 - 272, 30.06.2021
https://doi.org/10.21121/eab.960840

Comparison of Different Estimation Approaches in Rare Events Data

Abstract

In social science research, one category of the dependent variable may occur a hundred times less (or more) often than the other. Wars, mass migrations, and coups are typical examples: an event of interest recorded as a binary variable may have very low prevalence, which leads to low or even zero counts in one or two cells of the 2×2 table cross-classifying two factors. In this case, the independent variable predicts the dependent variable perfectly or almost perfectly, a situation known in statistical modelling as complete or quasi-complete separation. This study compares three methods suggested in the literature for handling quasi-complete separation, using a small real dataset: penalized maximum likelihood (Firth-type), exact logistic regression, and Bayesian logistic regression. The methods were compared in terms of odds ratios, standard error estimates, confidence intervals, and statistical significance. Parameter estimates were obtained under three different models with binary and continuous variables. The results show that all three methods achieve convergence in the presence of quasi-complete separation. In conclusion, Bayesian logistic regression tends to outperform the other methods in the estimation of standard errors.
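The separation problem, and two of the three remedies compared above, can be illustrated on a toy 2×2 table. The following is a minimal sketch in Python (NumPy/SciPy), not the authors' analysis, under these assumptions: the data are hypothetical (5 events out of 5 when x = 1 and 2 out of 20 when x = 0, so one cell is zero); the Firth step uses the modified-score Newton iteration described by Heinze and Schemper (2002); and the "Bayesian" estimate is only the posterior mode under independent Cauchy priors (scale 10 for the intercept and 2.5 for the slope, loosely following Gelman et al.), not a full posterior sample. Exact logistic regression (Mehta and Patel, 1995) is not sketched.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

# Hypothetical data with one zero cell (quasi-complete separation):
# x = 0: 2 events out of 20; x = 1: 5 events out of 5 (no non-events).
x = np.repeat([0.0, 1.0], [20, 5])
y = np.concatenate([np.repeat([1.0, 0.0], [2, 18]), np.ones(5)])
X = np.column_stack([np.ones_like(x), x])  # intercept + binary predictor


def newton_logistic(X, y, firth=False, max_iter=100, tol=1e-10):
    """Newton-Raphson for logistic regression.

    With firth=True the score X'(y - p) is replaced by Firth's modified
    score X'(y - p + h * (0.5 - p)), where h is the diagonal of the
    weighted hat matrix. Returns the coefficients and the slope after
    every iteration.
    """
    beta = np.zeros(X.shape[1])
    slope_path = []
    for _ in range(max_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])  # Fisher information X'WX
        resid = y - p
        if firth:
            # h_i = w_i * x_i' (X'WX)^{-1} x_i
            h = w * np.einsum("ij,ij->i", X @ np.linalg.inv(info), X)
            resid = resid + h * (0.5 - p)
        step = np.linalg.solve(info, X.T @ resid)
        beta = beta + step
        slope_path.append(beta[1])
        if np.max(np.abs(step)) < tol:
            break
    return beta, slope_path


def neg_log_posterior(beta, X, y, scales):
    """Negative log posterior with independent Cauchy(0, scale) priors."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    logprior = -np.sum(np.log1p((beta / scales) ** 2))  # constants dropped
    return -(loglik + logprior)


def neg_log_posterior_grad(beta, X, y, scales):
    """Gradient of neg_log_posterior with respect to beta."""
    p = expit(X @ beta)
    grad_loglik = X.T @ (y - p)
    grad_logprior = -2.0 * beta / (scales**2 + beta**2)
    return -(grad_loglik + grad_logprior)


# 1) Plain maximum likelihood: the slope keeps increasing, no finite MLE.
ml_beta, ml_path = newton_logistic(X, y, max_iter=20)

# 2) Firth-type penalized maximum likelihood: converges to a finite value.
firth_beta, _ = newton_logistic(X, y, firth=True)

# 3) Posterior mode under Cauchy priors (prior scales are assumptions).
scales = np.array([10.0, 2.5])
res = minimize(neg_log_posterior, np.zeros(2), args=(X, y, scales),
               jac=neg_log_posterior_grad, method="BFGS")
map_beta = res.x

print("ML slope, last 3 iterations:", np.round(ml_path[-3:], 2))
print("Firth odds ratio:", np.exp(firth_beta[1]))
print("Posterior-mode odds ratio:", np.exp(map_beta[1]))
```

With these hypothetical counts, the unpenalized slope grows by roughly one unit per Newton iteration instead of converging, which is the symptom of quasi-complete separation. For a single binary predictor, the Firth estimate is equivalent to adding 0.5 to every cell of the 2×2 table, so its odds ratio is (5.5 × 18.5)/(0.5 × 2.5) = 81.4. The Cauchy-prior mode is also finite, but its value depends on the chosen prior scales.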

References

  • Allison, P.D. (2008). Convergence failures in logistic regression. In: Proceedings of the SAS Global Forum 2008 Conference. SAS Institute Inc., Cary, NC. http://www2.sas.com/proceedings/forum2008/360-2008.pdf
  • Cengiz, M.A., Terzi, E., Şenel, T. and Murat, N. (2013). Lojistik regresyonda parametre tahmininde Bayesci bir yaklaşım [A Bayesian approach to parameter estimation in logistic regression]. Afyon Kocatepe Üniversitesi Fen Bilimleri Dergisi, 12(2012), 15-22.
  • Derr, R.E. (2009). Performing exact logistic regression with the SAS System, Revised 2009. Proceedings of the Twenty-fifth Annual SAS Users Group International Conference. Cary, NC.
  • Devika, S., Jeyaseelan, L. and Sebastian, G. (2016). Analysis of sparse data in logistic regression in medical research: a newer approach. Journal of Postgraduate Medicine, 62(1), 26-31.
  • Eyduran, E. (2008). Usage of penalized maximum likelihood estimation method in medical research: an alternative to maximum likelihood estimation method. JRMS, 13(6), 325-330.
  • Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27-38.
  • Gavanji, R. (2019). Penalized Regression Methods for Modelling Rare Events Data with Application to Occupational Injury Study (Doctoral dissertation, University of Saskatchewan).
  • Gelman, A. and Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, USA.
  • Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2013). Bayesian Data Analysis, Third Edition. Chapman and Hall, London.
  • Gelman, A., Jakulin, A., Pittau, M.G. and Su, Y. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.
  • Greenland, S., Schwartzbaum, J.A. and Finkle, W.D. (2000). Problems due to small samples and sparse data in conditional logistic regression analysis. American Journal of Epidemiology, 151(5), 531-539.
  • Guns, M. and Vanacker, V. (2012). Logistic regression applied to natural hazards: rare event logistic regression with replications. Natural Hazards and Earth System Sciences, 12(6), 1937-1947.
  • Heinze, G. and Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409-2419.
  • King, E.N. and Ryan, T.P. (2002). A preliminary investigation of maximum likelihood logistic regression versus exact logistic regression. The American Statistician, 56(3), 163-170.
  • King, G. and Zeng, L. (2001). Logistic regression in rare events data. Political Analysis, 9(2), 137-163.
  • Kocak, M. (2017). An empirical Bayesian approach in estimating odds ratios for rare or zero events. Turkiye Klinikleri J Biostat, 9(1), 1-11.
  • Mehta, C.R. and Patel, N.R. (1995). Exact logistic regression: theory and examples. Statistics in Medicine, 14(19), 2143-2160.
  • Muchlinski, D., Siroky, D., He, J. and Kocher, M. (2016). Comparing random forest with logistic regression for predicting class-imbalanced civil war onset data. Political Analysis, 87-103.
  • Paal, V.D. (2013). A comparison of different methods for modelling rare events data. Master's thesis, Universiteit Gent, Belgium.
  • Rainey, C. (2016). Dealing with separation in logistic regression models. Political Analysis, 24, 339-355.
  • Soliman, A.M.A., MacLehose, R.F. and Carlson, A. (2013). Bayesian models with a weakly informative prior: A useful alternative for solving sparse data problems. Value in Health, 16(3), A48-A49.
  • Webb, M.C., Wilson, J.R. and Chong, J. (2004). An analysis of quasi-complete binary data with logistic models: applications to alcohol abuse data. Journal of Data Science, 2(2004), 273-285.
  • Zorn, C. (2005). A solution to separation in binary logit models. Political Analysis, 13, 157-170.
There are 23 citations in total.

Details

Primary Language: English
Subjects: Economics
Journal Section: Articles
Authors:

Ece Bacaksız, ORCID 0000-0003-0534-6011

Selçuk Koç, ORCID 0000-0001-7451-2699

Publication Date: June 30, 2021
Acceptance Date: June 23, 2021
Published in Issue: Year 2021, Volume 21, Issue 3

Cite

APA Bacaksız, E., & Koç, S. (2021). Comparison of Different Estimation Approaches in Rare Events Data. Ege Academic Review, 21(3), 263-272. https://doi.org/10.21121/eab.960840
AMA Bacaksız E, Koç S. Comparison of Different Estimation Approaches in Rare Events Data. ear. June 2021;21(3):263-272. doi:10.21121/eab.960840
Chicago Bacaksız, Ece, and Selçuk Koç. “Comparison of Different Estimation Approaches in Rare Events Data”. Ege Academic Review 21, no. 3 (June 2021): 263-72. https://doi.org/10.21121/eab.960840.
EndNote Bacaksız E, Koç S (June 1, 2021) Comparison of Different Estimation Approaches in Rare Events Data. Ege Academic Review 21 3 263–272.
IEEE E. Bacaksız and S. Koç, “Comparison of Different Estimation Approaches in Rare Events Data”, ear, vol. 21, no. 3, pp. 263–272, 2021, doi: 10.21121/eab.960840.
ISNAD Bacaksız, Ece - Koç, Selçuk. “Comparison of Different Estimation Approaches in Rare Events Data”. Ege Academic Review 21/3 (June 2021), 263-272. https://doi.org/10.21121/eab.960840.
JAMA Bacaksız E, Koç S. Comparison of Different Estimation Approaches in Rare Events Data. ear. 2021;21:263–272.
MLA Bacaksız, Ece and Selçuk Koç. “Comparison of Different Estimation Approaches in Rare Events Data”. Ege Academic Review, vol. 21, no. 3, 2021, pp. 263-72, doi:10.21121/eab.960840.
Vancouver Bacaksız E, Koç S. Comparison of Different Estimation Approaches in Rare Events Data. ear. 2021;21(3):263-72.