Comparison of Different Estimation Approaches in Rare Events Data
Year 2021, Volume 21, Issue 3, pp. 263-272, 30.06.2021
In social science research, one category of a binary dependent variable may occur hundreds of times less (or more) often than the other. Wars, mass migrations and coups are typical examples: an event of interest in a binary variable may have very low prevalence, resulting in low or even zero counts in one or two cells of the 2×2 table cross-classifying two factors. In this case, an independent variable predicts the dependent variable perfectly or almost perfectly, which leads to the problem known in statistical modelling as complete or quasi-complete separation. This study compares three methods suggested in the literature for handling quasi-complete separation in a small real dataset: penalized maximum likelihood (Firth-type), exact logistic regression and Bayesian logistic regression. The methods were compared in terms of odds ratios, standard error estimates, confidence intervals and statistical significance. Parameter estimates were obtained under three different models with binary and continuous variables. The results show that all three methods yield convergent estimates in the presence of quasi-complete separation. In conclusion, Bayesian logistic regression tends to be superior to the other methods in the estimation of standard errors.
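The separation problem and the Firth-type remedy can be illustrated with a minimal pure-Python sketch. This is not the authors' code or data. The 2×2 table is hypothetical, and the helpers `sigmoid`, `solve` and `fit_logistic` are illustrative names introduced here.

The sketch fits the same logistic model two ways:

- by ordinary maximum likelihood (Fisher scoring), whose slope keeps growing without converging when a cell is zero;
- with Firth's modified score U*(β) = Xᵀ(y − π + h(½ − π)), where h holds the diagonal elements of the weighted hat matrix.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(X, y, firth=False, max_iter=25, tol=1e-8):
    """Fisher-scoring fit of a logistic regression (hypothetical helper).

    With firth=True the ordinary score X^T (y - pi) is replaced by Firth's
    modified score X^T (y - pi + h * (1/2 - pi)), where h_i are the diagonal
    elements of the weighted hat matrix. Returns (beta, converged).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        pi = [sigmoid(sum(b * xj for b, xj in zip(beta, row))) for row in X]
        w = [q * (1.0 - q) for q in pi]
        # Fisher information I = X^T W X
        info = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n))
                 for c in range(p)] for r in range(p)]
        if firth:
            # h_i = w_i * x_i^T I^{-1} x_i
            h = [w[i] * sum(a * v for a, v in zip(X[i], solve(info, X[i])))
                 for i in range(n)]
        else:
            h = [0.0] * n
        resid = [y[i] - pi[i] + h[i] * (0.5 - pi[i]) for i in range(n)]
        score = [sum(resid[i] * X[i][r] for i in range(n)) for r in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, True
    return beta, False


# Hypothetical 2x2 table with a zero cell (quasi-complete separation):
# exposed (x=1): 5 events, 0 non-events; unexposed (x=0): 3 events, 12 non-events.
rows = [(1, 1)] * 5 + [(0, 1)] * 3 + [(0, 0)] * 12
X = [[1.0, float(x)] for x, _ in rows]   # intercept + binary predictor
y = [float(v) for _, v in rows]

beta_ml, ok_ml = fit_logistic(X, y)
beta_f, ok_f = fit_logistic(X, y, firth=True)
print("ML:    log-OR = %.2f, converged = %s" % (beta_ml[1], ok_ml))
print("Firth: log-OR = %.3f (OR = %.1f), converged = %s"
      % (beta_f[1], math.exp(beta_f[1]), ok_f))
```

For a single binary predictor, the Firth estimate coincides with adding 0.5 to every cell of the 2×2 table. Here that gives log((5.5 × 12.5) / (0.5 × 3.5)) ≈ 3.671, a convenient check on the iteration.

In practice, a maintained implementation would be preferable to this sketch. One example is the R package logistf, which adds step-halving and profile penalized-likelihood confidence intervals.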
Bacaksız, E., & Koç, S. (2021). Comparison of Different Estimation Approaches in Rare Events Data. Ege Academic Review, 21(3), 263-272. https://doi.org/10.21121/eab.960840