In this Monte Carlo simulation study, the performance of six different propensity score methods implemented through weighting cases was investigated: inverse probability of treatment weighting, truncated inverse probability of treatment weighting, propensity score stratification, marginal mean weighting through propensity score stratification, optimal full propensity score matching, and marginal mean weighting through optimal full propensity score matching. These methods aim to reduce selection bias in estimates of the average treatment effect (ATE) in observational studies. For the estimation of standard errors of the ATE with weights, three methods were compared: weighted least squares (WLS), Taylor series linearization (TSL), and jackknife (JK). Results indicated that covariance adjustment extensions of the investigated propensity score methods, in combination with TSL and JK standard error estimation methods, remove the selection bias appropriately and provide the most accurate standard errors under the simulated conditions.
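As a rough illustration of the first two methods named above, the sketch below computes inverse probability of treatment weights for the ATE, an optionally truncated version, and a weighted difference in outcome means. It is not the authors' simulation code. The data-generating model, the use of the true propensity score rather than an estimated one, the fixed weight cap of 10, and all function and variable names are assumptions made for the example; the abstract does not state the truncation rule used in the study.

```python
import math
import random


def iptw_weights(treated, pscore, cap=None):
    """ATE weights: 1/e for treated cases, 1/(1 - e) for untreated cases.

    `cap` is a hypothetical fixed upper bound used here to illustrate
    truncation; other truncation rules (e.g. percentile-based) exist.
    """
    weights = []
    for z, e in zip(treated, pscore):
        w = 1.0 / e if z == 1 else 1.0 / (1.0 - e)
        if cap is not None:
            w = min(w, cap)
        weights.append(w)
    return weights


def weighted_mean_difference(outcome, treated, weights):
    """Weighted mean outcome of treated minus weighted mean of untreated."""
    sum_wy = {0: 0.0, 1: 0.0}
    sum_w = {0: 0.0, 1: 0.0}
    for y, z, w in zip(outcome, treated, weights):
        sum_wy[z] += w * y
        sum_w[z] += w
    return sum_wy[1] / sum_w[1] - sum_wy[0] / sum_w[0]


# Assumed toy population: one confounder x drives both treatment and outcome.
rng = random.Random(0)
n = 5000
true_ate = 2.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
pscore = [1.0 / (1.0 + math.exp(-xi)) for xi in x]        # true propensity score
treated = [1 if rng.random() < e else 0 for e in pscore]  # selection depends on x
outcome = [true_ate * z + xi + rng.gauss(0.0, 1.0) for z, xi in zip(treated, x)]

# Unweighted comparison (all weights equal to 1) carries the selection bias.
naive = weighted_mean_difference(outcome, treated, [1.0] * n)

w_iptw = iptw_weights(treated, pscore)
w_trunc = iptw_weights(treated, pscore, cap=10.0)
ate_iptw = weighted_mean_difference(outcome, treated, w_iptw)
ate_trunc = weighted_mean_difference(outcome, treated, w_trunc)

print(f"naive difference: {naive:.3f}")
print(f"IPTW estimate:    {ate_iptw:.3f}")
print(f"truncated IPTW:   {ate_trunc:.3f}")
```

In this toy setup the unweighted difference overstates the effect because cases with larger `x` are both more likely to be treated and have higher outcomes; weighting by the inverse of the propensity score rebalances the two groups. The sketch does not cover standard error estimation (WLS, TSL, or JK), which is the other focus of the study.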
Comparison of Propensity Score Weighting Methods to Remove Selection Bias in Average Treatment Effect Estimates
Year 2023,
Volume: 2023, Issue: 21, 989-1031, 31.10.2023
In this Monte Carlo simulation study, the performance of six propensity score methods based on weighting individuals was investigated: inverse probability weighting, truncated inverse probability weighting, propensity score stratification, marginal mean weighting through propensity score stratification, optimal full propensity score matching, and marginal mean weighting through optimal full propensity score matching. These methods aim to reduce the selection bias in average treatment effect estimates from observational studies. Weighted least squares, Taylor series linearization, and jackknife methods were used to estimate the standard error of the weighted average treatment effect. The results showed that, in all simulated conditions, weighting-based propensity score methods with covariance adjustment, used together with the Taylor series linearization and jackknife standard error estimation methods, appropriately removed selection bias and yielded accurate standard errors.
Gürel, S., & Leite, W. L. (2023). Comparison of Propensity Score Weighting Methods to Remove Selection Bias in Average Treatment Effect Estimates. International Journal of Turkish Education Sciences, 2023(21), 989-1031. https://doi.org/10.46778/goputeb.1312865