
Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs

Year 2021, Volume: 11 Issue: 1, 48 - 69, 29.06.2021
https://doi.org/10.17984/adyuebd.941434

Abstract

A recent systematic review of experimental studies conducted in Turkey between 2010 and 2020 reported that small sample sizes had been a significant drawback (Bulus & Koyuncu, 2021). A small fraction of the studies in the review were randomized pretest-posttest control-group designs; the overwhelming majority were non-equivalent pretest-posttest control-group designs (no random assignment). Across domains and outcomes, their average sample size was below 70. Designing experimental studies with such small sample sizes implies a strong (and perhaps erroneous) assumption about the minimum relevant effect size (MRES) of an intervention; namely, that a standardized treatment effect of Cohen’s d < 0.50 is not relevant to education policy or practice. Thus, an introduction to sample size determination for randomized/non-equivalent pretest-posttest control-group designs is warranted. This study describes the nuts and bolts of sample size determination (or power analysis). It also derives expressions for optimal design under differential costs per treatment and control unit, and implements these expressions in an Excel workbook. Finally, it provides convenient tables to guide sample size decisions for MRES values in the range 0.20 ≤ Cohen’s d ≤ 0.50.
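To make the mechanics concrete, below is a minimal R sketch (not the article's Excel workbook or its derivations; the function names and default values are illustrative assumptions). The first function computes the per-group sample size for a balanced two-group design analyzed with ANCOVA, using the standard normal approximation, where the pretest covariate is assumed to explain rsq of the posttest variance (rsq ≈ ρ² for a single pretest). The second gives the textbook cost-minimizing allocation ratio under differential costs per treatment and control unit, assuming equal group variances.

# Per-group sample size for a balanced two-group ANCOVA design
# (normal approximation; two-tailed test).
ancova_n_per_group <- function(d, power = 0.80, alpha = 0.05, rsq = 0.25) {
  z_alpha <- qnorm(1 - alpha / 2)  # critical value for the two-tailed test
  z_power <- qnorm(power)          # quantile matching the target power
  # covariate adjustment shrinks the residual variance by (1 - rsq)
  ceiling(2 * (z_alpha + z_power)^2 * (1 - rsq) / d^2)
}

# Example: MRES of d = 0.30 with a pretest-posttest correlation of 0.50
ancova_n_per_group(d = 0.30, rsq = 0.50^2)  # 131 per group

# Cost-minimizing allocation ratio n_T / n_C under a fixed budget,
# assuming equal group variances: sqrt(c_C / c_T).
optimal_ratio <- function(cost_treatment, cost_control) {
  sqrt(cost_control / cost_treatment)
}
optimal_ratio(cost_treatment = 40, cost_control = 10)  # 0.5: half as many treated units

Under these assumptions, detecting d = 0.30 requires 131 units per group (262 in total), which illustrates why an average total sample size below 70 effectively presumes an MRES well above Cohen’s d = 0.30.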

References

  • Bloom, H. S. (2006). The core analytics of randomized experiments for social research (MDRC Working Papers on Research Methodology). http://www.mdrc.org/publications/437/full.pdf
  • Brunner, M., Keller, U., Wenger, M., Fischbach, A., & Lüdtke, O. (2018). Between-school variation in students' achievement, motivation, affect, and learning strategies: Results from 81 countries for planning group-randomized trials in education. Journal of Research on Educational Effectiveness, 11(3), 452-478. https://doi.org/10.1080/19345747.2017.1375584
  • Bulus, M., & Dong, N. (2021a). Bound constrained optimization of sample sizes subject to monetary restrictions in planning of multilevel randomized trials and regression discontinuity studies. The Journal of Experimental Education, 89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197
  • Bulus, M., & Dong, N. (2021b). cosa: Bound constrained optimal sample size allocation. R package version 2.1.0. https://CRAN.R-project.org/package=cosa
  • Bulus, M., Dong, N., Kelcey, B., & Spybrook, J. (2021). PowerUpR: Power analysis tools for multilevel randomized experiments. R package version 1.1.0. https://CRAN.R-project.org/package=PowerUpR
  • Bulus, M., & Koyuncu, I. (2021). Statistical power and precision of experimental studies originated in the Republic of Turkey from 2010 to 2020: Current practices and some recommendations. Journal of Participatory Education Research, 8(4), 24-43. https://doi.org/10.17275/per.21.77.8.4
  • Bulus, M., & Sahin, S. G. (2019). Estimation and standardization of variance parameters for planning cluster-randomized trials: A short guide for researchers. Journal of Measurement and Evaluation in Education and Psychology, 10(2), 179-201. https://doi.org/10.21031/epod.530642
  • Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Houghton Mifflin.
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Cook, T. D., Campbell, D. T., & Shadish, W. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
  • Dong, N., & Maynard, R. (2013). PowerUp!: A tool for calculating minimum detectable effect sizes and minimum required sample sizes for experimental and quasi-experimental design studies. Journal of Research on Educational Effectiveness, 6(1), 24-67. https://doi.org/10.1080/19345747.2012.673143
  • Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments & Computers, 28(1), 1-11. https://doi.org/10.3758/BF03203630
  • Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149-1160. https://doi.org/10.3758/BRM.41.4.1149
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. (2011). How to design and evaluate research in education (10th Ed.). McGraw-Hill.
  • Hansen, B. B. (2006). Bias reduction in observational studies via prognosis scores. Technical report #441, University of Michigan Statistics Department. http://dept.stat.lsa.umich.edu/~bbh/rspaper2006-06.pdf
  • Hansen, B. B. (2008). The prognostic analogue of the propensity score. Biometrika, 95(2), 481-488. https://doi.org/10.1093/biomet/asn004
  • Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7(2), 246-255. https://www.jstor.org/stable/2246311
  • Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two-and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445-489. https://doi.org/10.1177/0193841X14529126
  • Hedges, L. V., & Rhoads, C. (2010). Statistical power analysis in education research (NCSER 2010-3006). Washington, DC: National Center for Special Education Research, Institute of Education Sciences, U.S. Department of Education. https://files.eric.ed.gov/fulltext/ED509387.pdf
  • Huber, M. (2014). Identifying causal mechanisms (primarily) based on inverse probability weighting. Journal of Applied Econometrics, 29(6), 920-943. https://doi.org/10.1002/jae.2341
  • Iacus, S. M., King, G., & Porro, G. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1-24. https://www.jstor.org/stable/41403736
  • Konstantopoulos, S. (2008a). The power of the test for treatment effects in three-level block randomized designs. Journal of Research on Educational Effectiveness, 1, 265-288. https://doi.org/10.1080/19345740802328216
  • Konstantopoulos, S. (2008b). The power of the test for treatment effects in three-level cluster-randomized designs. Journal of Research on Educational Effectiveness, 1, 66-88. https://doi.org/10.1080/19345740701692522
  • Leacy, F. P., & Stuart, E. A. (2014). On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: a simulation study. Statistics in Medicine, 33(20), 3488-3508. https://doi.org/10.1002/sim.6030
  • Mosteller, F. F., & Boruch, R. F. (Eds.). (2004). Evidence matters: Randomized trials in education research. Brookings Institution Press.
  • Oakes, J. M., & Feldman, H. A. (2001). Statistical power for non-equivalent pretest-posttest designs: The impact of change-score versus ANCOVA models. Evaluation Review, 25(1), 3-28. https://doi.org/10.1177/0193841X0102500101
  • Raudenbush, S. W., & Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5, 199-213. https://doi.org/10.1037/1082-989X.5.2.199
  • Rickles, J., Zeiser, K., & West, B. (2018). Accounting for student attrition in power calculations: Benchmarks and guidance. Journal of Research on Educational Effectiveness, 11(4), 622-644. https://doi.org/10.1080/19345747.2018.1502384
  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55. https://doi.org/10.1093/biomet/70.1.41
  • Slavin, R. E. (2008). Perspectives on evidence-based research in education: What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37(1), 5-14. https://doi.org/10.3102/0013189X08314117
  • Sterne, J. A., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53(11), 1119-1129. https://doi.org/10.1016/s0895-4356(00)00242-0
  • Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419-435. https://doi.org/10.1007/BF02294384
  • Wyss, R., Ellis, A. R., Brookhart, M. A., Jonsson Funk, M., Girman, C. J., Simpson, R. J., & Stürmer, T. (2015). Matching on the disease risk score in comparative effectiveness research of new treatments. Pharmacoepidemiology and Drug Safety, 24(9), 951-961. https://doi.org/10.1002/pds.3810
  • Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122. https://doi.org/10.1177/1745691617693393
  • Yildirim, I., Cirak-Kurt, S., & Sen, S. (2019). The effect of teaching "Learning strategies" on academic achievement: A meta-analysis study. Eurasian Journal of Educational Research, 79, 87-114. https://doi.org/10.14689/ejer.2019.79.5

Details

Primary Language English
Journal Section Research Articles
Authors

Metin Bulus (ORCID: 0000-0003-4348-6322)

Publication Date June 29, 2021
Acceptance Date June 23, 2021
Published in Issue Year 2021 Volume: 11 Issue: 1

Cite

APA Bulus, M. (2021). Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. Adıyaman University Journal of Educational Sciences, 11(1), 48-69. https://doi.org/10.17984/adyuebd.941434
AMA Bulus M. Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. AUJES. June 2021;11(1):48-69. doi:10.17984/adyuebd.941434
Chicago Bulus, Metin. “Sample Size Determination and Optimal Design of Randomized/Non-Equivalent Pretest-Posttest Control-Group Designs”. Adıyaman University Journal of Educational Sciences 11, no. 1 (June 2021): 48-69. https://doi.org/10.17984/adyuebd.941434.
EndNote Bulus M (June 1, 2021) Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. Adıyaman University Journal of Educational Sciences 11 1 48–69.
IEEE M. Bulus, “Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs”, AUJES, vol. 11, no. 1, pp. 48–69, 2021, doi: 10.17984/adyuebd.941434.
ISNAD Bulus, Metin. “Sample Size Determination and Optimal Design of Randomized/Non-Equivalent Pretest-Posttest Control-Group Designs”. Adıyaman University Journal of Educational Sciences 11/1 (June 2021), 48-69. https://doi.org/10.17984/adyuebd.941434.
JAMA Bulus M. Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. AUJES. 2021;11:48–69.
MLA Bulus, Metin. “Sample Size Determination and Optimal Design of Randomized/Non-Equivalent Pretest-Posttest Control-Group Designs”. Adıyaman University Journal of Educational Sciences, vol. 11, no. 1, 2021, pp. 48-69, doi:10.17984/adyuebd.941434.
Vancouver Bulus M. Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. AUJES. 2021;11(1):48-69.

Content of this journal is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License