
Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs

Year 2021, Volume: 11 Issue: 1, 48 - 69, 29.06.2021
https://doi.org/10.17984/adyuebd.941434

Abstract

A recent systematic review of experimental studies conducted in Turkey between 2010 and 2020 reported that small sample sizes had been a significant drawback (Bulus & Koyuncu, 2021). A small fraction of the studies in the review were randomized pretest-posttest control-group designs; the overwhelming majority were non-equivalent pretest-posttest control-group designs (no randomization). Across domains and outcomes, their average sample size was below 70. Designing experimental studies with such small sample sizes implies a strong (and perhaps erroneous) assumption about the minimum relevant effect size (MRES) of an intervention: namely, that a standardized treatment effect of Cohen's d < 0.50 is not relevant to education policy or practice. Thus, an introduction to sample size determination for randomized/non-equivalent pretest-posttest control-group designs is warranted. This study describes the nuts and bolts of sample size determination (or power analysis). It also derives expressions for optimal design under differential costs per treatment and control unit, and implements these expressions in an Excel workbook. Finally, it provides convenient tables to guide sample size decisions for MRES values in the range 0.20 ≤ Cohen's d ≤ 0.50.
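
To make the mechanics concrete, the sketch below pairs a standard normal-approximation sample size formula for an ANCOVA-adjusted two-group comparison with the classic square-root cost rule for allocation under differential unit costs. It is an illustrative sketch only: the formula, the assumed pretest-posttest correlation, and the function names are assumptions of this example, not the paper's Excel workbook or its exact derivations.

    # A minimal, illustrative sketch (not the paper's Excel workbook).
    # ANCOVA-adjusted two-group comparison in which a pretest explains
    # rho^2 of the posttest variance, so the required sample size shrinks
    # by a factor of (1 - rho^2):
    #   n per group ~= 2 * (z_{1-alpha/2} + z_{1-beta})^2 * (1 - rho^2) / d^2
    import math
    from statistics import NormalDist

    def n_per_group(d, rho=0.50, alpha=0.05, power=0.80):
        """Approximate sample size per arm under 1:1 allocation."""
        z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
        z_b = NormalDist().inv_cdf(power)          # quantile for target power
        return math.ceil(2 * (z_a + z_b) ** 2 * (1 - rho ** 2) / d ** 2)

    def optimal_allocation_ratio(cost_treatment, cost_control):
        """Square-root cost rule: n_T / n_C = sqrt(c_C / c_T), the classic
        optimum when minimizing the variance of the mean difference under
        a fixed budget (analogous in spirit to the paper's derivations)."""
        return math.sqrt(cost_control / cost_treatment)

    # Detecting Cohen's d = 0.30 at 80% power with a pretest-posttest
    # correlation of 0.50 needs ~131 participants per group (262 total).
    print(n_per_group(d=0.30))               # -> 131
    print(optimal_allocation_ratio(40, 10))  # -> 0.5 (treatment arm half the
                                             #    size of the control arm)

Inverting the same approximation shows why small samples matter: a study with about 35 participants per arm (roughly the sub-70 totals reported in the review) can only reliably detect d ≈ 0.58 under these illustrative assumptions, which is exactly the kind of implicit MRES claim the abstract warns about.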

References

  • Bloom, H. S. (2006). The core analytics of randomized experiments for social research (MDRC Working Papers on Research Methodology). http://www.mdrc.org/publications/437/full.pdf
  • Brunner, M., Keller, U., Wenger, M., Fischbach, A., & Lüdtke, O. (2018). Between-school variation in students' achievement, motivation, affect, and learning strategies: Results from 81 countries for planning group-randomized trials in education. Journal of Research on Educational Effectiveness, 11(3), 452-478. https://doi.org/10.1080/19345747.2017.1375584
  • Bulus, M., & Dong, N. (2021a). Bound constrained optimization of sample sizes subject to monetary restrictions in planning of multilevel randomized trials and regression discontinuity studies. The Journal of Experimental Education, 89(2), 379–401. https://doi.org/10.1080/00220973.2019.1636197
  • Bulus, M., & Dong, N. (2021b). cosa: Bound constrained optimal sample size allocation. R package version 2.1.0. https://CRAN.R-project.org/package=cosa
  • Bulus, M., Dong, N., Kelcey, B., & Spybrook, J. (2021). PowerUpR: Power analysis tools for multilevel randomized experiments. R package version 1.1.0. https://CRAN.R-project.org/package=PowerUpR
  • Bulus, M., & Koyuncu, I. (2021). Statistical power and precision of experimental studies originated in the Republic of Turkey from 2010 to 2020: Current practices and some recommendations. Journal of Participatory Education Research, 8(4), 24-43. https://doi.org/10.17275/per.21.77.8.4
  • Bulus, M., & Sahin, S. G. (2019). Estimation and standardization of variance parameters for planning cluster-randomized trials: A short guide for researchers. Journal of Measurement and Evaluation in Education and Psychology, 10(2), 179-201. https://doi.org/10.21031/epod.530642
  • Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research. Houghton Mifflin.
  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
  • Dong, N., & Maynard, R. (2013). PowerUp!: A tool for calculating minimum detectable effect sizes and minimum required sample sizes for experimental and quasi-experimental design studies. Journal of Research on Educational Effectiveness, 6(1), 24-67. https://doi.org/10.1080/19345747.2012.673143
  • Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments & Computers, 28(1), 1-11. https://doi.org/10.3758/BF03203630
  • Faul, F., Erdfelder, E., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149-1160. https://doi.org/10.3758/BRM.41.4.1149
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. (2011). How to design and evaluate research in education (10th Ed.). McGraw-Hill.
  • Hansen, B. B. (2006). Bias reduction in observational studies via prognosis scores. Technical report #441, University of Michigan Statistics Department. http://dept.stat.lsa.umich.edu/~bbh/rspaper2006-06.pdf
  • Hansen, B. B. (2008). The prognostic analogue of the propensity score. Biometrika, 95(2), 481-488. https://doi.org/10.1093/biomet/asn004
  • Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7(2), 246-255. https://www.jstor.org/stable/2246311
  • Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445-489. https://doi.org/10.1177/0193841X14529126
  • Hedges, L. V., & Rhoads, C. (2010). Statistical power analysis in education research (NCSER 2010-3006). Washington, DC: National Center for Special Education Research, Institute of Education Sciences, U.S. Department of Education. https://files.eric.ed.gov/fulltext/ED509387.pdf
  • Huber, M. (2014). Identifying causal mechanisms (primarily) based on inverse probability weighting. Journal of Applied Econometrics, 29(6), 920-943. https://doi.org/10.1002/jae.2341
  • Iacus, S. M., King, G., & Porro, G. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis, 20(1), 1-24. https://www.jstor.org/stable/41403736
  • Konstantopoulos, S. (2008a). The power of the test for treatment effects in three-level block randomized designs. Journal of Research on Educational Effectiveness, 1, 265-288. https://doi.org/10.1080/19345740802328216
  • Konstantopoulos, S. (2008b). The power of the test for treatment effects in three-level cluster-randomized designs. Journal of Research on Educational Effectiveness, 1, 66-88. https://doi.org/10.1080/19345740701692522
  • Leacy, F. P., & Stuart, E. A. (2014). On the joint use of propensity and prognostic scores in estimation of the average treatment effect on the treated: a simulation study. Statistics in Medicine, 33(20), 3488-3508. https://doi.org/10.1002/sim.6030
  • Mosteller, F. F., & Boruch, R. F. (Eds.). (2004). Evidence matters: Randomized trials in education research. Brookings Institution Press.
  • Oakes, J. M., & Feldman, H. A. (2001). Statistical power for non-equivalent pretest-posttest designs: The impact of change-score versus ANCOVA models. Evaluation Review, 25(1), 3-28. https://doi.org/10.1177/0193841X0102500101
  • Raudenbush, S. W., & Liu, X. (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5, 199-213. https://doi.org/10.1037/1082-989X.5.2.199
  • Rickles, J., Zeiser, K., & West, B. (2018). Accounting for student attrition in power calculations: Benchmarks and guidance. Journal of Research on Educational Effectiveness, 11(4), 622-644. https://doi.org/10.1080/19345747.2018.1502384
  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55. https://doi.org/10.1093/biomet/70.1.41
  • Slavin, R. E. (2008). Perspectives on evidence-based research in education: What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37(1), 5-14. https://doi.org/10.3102/0013189X08314117
  • Sterne, J. A., Gavaghan, D., & Egger, M. (2000). Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. Journal of Clinical Epidemiology, 53(11), 1119-1129. https://doi.org/10.1016/s0895-4356(00)00242-0
  • Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419-435. https://doi.org/10.1007/BF02294384
  • Wyss, R., Ellis, A. R., Brookhart, M. A., Jonsson Funk, M., Girman, C. J., Simpson, R. J., & Stürmer, T. (2015). Matching on the disease risk score in comparative effectiveness research of new treatments. Pharmacoepidemiology and Drug Safety, 24(9), 951-961. https://doi.org/10.1002/pds.3810
  • Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122. https://doi.org/10.1177/1745691617693393
  • Yildirim, I., Cirak-Kurt, S., & Sen, S. (2019). The effect of teaching "Learning strategies" on academic achievement: A meta-analysis study. Eurasian Journal of Educational Research, 79, 87-114. https://doi.org/10.14689/ejer.2019.79.5

Details

Primary Language English
Journal Section Research Articles
Authors

Metin Bulus (ORCID: 0000-0003-4348-6322)

Publication Date June 29, 2021
Acceptance Date June 23, 2021
Published in Issue Year 2021 Volume: 11 Issue: 1

Cite

APA Bulus, M. (2021). Sample Size Determination and Optimal Design of Randomized/Non-equivalent Pretest-posttest Control-group Designs. Adıyaman University Journal of Educational Sciences, 11(1), 48-69. https://doi.org/10.17984/adyuebd.941434

This work is licensed under CC BY-NC-ND 4.0.