Research Article

What should be the number of replications in Monte Carlo Simulation Method for Classical Test Theory Parameters?

Year 2020, Volume: 9 Issue: 2, 410 - 429, 24.06.2020

Abstract

The importance of the number of replications in simulation studies for producing results that reflect the truth is indisputable. When a study is designed using the Monte Carlo simulation technique, the number of replications is critical for the reliability and validity of the results; however, there is no clear guidance on how many replications are sufficient. This study aims to determine the effect of the number of replications in the Monte Carlo simulation method on item and test parameter estimates in Classical Test Theory (CTT) and to determine the number of replications required. For this purpose, the explained total variance ratio, Cronbach's alpha coefficient, average item discrimination, and model-data-fit parameters of data generated under different conditions and replication counts were examined.
This study is a Monte Carlo simulation study. The R "psych" package was used for data generation and analysis.
The number of items in a one-dimensional structure was fixed at 20 with five response categories, and the sample size was varied as 100, 250, 500, 1000, and 3000. Based on the results, for a CTT-based study, researchers are advised to generate data with 1000 replications when the sample size is 100, 500 replications when it is 250, 250 replications when it is 500, and 100 replications when it is 1000 or 3000.
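The design summarized above — a unidimensional 20-item instrument with five response categories, re-generated and re-analyzed over many replications — can be sketched as follows. The study itself used R's psych package; this Python analogue is a minimal illustrative sketch, and the data-generating model (latent trait plus noise, discretized by quantile cuts) and all function names are assumptions, not the authors' exact procedure:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons x n_items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def simulate(n_persons, n_items=20, n_categories=5, n_reps=100, seed=1):
    """Monte Carlo loop: generate unidimensional 5-category item responses
    and collect Cronbach's alpha across n_reps replications."""
    rng = np.random.default_rng(seed)
    alphas = []
    for _ in range(n_reps):
        theta = rng.normal(size=(n_persons, 1))       # common latent trait
        noise = rng.normal(size=(n_persons, n_items))
        raw = theta + noise                           # one-dimensional model
        # discretize into ordered categories 1..5 via quantile cut points
        cuts = np.quantile(raw, [0.2, 0.4, 0.6, 0.8])
        items = np.digitize(raw, cuts) + 1
        alphas.append(cronbach_alpha(items.astype(float)))
    # stability of the estimate across replications is the quantity of interest
    return np.mean(alphas), np.std(alphas)
```

Under a design like this, the spread of the statistic across replications (the standard deviation returned above) is what shrinks as the replication count grows, which is the basis for recommendations such as "1000 replications at n = 100".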

References

  • Aiken, L. R. (2000). Psychological testing and assessment. Boston: Allyn and Bacon.
  • Baker, F. B. (1998). An investigation of the item parameter recovery of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 153-169.
  • Binois M., Huang J., Gramacy R.B., and Ludkovski M. (2019). Replication or exploration? Sequential design for stochastic simulation experiments. https://arxiv.org/abs/1710.03206. DOI: 10.1080/00401706.2018.1469433
  • Bock, R. D., and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
  • Bock, R.D. (1997). A brief history of item response theory. Educational Measurement: Issues and Practice. Winter 1997.
  • Brooks, C. (2002). Introductory econometrics for finance. Cambridge University Press.
  • Brown, R. L. (1994). Efficacy of the indirect approach for estimating structural equation models with missing data: A Comparison of Methods. Structural Equation Modeling: A Multidisciplinary Journal. 1(4), 287-316.
  • Büyüköztürk, Ş. (2002). Sosyal bilimler için veri analizi el kitabı. Ankara: Pegem Yayıncılık.
  • Charter, R.A. (2003). Study samples are too small to produce sufficiently precise reliability coefficients. The Journal of General Psychology, 130, 117-129.
  • Child, D. (2006). The Essentials of factor analysis. Continuum, London.
  • Çelen, Ü. (2008). Klasik Test Kuramı ve Madde Tepki Kuramına dayalı olarak geliştirilen iki testin psikometrik özelliklerinin karşılaştırılması. İlköğretim Online, 7(3),758-768.
  • De Ayala, R. J. (2009). The theory and practice of item response theory. New York: The Guilford Press.
  • Dooley, K. (2002). Simulation research methods. In J. Baum (Ed.). Companion to organizations. London: Blackwell.
  • Drasgow, F., Levine, M., Tsien, S., Williams, B., and Mead, A. (1995). Fitting polytomous item response theory models to multiple-choice tests. Applied Psychological Measurement, 19(2), 143-165
  • Fay D.S., and Gerow K. A (2013). Biologist's guide to statistical thinking and analysis, WormBook, ed. The C. Elegans Research Community, WormBook, doi/10.1895/wormbook.1.159.1, http://www.wormbook.org.
  • Gifford, J.A., and Swaminathan, H. (1990), Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement 27(4), 361-370.
  • Satten, G. A., Flanders, W. D., and Yang, Q. (2001). Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. American Journal of Human Genetics, 68, 466-477.
  • Goldman, S.H., and Raju, N. S. (1986). Recovery of one- and two-parameter logistic item parameters: An empirical study. Educational and Psychological Measurement, 46(1), 11-21.
  • Hambleton R.K., Swaminathan H. and H. J. Rogers (1991). Fundamentals of item response theory. Newbury Park, CA: SAGE Publications, Inc.
  • Hambleton, R. K., Jones R.W., and Rogers, H. J. (1993). Influence of item parameter estimation errors in test development. Journal of Educational Measurement, 30, 143-155.
  • Hammersley, J.M., and Handscomb, D.C. (1964). Monte-Carlo Methods. Springer Netherlands http://dx.doi.org/10.1007/978-94-009-5819-7
  • Harwell, M. R., and Janosky, J. E. (1991). An empirical study of the effects of small datasets and varying prior distribution variances on item parameter estimation in BILOG. Applied Psychological Measurement, 15, 279-291.
  • Harwell, M. R., Stone, C. A., Hsu, T. C., and Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125. https://doi.org/10.1177/014662169602000201
  • Harwell, M.R., Rubinstein E., Hayes W.S., and Olds, C. (1992). Summarizing Monte Carlo results in methodological research: The fixed effects single- and two-factor ANOVA cases. Journal of Educational Statistic, 17, 315-339.
  • Hauck, W.W., and Anderson, S. (1984). A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. Journal of Pharmacokinetics and Biopharmaceutics, 12, 83-91.
  • Hulin, C. L., Lissak, R. I., and Drasgow, F. (1982). Recovery of two and three parameter logistic item characteristic curves: A Monte Carlo study. Applied Psychological Measurement, 6, 249-260.
  • Hutchinson, S.R., and Bandalos, D.L. (1997). A guide to Monte Carlo simulations for applied researchers. Journal of Vocational Education Research, 22(4), 233-245.
  • Kannan P., Sgammato A., Tannenbaum R.J., and Katz I.R. (2015) Evaluating the consistency of Angoff-Based cut scores using subsets of items within a generalizability theory framework, Applied Measurement in Education, 28(3), 169-186, DOI: 10.1080/08957347.2015.1042156
  • Kéry, M., and Royle, J. A. (2016). Applied hierarchical modeling in ecology, Volume 1: Prelude and static models. Academic Press.
  • Kline, P. (1986) A handbook of test construction: Introduction to psychometric design. New York: Methuen & Company.
  • Koçak, D. (2016). Kayıp veriyle baş etme yöntemlerinin Madde Tepki Kuramı bir parametreli lojistik modelinde model veri uyumuna ve standart hataya etkisi. Yayımlanmamış Doktora Tezi, Ankara Üniversitesi Eğitim Bilimleri Enstitüsü, Ankara.
  • Leventhall, B., and Ames, A. (2019). Using SAS for Monte Carlo Simulation Studies in Item Response Theory. National Council on Measurement in Education Annual Meeting in Toronto, Ontario Canada.
  • Lewis, P.A.W., and E.J. Orav. (1989). Simulation methodology for statisticians, operations analysts, and engineers. Volume 1. Wadsworth & Brooks/Cole, California: Pacific Grove.
  • Lord, F. M., and Novick, M. R.(1968). Statistical theories of mental test scores. Reading MA: Addison- Wesley.
  • Mundfrom, D. J., Schaffer, J., Kim, M.-J., Shaw, D., Thongteeraparp, A., and Supawan, P. (2011). Number of replications required in Monte Carlo simulation studies: A synthesis of four studies. Journal of Modern Applied Statistical Methods, 10(1), Article 4. Available at: http://digitalcommons.wayne.edu/jmasm/vol10/iss1/4
  • Murie C., and Nadon R. (2018). A correction for the LPE statistical test. Bioconductor. https://www.bioconductor.org/packages/devel/bioc/vignettes/LPEadj/inst/doc/LPEadj.pdf
  • Naylor, T. H., Balintfy, J. L., Burdick, D. S., and Chu, K. (1968). Computer simulation techniques. New York: John Wiley and Sons.
  • Nunnally, J. C., and Bernstein, J. H. (1994). Psychometric theory. New York: McGraw-Hill.
  • Qualls, A. L., and Ansley, T. N. (1985, April). A comparison of item and ability parameter estimates derived from LOGIST and BILOG. Paper presented at the meeting of the National Council on Measurement in Education, Chicago.
  • R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org
  • Rubinstein, R. Y. (1981). Simulation and the Monte Carlo method. John Wiley and Sons, New York.
  • Saeki H., and Tango T. (2014) Statistical inference for non-inferiority of a diagnostic procedure compared to an alternative procedure, based on the difference in correlated proportions from multiple raters. In: van Montfort K., Oud J., Ghidey W. (eds) Developments in Statistical Evaluation of Clinical Trials. Springer, Berlin, Heidelberg.
  • Segall, D.O. (1994). The reliability of linearly equated tests. Psychometrika, 59, 361-375.
  • Sobol, I. M. (1971). The Monte Carlo method. Moscow, Russia.
  • Stone, C. A. (1993). The use of multiple replications in IRT based Monte Carlo research. Paper presented at the European Meeting of the Psychometric Society, Barcelona.
  • Tabachnick, B. G., and Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). Boston, MA: Allyn & Bacon.
  • Yaşa, F. (1996). Rasgele değişen bazı fiziksel olayların 3 boyutlu monte carlo yöntemi ile modellenmesi. (Yayınlanmamış yüksek lisans tezi). Kahramanmaraş Sütçü İmam Üniversitesi / Fen Bilimleri Enstitüsü. Kahramanmaraş,Türkiye.
  • Yen, W. M. (1987). A comparison of the efficiency and accuracy of BILOG and LOGIST. Psychometrika, 52, 275-291.
  • Yo, H., Han, K. T., and Oh, H. J. (2019). Software packages for item response theory-based test simulation: WinGen3, SimulCAT, MSTGen, and IRTEQ. National Council on Measurement in Education Annual Meeting, Toronto, Ontario, Canada.
  • Yurdugül, H. (2008). Cronbach alfa katsayısı için minimum örneklem genişliği: Monte Carlo çalışması. H.U. Journal of Education, 35, 397-405.

Monte Carlo Simülasyon Yönteminde Tekrar Sayısı Klasik Test Kuramı Parametreleri İçin Kaç Olmalıdır?


Abstract

The importance of the number of replications in simulation studies for producing results that reflect the truth is indisputable. When a study is designed using the Monte Carlo simulation technique, the number of replications is critical for the reliability and validity of the research results. However, there is no clear guidance on how many replications are sufficient. This study aims to determine the effect of the number of replications in the Monte Carlo simulation method on item and test parameter estimates in Classical Test Theory and to determine the required number of replications. For this purpose, the explained total variance ratio, Cronbach's alpha coefficient, average item discrimination, and model-data-fit parameters of data generated by varying the number of replications under different conditions were examined.
This study is a Monte Carlo simulation study. The R (2011) "psych" package was used for data generation and analysis.
The number of items in a one-dimensional structure was fixed at 20 with five response categories, and the sample size was varied as 100, 250, 500, 1000, and 3000. According to the results, in a study based on Classical Test Theory, researchers are advised to generate data with 1000 replications when the sample size is 100, 500 replications when it is 250, 250 replications when it is 500, and 100 replications when it is 1000 or 3000.

There are 50 citations in total.

Details

Primary Language Turkish
Journal Section Research Article
Authors

Duygu Koçak 0000-0003-3211-0426

Publication Date June 24, 2020
Published in Issue Year 2020, Volume: 9, Issue: 2

Cite

APA Koçak, D. (2020). Monte Carlo Simülasyon Yönteminde Tekrar Sayısı Klasik Test Kuramı Parametreleri İçin Kaç Olmalıdır?. Cumhuriyet Uluslararası Eğitim Dergisi, 9(2), 410-429.


© Cumhuriyet University, Faculty of Education