Research Article

Monte Carlo Simulation Studies in Item Response Theory with the R Programming Language

Year 2017, Volume: 8 Issue: 3, 266 - 287, 30.09.2017
https://doi.org/10.21031/epod.305821

Abstract

Monte Carlo simulation studies play an important role in operational and academic research in educational measurement and psychometrics. Item response theory (IRT) is a psychometric area in which researchers and practitioners often use Monte Carlo simulations to address various research questions. Over the past decade, R has been one of the most widely used programming languages in Monte Carlo studies. R is a free, open-source programming language for statistical computing and data visualization. Many user-contributed R packages allow researchers to conduct various IRT analyses (e.g., item parameter estimation, ability estimation, and differential item functioning) and to expand these analyses into comprehensive simulation scenarios that address specific research questions. This study aims to introduce R and to demonstrate the design and implementation of Monte Carlo simulation studies in the R programming language. Three IRT-related Monte Carlo simulation studies are presented, each built around a simulation function written in R. The design and execution of the R commands are explained in the context of each simulation study.
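As an illustration of the workflow the abstract describes (generate item responses from a known IRT model, estimate parameters, and summarize recovery across replications), the following is a minimal base-R sketch of a single simulation cell. The test length, sample size, replication count, and the simplification of treating abilities and discriminations as known are illustrative assumptions, not the design of the article's three studies.

```r
# Minimal Monte Carlo sketch: recover 2PL item difficulties, assuming the
# abilities (theta) and discriminations (a) are known. Because the 2PL gives
# logit P = a*theta - a*b, fitting a logistic model with offset a*theta
# makes the intercept an estimate of -a*b.
set.seed(42)
n_items   <- 5
n_persons <- 2000
n_reps    <- 20                              # real studies often use 100+
a <- rep(1.2, n_items)                       # known discriminations
b <- seq(-1, 1, length.out = n_items)        # true difficulties

b_hat <- matrix(NA, n_reps, n_items)
for (r in seq_len(n_reps)) {
  theta <- rnorm(n_persons)                  # draw examinee abilities
  for (j in seq_len(n_items)) {
    p <- plogis(a[j] * (theta - b[j]))       # 2PL response probabilities
    y <- rbinom(n_persons, 1, p)             # simulated item responses
    fit <- glm(y ~ 1 + offset(a[j] * theta), family = binomial)
    b_hat[r, j] <- -coef(fit)[1] / a[j]      # back out the difficulty
  }
}
bias <- colMeans(b_hat) - b                  # recovery summaries
rmse <- sqrt(colMeans(sweep(b_hat, 2, b)^2))
round(rbind(bias = bias, rmse = rmse), 3)
```

In a full study, all parameters would typically be estimated jointly with a package such as mirt (Chalmers, 2012), and design factors such as sample size and test length would be crossed and replicated many more times.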

References

  • Albano, A. D. (2016). equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74(8), 1–36.
  • Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford.
  • Browne, M., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A. Bollen & J. S. Long (Eds.). Testing structural equation models. Newbury Park, CA: Sage.
  • Bulut, O. (2013). Between-person and within-person subscore reliability: Comparison of unidimensional and multidimensional IRT models. (Unpublished doctoral dissertation). University of Minnesota, Minneapolis, MN.
  • Bulut, O. (2015). Applying item response theory models to Entrance Examination for Graduate Studies: Practical issues and insights. Journal of Measurement and Evaluation in Education and Psychology, 6(2), 313–330.
  • Cai, L. (2013). flexMIRT version 2.00: A numerical engine for flexible multilevel multidimensional item analysis and test scoring [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
  • Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO: Flexible, multidimensional, multiple categorical IRT modeling [Computer software]. Lincolnwood, IL: Scientific Software International.
  • Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29.
  • Chalmers, R. P. (2016). Generating adaptive and non-adaptive test interfaces for multidimensional item response theory applications. Journal of Statistical Software, 71(5), 1–39.
  • Choi, S. W., Gibbons, L. E., & Crane, P. K. (2016). lordif: Logistic ordinal regression differential item functioning using IRT [Computer software]. Available from https://CRAN.R-project.org/package=lordif.
  • Feinberg, R. A., & Rubright, J. D. (2016). Conducting simulation studies in psychometrics. Educational Measurement: Issues and Practice, 35(2), 36–49.
  • Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278–295.
  • Hallgren, K. A. (2013). Conducting simulation studies in the R programming environment. Tutorials in Quantitative Methods for Psychology, 9(2), 43–60.
  • Hambleton, R. K., Swaminathan, H., & Rogers, J. H. (1991). Fundamentals of item response theory. New York: Sage Publications.
  • Harwell, M., Stone, C. A., Hsu, T., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101–125.
  • Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indices in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
  • Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7(109), 1–10.
  • Lee, S., Bulut, O., & Suh, Y. (2016). Multidimensional extension of multiple indicators multiple causes models to detect DIF. Educational and Psychological Measurement. Advance online publication. doi: 10.1177/0013164416651116.
  • Magis, D., Béland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847–862.
  • Magis, D., & Raîche, G. (2012). Random generation of response patterns under computerized adaptive testing with the R package catR. Journal of Statistical Software, 48(8), 1–31.
  • Mair, P., Hatzinger, R., & Maier, M. J. (2016). eRm: Extended Rasch modeling [Computer software]. Available from http://CRAN.R-project.org/package=eRm.
  • Muthén, L. K., & Muthén, B. O. (1998-2015). Mplus User’s Guide. Seventh Edition. Los Angeles, CA: Muthén & Muthén.
  • Partchev, I. (2016). irtoys: A collection of functions related to item response theory (IRT) [Computer software]. Available from https://CRAN.R-project.org/package=irtoys.
  • R Core Team (2017). R: A language and environment for statistical computing [Computer software]. Vienna, Austria: R Foundation for Statistical Computing. Available from https://cran.r-project.org/.
  • Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). An IRT-based internal measure of test bias with application of differential item functioning. Applied Psychological Measurement, 19, 353–368.
  • Revolution Analytics, & Weston, S. (2015). doParallel: Foreach parallel adaptor for the 'parallel' package [Computer software]. Available from https://CRAN.R-project.org/package=doParallel.
  • Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
  • Robitzsch, A. (2017). sirt: Supplementary item response theory models [Computer software]. Available from https://CRAN.R-project.org/package=sirt.
  • Robitzsch, A., & Rupp, A. A. (2009). The impact of missing data on the detection of differential item functioning. Educational and Psychological Measurement, 69, 18–34.
  • Rusch, T., Mair, P., & Hatzinger, R. (2013). Psychometrics with R: A review of CRAN packages for item response theory. Retrieved from http://epub.wu.ac.at/4010/1/resrepIRThandbook.pdf
  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond, VA: Psychometric Society.
  • SAS Institute Inc. (2014). Version 9.4 of the SAS system for Windows. Cary, NC: SAS Institute Inc.
  • Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DTF. Psychometrika, 58, 159–194.
  • Sinharay, S., Stern, H. S., & Russell, D. (2001). The use of multiple imputation for the analysis of missing data. Psychological Methods, 6, 317–329.
  • StataCorp. (2015). Stata statistical software: Release 14. College Station, TX: StataCorp LP.
  • Stout, W., Li, H., Nandakumar, R., & Bolt, D. (1997). MULTISIB – A procedure to investigate DIF when a test is intentionally multidimensional. Applied Psychological Measurement, 21, 195–213.
  • Suh, Y., & Cho, S.-J. (2014). Chi-square difference tests for detecting differential functioning in a multidimensional IRT model: A Monte Carlo study. Applied Psychological Measurement, 38, 359–375.
  • Ünlü, A., & Yanagida, T. (2011). R you ready for R?: The CRAN psychometrics task view. British Journal of Mathematical and Statistical Psychology, 64(1), 182–186.
  • Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S. New York, NY: Springer.
  • Woods, C. M., & Grimm, K. J. (2011). Testing for nonuniform differential item functioning with multiple indicator multiple cause models. Applied Psychological Measurement, 35, 339–361.
  • Yao, L. (2003). BMIRT: Bayesian multivariate item response theory [Computer software]. Monterey, CA: Defense Manpower Data Center. Available from http://www.bmirt.com.

R Programlama Dili ile Madde Tepki Kuramında Monte Carlo Simülasyon Çalışmaları


Abstract

[Translated from Turkish] Monte Carlo simulation studies play an important role in academic and applied research in educational measurement and psychometrics. Item response theory (IRT) is one of the main areas in which researchers frequently turn to Monte Carlo simulations in psychometric studies. Over the past decade, R has been used frequently in IRT-related simulation studies. R is a free, open-source programming language used for statistical computing and graphics. Many packages developed by R users make it possible to carry out a wide range of IRT-based analyses, such as item parameter estimation and item bias analyses. This study aims to provide introductory information about R and to demonstrate how IRT-based Monte Carlo simulation studies can be conducted in the R programming language. Three different Monte Carlo simulation studies are presented to illustrate the R programming language with examples. In each study, the R commands and functions within the simulation are explained in the context of IRT.

There are 40 citations in total.

Details

Journal Section Articles
Authors

Okan Bulut 0000-0001-5853-1267

Önder Sünbül

Publication Date September 30, 2017
Acceptance Date July 12, 2017
Published in Issue Year 2017 Volume: 8 Issue: 3

Cite

APA Bulut, O., & Sünbül, Ö. (2017). R Programlama Dili ile Madde Tepki Kuramında Monte Carlo Simülasyon Çalışmaları. Journal of Measurement and Evaluation in Education and Psychology, 8(3), 266-287. https://doi.org/10.21031/epod.305821

Cited By

  • Erdem Kara, B. Computer adaptive testing simulations in R. International Journal of Assessment Tools in Education. https://doi.org/10.21449/ijate.621157