Year 2021, Volume 8, Issue 2, Pages 423-453, 2021-06-10

A Guide for More Accurate and Precise Estimations in Simulative Unidimensional IRT Models


A great deal of item response theory (IRT) research is conducted through simulation. Item and ability parameters are estimated with varying numbers of replications under different test conditions, yet it is not clear what the appropriate number of replications should be. The aim of the current study is to develop guidelines for the adequate number of replications in Monte Carlo simulation studies involving unidimensional IRT models. To this end, 192 simulation conditions were generated by crossing four sample sizes, two test lengths, eight numbers of replications, and three unidimensional IRT models. The accuracy and precision of item and ability parameter estimates, and of model fit values, were evaluated with respect to the number of replications. For the item and ability parameters, mean error, root mean square error, and the standard error of estimates were considered; for model fit, M₂, RMSEA₂, and Type I error rates were considered. Although the number of replications did not appear to influence model fit, it was decisive for Type I error inflation and for the accuracy of error prediction in all IRT models. It was concluded that, to obtain more accurate results, the number of replications should be at least 625 in terms of the accuracy of Type I error rate estimation for all IRT models. Replication numbers of 156 and above can also be recommended. Item parameter biases were examined, and the largest bias values were obtained from the 3PL model. It can be concluded that increasing the number of parameters estimated by the model resulted in more biased estimates.
Keywords: Monte Carlo simulation study, Replication, Unidimensional item response theory models, Bias estimation, Type I error inflation
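The accuracy and precision criteria named in the abstract can be made concrete with a short sketch. The Python below assumes the usual simulation-study definitions: mean error as the average of (estimate minus true value), RMSE as the square root of the mean squared error, and the standard error as the standard deviation of the estimates across replications. The article's exact formulas, divisors, and software are not given here and may differ, and the function names are hypothetical. The sketch also computes the binomial Monte Carlo standard error of an empirical Type I error rate, which illustrates why the number of replications matters: that error shrinks with the square root of the number of replications.

```python
import math
from statistics import fmean


def recovery_stats(true_value, estimates):
    """Mean error (bias), RMSE and empirical standard error of one parameter
    across Monte Carlo replications. Uses divisor R (not R - 1), so that
    RMSE**2 == ME**2 + SE**2 holds exactly. Hypothetical helper."""
    r = len(estimates)
    errors = [est - true_value for est in estimates]
    me = fmean(errors)                                # average signed error
    rmse = math.sqrt(fmean(e * e for e in errors))    # root mean squared error
    mean_est = fmean(estimates)
    se = math.sqrt(sum((est - mean_est) ** 2 for est in estimates) / r)
    return me, rmse, se


def type_i_error_rate(p_values, alpha=0.05):
    """Share of replications in which a correctly specified model is
    (wrongly) rejected at level alpha. Hypothetical helper."""
    return sum(p < alpha for p in p_values) / len(p_values)


def mc_standard_error(rate, replications):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1 - rate) / replications)


if __name__ == "__main__":
    # Precision of an empirical Type I error rate at nominal alpha = .05
    for reps in (156, 625):
        print(reps, round(mc_standard_error(0.05, reps), 4))
```

Under these assumptions, `mc_standard_error(0.05, 625)` is about 0.0087, compared with about 0.017 at 156 replications, so a Type I error rate estimated from 625 replications is pinned down roughly twice as tightly.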
Primary Language: English
Subjects: Education, Scientific Disciplines
Published Date: June 2021
Journal Section: Articles

Orcid: 0000-0001-6989-512X
Country: Turkey

Orcid: 0000-0001-5522-2514
Author: Asiye ŞENGÜL AVŞAR (Primary Author)
Institution: Recep Tayyip Erdoğan Üniversitesi
Country: Turkey


Publication Date: June 10, 2021

APA: Barış Pekmezci, F., & Şengül Avşar, A. (2021). A guide for more accurate and precise estimations in simulative unidimensional IRT models. International Journal of Assessment Tools in Education, 8(2), 423-453. DOI: 10.21449/ijate.790289