A Short Note on Obtaining Item Parameter Estimates of IRT Models with Bayesian Estimation in Mplus

Sedat Şen; Allan Cohen; Seock-Ho Kim

doi:10.21031/epod.693719

Research Article

Year 2020, Volume: 11 Issue: 3, 266 - 282, 27.09.2020

Cited By: 2

Abstract

Parameters of Item Response Theory (IRT) models can be estimated using both Bayesian and non-Bayesian methods. Although maximum likelihood estimation (MLE), a non-Bayesian method, has predominated since the 1970s, Bayesian methods are increasingly used because they can estimate complex models and are implemented in commercially available software. In view of the recent increase in the popularity of these methods, a comparison of the model parameter estimates they produce would be useful for practitioners. In this study, we compare MLE and Bayesian estimation, two popular methods for obtaining parameter estimates for dichotomous IRT models, using the MLE and Bayes estimator options as implemented in the Mplus software package. Results indicated that the Bayesian and MLE estimates differed only slightly, demonstrating the consistency between estimates from the two methods. Further, the Bayes estimator option in Mplus can be a viable and relatively easy-to-use tool for calibrating IRT models.

Keywords

Item response theory, dichotomous models, Bayesian estimation, Mplus
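
As a rough illustration of why estimates from the two approaches can be expected to agree closely, the following Python sketch estimates the difficulty of a single dichotomous item by maximum likelihood and by a Bayesian posterior mean. It is not the authors' procedure and does not use Mplus: the function names, the sample size, the true difficulty, the normal prior with variance 5, and the grid approximation to the posterior (in place of MCMC sampling) are all assumptions chosen for illustration, and person abilities are treated as known to keep the example short.

```python
import math
import random


def p_correct(theta, b, a=1.0):
    """Logistic probability of a correct response (a is fixed at 1 here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(b, thetas, responses):
    """Log-likelihood of difficulty b given known abilities and 0/1 responses."""
    total = 0.0
    for theta, u in zip(thetas, responses):
        p = p_correct(theta, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def compare_estimates(n_persons=500, true_b=0.5, prior_sd=math.sqrt(5.0), seed=1):
    """Return (ML estimate, posterior mean) of b on a grid (hypothetical setup)."""
    rng = random.Random(seed)
    # Simulated abilities and responses; abilities are assumed known (a simplification).
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    responses = [1 if rng.random() < p_correct(t, true_b) else 0 for t in thetas]

    # Evaluate the log-likelihood on a grid of candidate difficulties from -4 to 4.
    grid = [-4.0 + 0.01 * i for i in range(801)]
    loglik = [log_likelihood(b, thetas, responses) for b in grid]

    # Maximum likelihood estimate: the grid point with the highest log-likelihood.
    mle = grid[max(range(len(grid)), key=loglik.__getitem__)]

    # Bayesian estimate: posterior mean under a N(0, prior_sd^2) prior on b.
    log_post = [ll - 0.5 * (b / prior_sd) ** 2 for ll, b in zip(loglik, grid)]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    eap = sum(w * b for w, b in zip(weights, grid)) / sum(weights)
    return mle, eap


mle_b, eap_b = compare_estimates()
print(f"ML estimate of b: {mle_b:.3f}")
print(f"Posterior mean of b: {eap_b:.3f}")
```

With a few hundred respondents and a weakly informative prior, the likelihood dominates the posterior, so the two estimates should differ only slightly. Stronger priors or smaller samples would be expected to pull the Bayesian estimate further toward the prior mean.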

Details

Primary Language	English
Journal Section	Articles
Authors	Sedat Şen 0000-0001-6962-4960; Allan Cohen 0000-0002-8776-9378; Seock-Ho Kim 0000-0002-2353-7826
Publication Date	September 27, 2020
Acceptance Date	August 15, 2020
Published in Issue	Year 2020 Volume: 11 Issue: 3

Cite

APA	Şen, S., Cohen, A., & Kim, S.-H. (2020). A Short Note on Obtaining Item Parameter Estimates of IRT Models with Bayesian Estimation in Mplus. Journal of Measurement and Evaluation in Education and Psychology, 11(3), 266-282. https://doi.org/10.21031/epod.693719

Cited By

Comparing Differential Item Functioning Based On Multilevel Mixture Item Response Theory, Mixture Item Response Theory And Manifest Groups

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

https://doi.org/10.21031/epod.1457880

An Application of Multilevel Mixture Item Response Theory Model

Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi

Sedat ŞEN

https://doi.org/10.21031/epod.893149
