An Evaluation of 4PL IRT and DINA Models for Estimating Pseudo-Guessing and Slipping Parameters

Ömür Kaya Kalkan; İsmail Çuhadar

doi:10.21031/epod.660273

Research Article

Year 2020, Volume: 11 Issue: 2, 131 - 146, 13.06.2020

Supporting Institution

PAMUKKALE ÜNİVERSİTESİ

Project Number

ADEP-2018KRM002-063

Abstract

In an achievement test, examinees who have the knowledge and skills required by an item are expected to answer it correctly, while examinees who lack them are expected to answer incorrectly. However, an examinee can answer a multiple-choice item correctly by guessing, or can answer an easy item incorrectly because of anxiety or carelessness (slipping). Either case may bias the estimates of examinee abilities and item parameters. The 4PL IRT model and the DINA model can be used to mitigate these effects on parameter estimation. The current simulation study compares the pseudo-guessing and slipping parameters estimated by the 4PL IRT model and the DINA model under several study conditions. The datasets were simulated with the DINA model. The results showed that the bias of the estimated slipping and guessing parameters was reasonably small in general for both models, although the estimates were more biased when the datasets were analyzed with the 4PL IRT model than with the DINA model (i.e., the average bias for both the guessing and slipping parameters was .00 for the DINA model but .08 for the 4PL IRT model). Accordingly, both the 4PL IRT and DINA models can be considered for analyzing datasets contaminated by guessing and slipping effects.
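
As an illustration of the quantities the abstract compares, the following minimal sketch writes out the standard 4PL and DINA item response functions and computes the bias of guessing and slipping estimates on simulated DINA data. It is not the authors' simulation design or estimation procedure: the Q-matrix row, the true parameter values, the sample size, and the simple proportion-based estimator (which treats the attribute profiles as known) are all assumptions made for illustration. In the 4PL model the lower asymptote `c` plays the role of guessing and the upper asymptote `d` corresponds to one minus the slipping probability.

```python
import numpy as np

def p_4pl(theta, a, b, c, d):
    """4PL IRT: P(correct) = c + (d - c) / (1 + exp(-a * (theta - b)))."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

def p_dina(alpha, q, g, s):
    """DINA: P(correct) is 1 - s if every attribute required by q is mastered, else g."""
    eta = np.all(alpha >= q, axis=-1)
    return np.where(eta, 1.0 - s, g)

# Hypothetical setup (not taken from the study): one item, three attributes.
rng = np.random.default_rng(0)
n_examinees, n_attributes = 20000, 3
q = np.array([1, 1, 0])        # assumed Q-matrix row: item needs attributes 1 and 2
g_true, s_true = 0.20, 0.10    # assumed true guessing and slipping values

# Simulate attribute profiles and item responses under the DINA model.
alpha = rng.integers(0, 2, size=(n_examinees, n_attributes))
x = rng.random(n_examinees) < p_dina(alpha, q, g_true, s_true)

# Simplified estimator that assumes the attribute profiles are known:
# guessing = proportion correct among non-masters,
# slipping = proportion incorrect among masters.
eta = np.all(alpha >= q, axis=1)
g_hat = x[~eta].mean()
s_hat = 1.0 - x[eta].mean()

# Bias is the estimate minus the generating value.
bias_g = g_hat - g_true
bias_s = s_hat - s_true
print(f"bias(g) = {bias_g:+.3f}, bias(s) = {bias_s:+.3f}")
```

In a real analysis the attribute profiles and abilities are latent, so the parameters would be estimated with dedicated software rather than with the known-profile shortcut used here.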

Keywords

4PL IRT model, DINA model, (pseudo) guessing effect, slipping effect, lower-upper asymptote parameter

There are 48 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	Ömür Kaya Kalkan (ORCID: 0000-0001-7088-4268); İsmail Çuhadar (ORCID: 0000-0002-5262-5892)
Project Number	ADEP-2018KRM002-063
Publication Date	June 13, 2020
Acceptance Date	April 2, 2020
Published in Issue	Year 2020 Volume: 11 Issue: 2

Cite

APA	Kalkan, Ö. K., & Çuhadar, İ. (2020). An Evaluation of 4PL IRT and DINA Models for Estimating Pseudo-Guessing and Slipping Parameters. Journal of Measurement and Evaluation in Education and Psychology, 11(2), 131-146. https://doi.org/10.21031/epod.660273
