TY  - JOUR
T1  - Explanatory Item Response Models for Polytomous Item Responses
AU  - Bulut, Okan
AU  - Stanke, Luke
PY  - 2019
DA  - July
DO  - 10.21449/ijate.515085
JF  - International Journal of Assessment Tools in Education
JO  - Int. J. Assess. Tools Educ.
PB  - İzzet KARA
WT  - DergiPark
SN  - 2148-7456
SP  - 259
EP  - 278
VL  - 6
IS  - 2
LA  - en
AB  - Item response theory is a widely used framework for the design, scoring, and scaling of measurement instruments. Item response models are typically used for dichotomously scored questions that have only two score points (e.g., multiple-choice items). However, given the increasing use of instruments that include questions with multiple response categories, such as surveys, questionnaires, and psychological scales, polytomous item response models are becoming more utilized in education and psychology. This study aims to demonstrate the application of explanatory item response models to polytomous item responses in order to explain common variability in item clusters, person groups, and interactions between item clusters and person groups. Explanatory forms of several polytomous item response models – such as Partial Credit Model and Rating Scale Model – are demonstrated and the estimation procedures of these models are explained. Findings of this study suggest that explanatory item response models can be more robust and parsimonious than traditional item response models for polytomous data where items and persons share common characteristics. Explanatory polytomous item response models can provide more information about response patterns in item responses by estimating fewer item parameters.
KW  - Polytomous IRT
KW  - explanatory item response modeling
KW  - assessment
KW  - partial credit model
N2  - Item response theory is a widely used framework for the design, scoring, and scaling of measurement instruments. Item response models are typically used for dichotomously scored questions that have only two score points (e.g., multiple-choice items). However, given the increasing use of instruments that include questions with multiple response categories, such as surveys, questionnaires, and psychological scales, polytomous item response models are becoming more utilized in education and psychology. This study aims to demonstrate the application of explanatory item response models to polytomous item responses in order to explain common variability in item clusters, person groups, and interactions between item clusters and person groups. Explanatory forms of several polytomous item response models – such as Partial Credit Model and Rating Scale Model – are demonstrated and the estimation procedures of these models are explained. Findings of this study suggest that explanatory item response models can be more robust and parsimonious than traditional item response models for polytomous data where items and persons share common characteristics. Explanatory polytomous item response models can provide more information about response patterns in item responses by estimating fewer item parameters.
UR  - https://doi.org/10.21449/ijate.515085
L1  - https://dergipark.org.tr/en/download/article-file/716984
ER  -