Research Article

Explanatory Item Response Models for Polytomous Item Responses

Year 2019, Volume 6, Issue 2, 259–278, 15.07.2019
https://doi.org/10.21449/ijate.515085

Abstract

Item response theory is a widely used framework for the design, scoring, and scaling of measurement instruments. Item response models are typically applied to dichotomously scored questions with only two score points (e.g., multiple-choice items). However, given the increasing use of instruments that include questions with multiple response categories, such as surveys, questionnaires, and psychological scales, polytomous item response models are becoming more common in education and psychology. This study demonstrates the application of explanatory item response models to polytomous item responses in order to explain common variability in item clusters, person groups, and interactions between item clusters and person groups. Explanatory forms of several polytomous item response models – such as the Partial Credit Model and the Rating Scale Model – are demonstrated, and the estimation procedures for these models are explained. The findings suggest that explanatory item response models can be more robust and parsimonious than traditional item response models for polytomous data in which items and persons share common characteristics, and that they can reveal more about response patterns while estimating fewer item parameters.



Details

Primary Language English
Subjects Studies on Education
Journal Section Articles
Authors

Luke Stanke (ORCID: 0000-0002-4340-6954)

Okan Bulut (ORCID: 0000-0001-5853-1267)

Publication Date July 15, 2019
Submission Date January 19, 2019
Published in Issue Year 2019

Cite

APA Stanke, L., & Bulut, O. (2019). Explanatory Item Response Models for Polytomous Item Responses. International Journal of Assessment Tools in Education, 6(2), 259-278. https://doi.org/10.21449/ijate.515085
