Research Article

Branching Out: How the IRTree Model can Root Out Disengagement in a Variety of Low-stakes Contexts

Year 2025, Volume: 16 Issue: 3, 179 - 202, 30.09.2025
https://doi.org/10.21031/epod.1662875

Abstract

Low-stakes assessments, commonly used in higher education and large-scale K-12 testing, face unique challenges in score interpretation because examinees may disengage during testing. The IRTree Model for Disengagement, which can be applied with various indicators of disengagement at the item level, addresses this issue by accounting for disengaged responses, thereby improving the accuracy of score interpretations. This study examines the effectiveness of this IRTree model by evaluating its accuracy in estimating parameters across different conditions, including variations in sample size, test length, prior distributions, and the correlation between latent traits. Using a 2 × 2 × 2 × 3 simulation design, we found that the IRTree model effectively recovers person and item parameters, with factors such as test length and prior distributions moderately influencing accuracy. While larger sample sizes and matched priors enhance parameter recovery, non-matched priors also perform adequately. Because the model estimates two latent traits, disengagement and the trait of interest, simultaneously, estimation can leverage their relationship, reducing parameter bias and increasing parameter coverage rates. These findings indicate that using the model leads to more valid interpretations of test scores, particularly in low-stakes contexts where disengagement might otherwise skew results. The results underscore the IRTree model's usefulness in educational assessment, improving the validity of score interpretations in low-stakes testing environments, regardless of the disengagement indicator used.
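
The abstract describes a two-node tree: each item response first branches on whether it is disengaged (flagged by some item-level indicator, such as a rapid guess) and, if engaged, then branches on correctness driven by the trait of interest, with the two latent traits allowed to correlate. The sketch below illustrates how responses of this kind could be simulated; the Rasch-type parameterization at both nodes, the 0.25 guessing probability for disengaged responses, and all sample settings are illustrative assumptions, not the study's actual generating model or simulation conditions.

    import numpy as np

    rng = np.random.default_rng(2025)

    # Illustrative settings only; the study's conditions varied sample size,
    # test length, prior distributions, and the correlation between traits.
    n_persons, n_items = 500, 20
    rho = 0.5  # assumed correlation between disengagement and the trait of interest

    # Draw the two correlated latent traits from a bivariate normal distribution.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons)
    theta_dis, theta_abl = theta[:, 0], theta[:, 1]

    # Item parameters for each node of the tree (assumed Rasch-type at both nodes).
    b_dis = rng.normal(0.0, 1.0, n_items)  # threshold for producing a disengaged response
    b_abl = rng.normal(0.0, 1.0, n_items)  # item difficulty when answered with effort

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Node 1: is the response flagged as disengaged (e.g., a rapid guess)?
    p_dis = logistic(theta_dis[:, None] - b_dis[None, :])
    disengaged = rng.binomial(1, p_dis)

    # Node 2: engaged responses follow the trait of interest; disengaged
    # responses are scored as random guesses on a four-option item (assumption).
    p_correct = logistic(theta_abl[:, None] - b_abl[None, :])
    engaged_resp = rng.binomial(1, p_correct)
    guess_resp = rng.binomial(1, 0.25, size=(n_persons, n_items))
    response = np.where(disengaged == 1, guess_resp, engaged_resp)

    print("Proportion of disengaged responses:", round(disengaged.mean(), 3))
    print("Proportion correct overall:", round(response.mean(), 3))
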


Details

Primary Language: English
Subjects: Item Response Theory; Modelling, Testing, Assessment and Psychometrics (Other)
Journal Section: Articles
Authors:

Brian Leventhal (ORCID: 0000-0002-6480-2016)

Josiah Hunsberger (ORCID: 0000-0003-1680-2545)

Publication Date: September 30, 2025
Submission Date: March 21, 2025
Acceptance Date: September 30, 2025
Published in Issue: Year 2025, Volume 16, Issue 3

Cite

APA Leventhal, B., & Hunsberger, J. (2025). Branching Out: How the IRTree Model can Root Out Disengagement in a Variety of Low-stakes Contexts. Journal of Measurement and Evaluation in Education and Psychology, 16(3), 179-202. https://doi.org/10.21031/epod.1662875