Research Article

Branching Out: How the IRTree Model can Root Out Disengagement in a Variety of Low-stakes Contexts

Year 2025, Volume: 16 Issue: 3, 179 - 202, 30.09.2025
https://doi.org/10.21031/epod.1662875

Abstract

Low-stakes assessments, commonly used in higher education and large-scale K-12 testing, face unique challenges in score interpretation because examinees may disengage during testing. The IRTree Model for Disengagement, which can be applied with various indicators of disengagement at the item level, addresses this issue by accounting for disengaged responses, thereby improving the accuracy of score interpretations. This study examines the effectiveness of this IRTree model by evaluating its accuracy in estimating parameters across different conditions, including variations in sample size, test length, prior distributions, and the correlation between latent traits. Using a 2 × 2 × 2 × 3 simulation design, we found that the IRTree model effectively recovers person and item parameters, with factors such as test length and prior distributions moderately influencing accuracy. While larger sample sizes and matched priors enhance parameter recovery, non-matched priors also perform adequately. Because the model estimates two latent traits, disengagement and the trait of interest, simultaneously, estimation can leverage their relationship, reducing parameter bias and increasing parameter coverage rates. These findings indicate that using the model leads to more valid interpretations of test scores, particularly in low-stakes contexts where disengagement might otherwise skew results. The results underscore the IRTree model's usefulness in educational assessment, improving the validity of score interpretations in low-stakes testing environments, regardless of the disengagement indicator used.
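
The abstract describes a two-node tree: each item response first branches on whether it is disengaged (flagged by some item-level indicator, such as a rapid guess) and, if engaged, then branches on correctness driven by the trait of interest, with the two latent traits allowed to correlate. The sketch below illustrates how responses of this kind could be simulated; the Rasch-type parameterization at both nodes, the 0.25 guessing probability for disengaged responses, and all sample settings are illustrative assumptions, not the study's actual generating model or simulation conditions.

    import numpy as np

    rng = np.random.default_rng(2025)

    # Illustrative settings only; the study's conditions varied sample size,
    # test length, prior distributions, and the correlation between traits.
    n_persons, n_items = 500, 20
    rho = 0.5  # assumed correlation between disengagement and the trait of interest

    # Draw the two correlated latent traits from a bivariate normal distribution.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([0.0, 0.0], cov, size=n_persons)
    theta_dis, theta_abl = theta[:, 0], theta[:, 1]

    # Item parameters for each node of the tree (assumed Rasch-type at both nodes).
    b_dis = rng.normal(0.0, 1.0, n_items)  # threshold for producing a disengaged response
    b_abl = rng.normal(0.0, 1.0, n_items)  # item difficulty when answered with effort

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Node 1: is the response flagged as disengaged (e.g., a rapid guess)?
    p_dis = logistic(theta_dis[:, None] - b_dis[None, :])
    disengaged = rng.binomial(1, p_dis)

    # Node 2: engaged responses follow the trait of interest; disengaged
    # responses are scored as random guesses on a four-option item (assumption).
    p_correct = logistic(theta_abl[:, None] - b_abl[None, :])
    engaged_resp = rng.binomial(1, p_correct)
    guess_resp = rng.binomial(1, 0.25, size=(n_persons, n_items))
    response = np.where(disengaged == 1, guess_resp, engaged_resp)

    print("Proportion of disengaged responses:", round(disengaged.mean(), 3))
    print("Proportion correct overall:", round(response.mean(), 3))
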


Details

Primary Language: English
Subjects: Item Response Theory; Modelling, Testing, Assessment and Psychometrics (Other)
Journal Section: Articles
Authors:

Brian Leventhal (ORCID: 0000-0002-6480-2016)

Josiah Hunsberger (ORCID: 0000-0003-1680-2545)

Publication Date: September 30, 2025
Submission Date: March 21, 2025
Acceptance Date: September 30, 2025
Published in Issue: Year 2025, Volume 16, Issue 3

Cite

APA Leventhal, B., & Hunsberger, J. (2025). Branching Out: How the IRTree Model can Root Out Disengagement in a Variety of Low-stakes Contexts. Journal of Measurement and Evaluation in Education and Psychology, 16(3), 179-202. https://doi.org/10.21031/epod.1662875