Research Article

Disengagement matters: A response-time informed approach to scoring low-stakes assessments

Year 2026, Volume: 13 Issue: 1, 123 - 144, 02.01.2026
https://doi.org/10.21449/ijate.1721066

Abstract

Low-stakes assessments in K-12 education play a crucial role in monitoring student progress, yet their validity is often compromised by disengaged test-taking behaviors, such as rapid guessing and idling. This study introduces a novel application of the Nested Logit Model (NLM) to integrate test-taking engagement, as indicated by response times (RTs), with item responses to improve ability estimation. Using data from a low-stakes reading assessment in the United States (n = 27,556 students in grades 5 to 8), we compared six scoring approaches, including traditional dichotomous scoring, effort-moderated scoring, and nominal scoring that categorized responses based on RT-informed engagement. Our results demonstrated that nominal scoring approaches, particularly those distinguishing rapid guesses and idle responses, yielded superior model fit, increased measurement precision, and provided nuanced insights into examinee behaviors compared to dichotomous scoring methods. Latent class analysis further identified three distinct engagement profiles—effortful responders, rapid guessers, and idle responders—highlighting the need to address both rapid and idle behaviors in modeling. This study emphasizes the value of leveraging RT data to enhance the accuracy of low-stakes assessments while preserving response information. Findings also suggest the NLM framework as a practical and accessible tool for researchers and practitioners seeking to address disengaged behaviors and ensure the reliability of low-stakes assessments.
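
To make the workflow concrete, the sketch below shows one way such an analysis could be set up in R with the mirt and poLCA packages cited in the references: responses are classified from response times, recoded into engagement-informed nominal categories, fit with a nominal response model, and examinees are profiled with a latent class analysis. The simulated data, the 5-second and 120-second thresholds, and the four-category coding are illustrative assumptions only, not the authors' data, thresholds, or exact scoring scheme.

```r
## Illustrative sketch only (not the authors' code). Assumed thresholds:
## responses faster than 5 s are treated as rapid guesses, slower than 120 s as idling.
library(mirt)   # nominal response model (Bock, 1972)
library(poLCA)  # latent class analysis

set.seed(123)
n_persons <- 1000
n_items   <- 10
theta <- rnorm(n_persons)                          # latent ability
b     <- seq(-1.5, 1.5, length.out = n_items)      # item difficulties

# Assign each simulated examinee an engagement profile (mirroring the three
# profiles described in the abstract) and draw a per-item engagement state.
profile <- sample(c("effortful", "rapid_guesser", "idle_responder"),
                  n_persons, replace = TRUE, prob = c(.80, .12, .08))
p_state <- rbind(effortful      = c(.95, .03, .02),
                 rapid_guesser  = c(.40, .55, .05),
                 idle_responder = c(.40, .05, .55))
state <- t(sapply(profile, function(p)
  sample(c("effort", "rapid", "idle"), n_items, replace = TRUE, prob = p_state[p, ])))

# Correctness: effortful responses follow a Rasch-type model, rapid guesses are
# correct at chance (four options), idle responses are left incorrect.
p_correct <- plogis(outer(theta, b, "-"))
correct <- matrix(rbinom(n_persons * n_items, 1,
                         ifelse(state == "effort", p_correct,
                                ifelse(state == "rapid", .25, 0))),
                  n_persons, n_items)

# Response times in seconds: short for rapid guesses, long for idling.
rt <- matrix(NA_real_, n_persons, n_items)
rt[state == "effort"] <- rlnorm(sum(state == "effort"), meanlog = log(30), sdlog = .4)
rt[state == "rapid"]  <- runif(sum(state == "rapid"), 1, 4)
rt[state == "idle"]   <- runif(sum(state == "idle"), 130, 300)

# RT-informed classification and nominal coding (an assumed scheme):
# 0 = effortful incorrect, 1 = effortful correct, 2 = rapid guess, 3 = idle.
flag_rapid <- rt < 5
flag_idle  <- rt > 120
nominal <- as.data.frame(ifelse(flag_rapid, 2L, ifelse(flag_idle, 3L, correct)))
names(nominal) <- paste0("item", 1:n_items)

# Nominal scoring: unidimensional nominal response model.
fit_nominal   <- mirt(nominal, model = 1, itemtype = "nominal", verbose = FALSE)
theta_nominal <- fscores(fit_nominal)

# Traditional dichotomous scoring (2PL on raw correctness) for comparison.
dich <- as.data.frame(correct)
names(dich) <- names(nominal)
fit_dich <- mirt(dich, model = 1, itemtype = "2PL", verbose = FALSE)
cor(theta_nominal, fscores(fit_dich))

# Person-level latent class analysis on per-item engagement indicators
# (1 = effortful, 2 = rapid guess, 3 = idle; poLCA expects categories >= 1).
eng <- as.data.frame(ifelse(flag_rapid, 2L, ifelse(flag_idle, 3L, 1L)))
names(eng) <- names(nominal)
lca_formula <- as.formula(paste("cbind(", paste(names(eng), collapse = ","), ") ~ 1"))
lca3 <- poLCA(lca_formula, data = eng, nclass = 3, verbose = FALSE)
table(lca3$predclass, profile)   # recovered classes vs. simulated profiles
```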

Ethical Statement

University of Alberta, Pro00154093.

References

  • Aboutaleb, Y.M., Ben-Akiva, M., & Jaillet, P. (2020). Learning structure in nested logit models. ArXiv. https://doi.org/10.48550/arXiv.2008.08048
  • Ben-Akiva, M.E. (1973). Structure of passenger travel demand models [Doctoral thesis, Massachusetts Institute of Technology]. DSpace@MIT. https://dspace.mit.edu/handle/1721.1/14790
  • Ben-Akiva, M.E., & Lerman, S.R. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.
  • Bock, R.D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29–51.
  • Bolt, D.M., Wollack, J.A., & Suh, Y. (2012). Application of a multidimensional nested logit model to multiple-choice test items. Psychometrika, 77(2), 339–357. https://doi.org/10.1007/s11336-012-9257-5
  • Bulut, O., Gorgun, G., Wongvorachan, T., & Tan, B. (2023). Rapid guessing in low-stakes assessments: Finding the optimal response time threshold with random search and genetic algorithm. Algorithms, 16(2). https://doi.org/10.3390/a16020089
  • Butler, J., & Adams, R.J. (2007). The impact of differential investment of student effort on the outcomes of international studies. Journal of Applied Measurement, 8(3), 279–304.
  • Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
  • Csányi, R., & Molnár, G. (2024). Item- and person-level factors in test-taking disengagement: Multilevel modeling in a low-stakes context. International Journal of Educational Research Open, 7. https://doi.org/10.1016/j.ijedro.2024.100373
  • De Boeck, P., & Jeon, M. (2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.00102
  • De Boeck, P., & Partchev, I. (2012). IRTrees: Tree-based item response models of the GLMM family. Journal of Statistical Software, 48(1), 1–28. https://doi.org/10.18637/jss.v048.c01
  • DeMars, C.E. (2007). Changes in rapid-guessing behavior over a series of assessments. Educational Assessment, 12(1), 23–45. https://doi.org/10.1080/10627190709336946
  • Deribo, T., Kroehne, U., & Goldhammer, F. (2021). Model-based treatment of rapid guessing. Journal of Educational Measurement, 58(2), 281–303. https://doi.org/10.1111/jedm.12290
  • Deribo, T., Goldhammer, F., & Kroehne, U. (2023). Changes in the speed-ability relation through different treatments of rapid guessing. Educational and Psychological Measurement, 83(3), 473–494. https://doi.org/10.1177/00131644221109490
  • Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66(4), 643–656. https://doi.org/10.1177/0013164405278574
  • Eklöf, H. (2010). Skill and will: Test-taking motivation and assessment quality. Assessment in Education: Principles, Policy & Practice, 17(4), 345–356. https://doi.org/10.1080/0969594X.2010.516569
  • Erdem Kara, B. (2025). Exploring test-taking disengagement in the context of PISA 2022: Evidence from process data. International Electronic Journal of Elementary Education, 17(2), 305–315. https://doi.org/10.26822/iejee.2025.380
  • Filiasov, S., & Sweetman, A. (2023). Low-stakes standardized tests in British Columbia, Canada: System accountability and/or individual feedback? Education Economics, 31(2), 145–165. https://doi.org/10.1080/09645292.2022.2091113
  • Finn, B. (2015). Measuring motivation in low‐stakes assessments. ETS Research Report Series, 2015(2), 1–17.
  • Gorgun, G., & Bulut, O. (2021). A polytomous scoring approach to handle not-reached items in low-stakes assessments. Educational and Psychological Measurement, 81(5), 847–871. https://doi.org/10.1177/0013164421991211
  • Gorgun, G., & Bulut, O. (2023). Incorporating test-taking engagement into the item selection algorithm in low-stakes computerized adaptive tests. Large-scale Assessments in Education, 11(27). https://doi.org/10.1186/s40536-023-00177-5
  • Guo, H., Rios, J.A., Haberman, S., Liu, O.L., Wang, J., & Paek, I. (2016). A new procedure for detection of students’ rapid guessing responses using response time. Applied Measurement in Education, 29(3), 173–183. https://doi.org/10.1080/08957347.2016.1171766
  • Hauenstein, C.E., Embretson, S.E., & Kim, E. (2024). Psychometric modeling to identify examinees’ strategy differences during testing. Journal of Intelligence, 12(4). https://doi.org/10.3390/jintelligence12040040
  • Hu, L.-T., & Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
  • Koo, T.K., & Li, M.Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012
  • Kuhfeld, M., & Soland, J. (2020). Using assessment metadata to quantify the impact of test disengagement on estimates of educational effectiveness. Journal of Research on Educational Effectiveness, 13(1), 147–175. https://doi.org/10.1080/19345747.2019.1636437
  • Lee, Y.-H., & Jia, Y. (2014). Using response time to investigate students’ test-taking behaviors in a NAEP computer-based study. Large-scale Assessments in Education, 2(8). https://doi.org/10.1186/s40536-014-0008-1
  • Leventhal, B.C., & Pastor, D. (2024). An illustration of an IRTree model for disengagement. Educational and Psychological Measurement, 84(4), 810–834. https://doi.org/10.1177/00131644231185533
  • Lindner, M.A., Lüdtke, O., & Nagy, G. (2019). The onset of rapid-guessing behavior over the course of testing time: A matter of motivation and cognitive resources. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.01533
  • Linzer, D.A., & Lewis, J.B. (2011). poLCA: An R package for polytomous variable latent class analysis. Journal of Statistical Software, 42(10), 1–29. https://doi.org/10.18637/jss.v042.i10
  • Liu, J.X., Bulut, O., & Johnson, M.D. (2024). Examining position effects on students’ ability and test-taking speed in the TIMSS 2019 problem-solving and inquiry tasks: A structural equation modeling approach. Psychology International, 6(2), 492–508. https://doi.org/10.3390/psycholint6020030
  • Lundgren, E., & Eklöf, H. (2023). Questionnaire-taking motivation: Using response times to assess motivation to optimize on the PISA 2018 student questionnaire. International Journal of Testing, 23(4), 231–256. https://doi.org/10.1080/15305058.2023.2214647
  • McFadden, D. (1981). Econometric models of probabilistic choice. In C.F. Manski & D. McFadden (Eds.), Structural analysis of discrete data with econometric applications (pp. 198–272). Cambridge, MA: MIT Press.
  • Michaelides, M.P., & Ivanova, M. (2022). Response time as an indicator of test-taking effort in PISA: Country and item-type differences. Psychological Test and Assessment Modeling, 64(3), 304–338.
  • Michaelides, M.P., Ivanova, M.G., & Avraam, D. (2024). The impact of filtering out rapid-guessing examinees on PISA 2015 country rankings. Psychological Test and Assessment Modeling, 66, 50–62. https://doi.org/10.2440/001-0012
  • Nagy, G., & Ulitzsch, E. (2022). A multilevel mixture IRT framework for modeling response times as predictors or indicators of response engagement in IRT models. Educational and Psychological Measurement, 82(5), 845–879. https://doi.org/10.1177/00131644211045351
  • Pastor, D.A., Ong, T.Q., & Strickman, S.N. (2019). Patterns of solution behavior across items in low-stakes assessments. Educational Assessment, 24(3), 189–212. https://doi.org/10.1080/10627197.2019.1615373
  • Penk, C., & Richter, D. (2017). Change in test-taking motivation and its relationship to test performance in low-stakes assessments. Educational Assessment, Evaluation and Accountability, 29(1), 55–79. https://doi.org/10.1007/s11092-016-9248-7
  • Pools, E., & Monseur, C. (2021). Student test-taking effort in low-stakes assessments: Evidence from the English version of the PISA 2015 science test. Large-scale Assessments in Education, 9. https://doi.org/10.1186/s40536-021-00104-6
  • R Core Team. (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
  • Rios, J.A., & Deng, J. (2022). Quantifying the distorting effect of rapid guessing on estimates of coefficient alpha. Applied Psychological Measurement, 46(1), 40–52. https://doi.org/10.1177/01466216211051719
  • Rios, J.A., & Guo, H. (2020). Can culture be a salient predictor of test-taking engagement? An analysis of differential noneffortful responding on an international college-level assessment of critical thinking. Applied Measurement in Education, 33(4), 263–279. https://doi.org/10.1080/08957347.2020.1789141
  • Rios, J.A., Guo, H., Mao, L., & Liu, O.L. (2017). Evaluating the impact of careless responding on aggregated-scores: To filter unmotivated examinees or not? International Journal of Testing, 17, 74–104. https://doi.org/10.1080/15305058.2016.1231193
  • Rios, J.A., Liu, O.L., & Bridgeman, B. (2014). Identifying low-effort examinees on student learning outcomes assessment: A comparison of two approaches. New Directions for Institutional Research, 2014(161), 69–82. https://doi.org/10.1002/ir.20068
  • Samuels, S.J., & Flor, R.F. (1997). The importance of automaticity for developing expertise in reading. Reading & Writing Quarterly, 13(2), 107–121. https://doi.org/10.1080/1057356970130202
  • Schnipke, D.L. (1995, April). Assessing speededness in computer-based tests using item response times. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco, CA.
  • Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420
  • Sideridis, G., & Alahmadi, M.T.S. (2022). The role of response times on the measurement of mental ability. Frontiers in Psychology, 13. https://doi.org/10.3389/fpsyg.2022.892317
  • Soland, J., Kuhfeld, M., & Rios, J. (2021). Comparing different response time threshold setting methods to detect low effort on a large-scale assessment. Large-scale Assessments in Education, 9(8). https://doi.org/10.1186/s40536-021-00100-w
  • Stickney, E.M., Sharp, L.B., & Kenyon, A.S. (2012). Technology-enhanced assessment of math fact automaticity: Patterns of performance for low- and typically achieving students. Assessment for Effective Intervention, 37(2), 84–94. https://doi.org/10.1177/1534508411430321
  • Suh, Y., & Bolt, D.M. (2010). Nested logit models for multiple choice item response data. Psychometrika, 75(3), 454–473. https://doi.org/10.1007/s11336-010-9163-7
  • Swerdzewski, P.J., Harmes, J.C., & Finney, S.J. (2011). Two approaches for identifying low-motivated students in a low-stakes assessment context. Applied Measurement in Education, 24(2), 162–188. https://doi.org/10.1080/08957347.2011.555217
  • Tijmstra, J., & Bolsinova, M. (2018). On the importance of the speed-ability trade-off when dealing with not reached items. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.00964
  • Ulitzsch, E., von Davier, M., & Pohl, S. (2020). A hierarchical latent response model for inferences about examinee engagement in terms of guessing and item-level non-response. British Journal of Mathematical and Statistical Psychology, 73(S1), 83–112. https://doi.org/10.1111/bmsp.12188
  • Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477. https://doi.org/10.1111/bmsp.12054
  • Wang, B., Huggins-Manley, C., Kuang, H., & Xiong, J. (2024). Enhancing effort-moderated item response theory models by evaluating a two-step estimation method and multidimensional variations on the model. Educational and Psychological Measurement, 1–23. https://doi.org/10.1177/00131644241280727
  • Wise, S.L. (2006). An investigation of the differential effort received by items on a low-stakes computer-based test. Applied Measurement in Education, 19(2), 95–114. https://doi.org/10.1207/s15324818ame1902_2
  • Wise, S.L. (2015). Effort analysis: Individual score validation of achievement test data. Applied Measurement in Education, 28(3), 237–252. https://doi.org/10.1080/08957347.2015.1042155
  • Wise, S.L. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice, 36(4), 52–61. https://doi.org/10.1111/emip.12165
  • Wise, S.L., & DeMars, C.E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17. https://doi.org/10.1207/s15326977ea1001_1
  • Wise, S.L., & DeMars, C.E. (2006). An application of item response time: The effort‐moderated IRT model. Journal of Educational Measurement, 43(1), 19–38. https://doi.org/10.1111/j.1745-3984.2006.00002.x
  • Wise, S.L., & DeMars, C.E. (2009). A clarification of the effects of rapid guessing on coefficient α: A note on Attali’s “Reliability of speeded number-right multiple-choice tests.” Applied Psychological Measurement, 33(6), 488–490. https://doi.org/10.1177/0146621607304655
  • Wise, S.L., & Gao, L. (2017). A general approach to measuring test-taking effort on computer-based tests. Applied Measurement in Education, 30(4), 343–354. https://doi.org/10.1080/08957347.2017.1353992
  • Wise, S.L., Kingsbury, G.G., Thomason, J., & Kong, X. (2004, April). An investigation of motivation filtering in a statewide achievement testing program. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
  • Wise, S.L., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2
  • Wise, S.L., & Kuhfeld, M.R. (2020a). A cessation of measurement: Identifying test taker disengagement using response time. In M.J. Margolis & R.A. Feinberg (Eds.), Integrating timing considerations to improve testing practices (pp. 150–164). Routledge.
  • Wise, S.L., & Kuhfeld, M.R. (2020b). Using retest data to evaluate and improve effort-moderated scoring. Journal of Educational Measurement, 58(1), 130–149. https://doi.org/10.1111/jedm.12275
  • Wise, S.L., & Ma, L. (2012, April). Setting response time thresholds for a CAT item pool: The normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
  • Wise, S.L., Pastor, D.A., & Kong, X.J. (2009). Correlates of rapid-guessing behavior in low-stakes testing: Implications for test development and measurement practice. Applied Measurement in Education, 22(2), 185–205. https://doi.org/10.1080/08957340902754650
  • Wise, V.L., Wise, S.L., & Bhola, D.S. (2006). The generalizability of motivation filtering in improving test score validity. Educational Assessment, 11(1), 65–83.
  • Wright, D.B. (2016). Treating all rapid responses as errors (TARRE) improves estimates of ability (slightly). Psychological Test and Assessment Modeling, 58(1), 15–30.
  • Yildirim-Erbasli, S.N., & Bulut, O. (2020). The impact of students’ test-taking effort on growth estimates in low-stakes educational assessments. Educational Research and Evaluation, 26(7–8), 368–386. https://doi.org/10.1080/13803611.2021.1977152
  • Yildirim-Erbasli, S.N., & Gorgun, G. (2025). Disentangling the relationship between ability and test-taking effort: To what extent the ability levels can be predicted from response behavior? Technology, Knowledge, and Learning, 30(3), 1475–1497. https://doi.org/10.1007/s10758-024-09810-w
  • Yıldırım Hoş, H., & Uysal Saraç, M. (2025). How does incorporating the response times into mixture modelling influence the identification of latent classes for mathematics literacy framework in PISA 2022? Education and Science, 50(Supplement 1), 129–146. https://doi.org/10.15390/EB.2025.14125
  • Zheng, X., Sahin, F., Erberber, E., & Fonseca, F. (2023). Identification and cross-country comparison of students’ test-taking behaviors in selected eTIMSS 2019 countries. Large-scale Assessments in Education, 11. https://doi.org/10.1186/s40536-023-00179-3

Details

Primary Language English
Subjects Measurement Theories and Applications in Education and Psychology
Journal Section Research Article
Authors

Okan Bulut 0000-0001-5853-1267

Joyce Xinle Liu 0009-0008-2278-8966

Hatice Cigdem Bulut 0000-0003-2585-3686

Submission Date June 16, 2025
Acceptance Date November 13, 2025
Publication Date January 2, 2026
Published in Issue Year 2026 Volume: 13 Issue: 1

Cite

APA Bulut, O., Liu, J. X., & Bulut, H. C. (2026). Disengagement matters: A response-time informed approach to scoring low-stakes assessments. International Journal of Assessment Tools in Education, 13(1), 123-144. https://doi.org/10.21449/ijate.1721066
