Human-Centered AI for Discovering Student Engagement Profiles on Large-Scale Educational Assessments
Year 2024, Volume: 15, Issue: Special Issue, 282–301, 30.12.2024
Hongwen Guo, Matthew Johnson, Luis Saldivia, Michelle Worthington, Kadriye Ercikan
Abstract
Large-scale assessments play a key role in education: they provide insights for educators and stakeholders about what students know and are able to do, which can inform educational policies and interventions. Beyond overall performance scores and subscores, educators need to know how and why students performed at certain proficiency levels in order to improve learning. Process (log) data contain nuanced information about how students engaged with and acted on tasks in an assessment, and thus hold promise for contextualizing a performance score. However, a single action event observed in process data may be open to multiple interpretations. To address this challenge, in the current study we propose to integrate sequential process data with response data to create engagement profiles that better reflect students' test-taking processes. Most importantly, we propose to use AI algorithms to assist and amplify human expertise in the creation of students' engagement profiles, so that information extraction from the multi-source (performance and process) data can be scaled up to enhance the value of large-scale assessments in teaching and learning. We leveraged various machine learning techniques and developed a general framework for the human-centered AI approach to help human experts make sense of the multi-source data efficiently and effectively. Using a mathematics item block from the National Assessment of Educational Progress (NAEP) for illustration, we analyzed data from over 14,000 students and identified ten preliminary profiles, more than half of which were associated with low-performing students. These engagement profiles are expected to generate rich and meaningful feedback for educators and stakeholders.
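The abstract describes combining response (performance) data with process (log) features and using machine learning to surface candidate engagement profiles for expert review. The sketch below is only an illustration of that general idea, not the authors' pipeline or the NAEP data: it assumes hypothetical per-student features (item scores, time on task, action counts), embeds them with t-SNE (van der Maaten & Hinton, 2008, cited below), and groups the embedding with k-means so that human experts could inspect and label the resulting clusters.

```python
# Illustrative sketch only: hypothetical feature names and synthetic data,
# not the NAEP data or the authors' implementation.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_students = 500

# Hypothetical multi-source features per student:
# scored responses (response data) plus time on task and action counts (process data).
features = np.column_stack([
    rng.integers(0, 2, size=(n_students, 5)),               # scored responses to 5 items
    rng.gamma(shape=2.0, scale=30.0, size=(n_students, 5)),  # time on task per item (seconds)
    rng.poisson(lam=8.0, size=(n_students, 5)),              # action counts per item
])

# Put all features on a common scale before embedding and clustering.
X = StandardScaler().fit_transform(features)

# Project to 2-D with t-SNE so experts can visually inspect the structure.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Group students into candidate profiles; the cluster count is a choice for
# expert review (the study reports ten preliminary profiles).
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(embedding)

for k in range(10):
    print(f"candidate profile {k}: {np.sum(labels == k)} students")
```

In the human-centered approach the abstract describes, expert judgment drives how profiles are defined and interpreted; the sketch only shows one way multi-source features could be embedded and grouped to support that review.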
References
- Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., . . . Zheng, X. (2015). TensorFlow:
Large-scale machine learning on heterogeneous systems. Google. https://www.tensorflow.org/
- Baker, R. (2021). Artificial intelligence in education: Bringing it all together. In S. Vincent-Lancrin (Ed.), Pushing the frontiers with AI, blockchain, and robots (pp. 43–54). OECD.
- Ercikan, K., Guo, H., & He, Q. (2020). Use of response process data to inform group comparisons and fairness research. Educational Assessment, 25(3), 179–197. https://doi.org/10.1080/10627197.2020.1804353
- Ercikan, K., Guo, H., & Por, H.-H. (2023). Uses of process data in advancing the practice and science of technology-rich assessments. In N. Foster & M. Piacentini (Eds.), Innovating assessments to measure and support complex skills. OECD Publishing. https://www.oecd-ilibrary.org/education/innovating-assessments-to-measure-and-support-complex-skills_7b3123f1-en
- Ercikan, K., & Pellegrino, J. (2017). Validation of score meaning in the next generation of assessments:
The use of response processes. Routledge.
- Géron, A. (2017). Hands-on machine learning with Scikit-Learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems. Sebastopol, CA: O’Reilly Media.
- Gordon, E. (2020). Toward assessment in the service of learning. Educational Measurement: Issues and Practice, 39(3), 72–78. https://doi.org/10.1111/emip.12370
- Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students’ performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36–46.
- Guo, H., & Ercikan, K. (2021a). Differential rapid responding across language and cultural groups. Educational Research and Evaluation, 26(5-6), 302–327. https://doi.org/10.1080/13803611.2021.1963941
- Guo, H., & Ercikan, K. (2021b). Using response-time data to compare the testing behaviors of English language learners (ELLs) to other test-takers (non-ELLs) on a mathematics assessment. ETS Research Report Series, 2021(1), 1–15. https://doi.org/10.1002/ets2.12340
- Guo, H., Johnson, M., Ercikan, K., Saldivia, L., & Worthington, M. (2024). Large-scale assessments for learning: A human-centered AI approach to contextualize test performance. Journal of Learning Analytics, 11(2), 229–245. https://doi.org/10.18608/jla.2024.8007
- Guo, H., Rios, J., Haberman, S., Liu, O., Wang, J., & Paek, I. (2017). A new procedure for detection of students’ rapid guessing responses using response time. Applied Measurement in Education, 29(3), 173–183. https://doi.org/10.1080/08957347.2016.1171766
- Johnson, M. S., & Liu, X. (2022). Psychometric considerations for the joint modeling of response and process data [Paper presentation]. International Meeting of the Psychometric Society, Bologna, Italy.
- Levy, R. (2020). Implications of considering response process data for greater and lesser psychometrics.
Educational Assessment, 25(3), 218–235. https://doi.org/10.1080/10627197.2020.1804352
- Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning
Research, 9, 2579–2605.
- Miao, F., Holmes, W., Huang, R., & Zhang, H. (2021). AI and education: Guidance for policymakers.
UNESCO.
- National Assessment Governing Board. (NAGB, 2020). Response process data from the 2017 NAEP
grade 8 mathematics assessment. https://www.nationsreportcard.gov/process_data/
- National Assessment Governing Board. (NAGB, 2024a). Mathematics assessment framework for the 2022 and 2024 National Assessment of Educational Progress. https://www.nagb.gov/content/dam/nagb/en/documents/publications/frameworks/mathematics/2022-24-nagb-math-framework-508.pdf
- National Assessment Governing Board. (NAGB, 2024b). How states use and value the Nation’s Report Card. https://www.nagb.gov/about-us/state-and-tuda-case-studies.html
- National Center for Education Statistics. (NCES, 2022). NAEP questions tool. https://nces.ed.gov/NationsReportCard/nqt/
- National Research Council. (NRC, 2000). How people learn: Brain, mind, experience, and school (Expanded ed.). Washington, DC: The National Academies Press. https://doi.org/10.17226/9853
- National Academies of Sciences, Engineering, and Medicine. (NASEM, 2018). How people learn II: Learners, contexts, and cultures. Washington, DC: The National Academies Press.
- Office of Educational Technology. (2023). Artificial intelligence and the future of teaching and learning: Insights and recommendations (Report). Washington, DC: U.S. Department of Education.
- Pellegrino, J. W. (2020). Important considerations for assessment to function in the service of education. Educational Measurement: Issues and Practice, 39(3), 81–85. https://doi.org/10.1111/emip.12372
- Pohl, S., Ulitzsch, E., & von Davier, M. (2021). Reframing rankings in educational assessments. Science, 372(6540), 338–340. https://doi.org/10.1126/science.abd3300
- Radwan, A. M. (2019). Human active learning. In S. M. Brito (Ed.), Active learning (chap. 2). Rijeka: IntechOpen. https://doi.org/10.5772/intechopen.81371
- Rios, J., & Guo, H. (2020). Can culture be a salient predictor of test-taking engagement? An analysis of differential noneffortful responding on an international college-level assessment of critical thinking ISLA. Applied Measurement in Education, 33(4), 263–279. https://doi.org/10.1080/08957347.2020.1789141
- Rios, J., Guo, H., Mao, L., & Liu, O. L. (2017). Evaluating the impact of careless responding on aggregated scores: To filter unmotivated examinees or not? International Journal of Testing, 17(1), 74–104.
- Rizve, M. N., Duarte, K., Rawat, Y. S., & Shah, M. (2021). In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning. In International Conference on Learning Representations. https://iclr.cc/media/iclr-2021/Slides/3255.pdf
- Wise, S. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications.
Educational Measurement: Issues and Practice, 36(4), 52–61. https://doi.org/10.1111/emip.12165
- Wise, S. (2021). Six insights regarding test-taking disengagement. Educational Research and
Evaluation, 26(5-6), 328–338. https://doi.org/10.1080/13803611.2021.1963942
- Wise, S., & Kong, X. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18(2), 163–183. https://doi.org/10.1207/s15324818ame1802_2
- Xie, Q., Dai, Z., Hovy, E. H., Luong, M., & Le, Q. V. (2019). Unsupervised data augmentation for consistency training. CoRR, abs/1904.12848. http://arxiv.org/abs/1904.12848
- Zhu, X., Lafferty, J., & Ghahramani, Z. (2003). Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003 workshop on the continuum from labeled to unlabeled data in machine learning and data mining (pp. 58–65).
- Zoanetti, N., & Griffin, P. (2017). Log-file data as indicators for problem-solving processes. In B. Csapó & J. Funke (Eds.), The nature of problem solving: Using research to inspire 21st century learning (chap. 11). Paris: OECD Publishing. https://doi.org/10.1787/9789264273955-en