Automated Essay Scoring Effect on Test Equating Errors in Mixed-format Test
Abstract
Scoring constructed-response items is difficult, time-consuming, and costly in practice. Advances in computer technology have enabled automated scoring of constructed-response items. However, applying automated scoring without investigating its consequences for test equating can lead to serious problems. The goal of this study was to score the constructed-response items in mixed-format tests automatically, using different train/test data ratios, and to examine the indirect effect of these scores on test equating relative to scores assigned by human raters. Bidirectional long short-term memory (BLSTM) was selected as the automated scoring method because it gave the best performance. In the test equating process, methods based on both classical test theory and item response theory were used. For most of the equating methods, the equating errors resulting from automated scoring were close to the errors obtained when the equating was based on human raters' scores. It was concluded that automated scoring can be applied, as it is convenient in terms of equating.
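The abstract names BLSTM as the scoring model but does not describe its configuration. The following is a minimal sketch of a BLSTM essay scorer, assuming TensorFlow/Keras; the vocabulary size, sequence length, layer widths, and train/test split are illustrative assumptions, not the study's actual setup.

```python
# Minimal BLSTM essay-scoring sketch (assumed Keras architecture, not the
# study's configuration): padded token ids in, one continuous score out.
from tensorflow.keras import layers, models

VOCAB_SIZE = 10_000   # assumed tokenizer vocabulary size
MAX_LEN = 300         # assumed maximum essay length in tokens

def build_blstm_scorer() -> models.Model:
    """Regression model: token-id sequence -> predicted essay score."""
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),  # ignore padding
        layers.Bidirectional(layers.LSTM(64)),  # read the essay in both directions
        layers.Dense(32, activation="relu"),
        layers.Dense(1),                        # predicted score
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Training on human-rated essays, e.g. with an 80/20 train/test split:
# X: (n_essays, MAX_LEN) padded token ids; y: (n_essays,) human scores
# model = build_blstm_scorer()
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10)
```

Varying the train/test ratio, as the study does, amounts to refitting this model on differently sized training sets and scoring the held-out essays each time.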
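The equating errors mentioned in the abstract can be quantified with a standard error of equating (SEE). Below is a minimal sketch of one classical-test-theory method, linear equating under an assumed random-groups design, with a bootstrap SEE; the function names and usage are illustrative, and the study's specific equating methods and IRT procedures are not reproduced here.

```python
# Sketch: CTT linear equating of form X onto form Y, with a nonparametric
# bootstrap standard error of equating (SEE). Random-groups design assumed.
import numpy as np

def equate_linear(x, scores_x, scores_y):
    """Linear equating: match the mean and SD of form X to those of form Y."""
    mu_x, mu_y = scores_x.mean(), scores_y.mean()
    sd_x, sd_y = scores_x.std(ddof=1), scores_y.std(ddof=1)
    return sd_y / sd_x * (x - mu_x) + mu_y

def bootstrap_see(x, scores_x, scores_y, n_boot=1000, seed=0):
    """SEE at score point x, estimated by resampling both score distributions."""
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_boot):
        bx = rng.choice(scores_x, size=scores_x.size, replace=True)
        by = rng.choice(scores_y, size=scores_y.size, replace=True)
        reps.append(equate_linear(x, bx, by))
    return np.std(reps, ddof=1)

# Comparing scoring sources, as in the study's design (hypothetical arrays):
# human_x, machine_x, scores_y = ...  # observed total scores on each form
# see_human = bootstrap_see(10, human_x, scores_y)
# see_machine = bootstrap_see(10, machine_x, scores_y)
```

Repeating this comparison across equating methods mirrors the abstract's finding: the SEE under automated scoring stays close to the SEE under human scoring for most methods.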
Details
Primary Language
English
Subjects
Studies on Education
Journal Section
Research Article
Publication Date
June 10, 2021
Submission Date
October 24, 2020
Acceptance Date
February 7, 2021
Published in Issue
Year 2021 Volume: 8 Number: 2
Cited By
- A review of deep-neural automated essay scoring models. Behaviormetrika. https://doi.org/10.1007/s41237-021-00142-y
- Automatic essay exam scoring system: a systematic literature review. Procedia Computer Science. https://doi.org/10.1016/j.procs.2022.12.166
- A Study of Scoring English Tests Using an Automatic Scoring Model Incorporating Semantics. Automatic Control and Computer Sciences. https://doi.org/10.3103/S0146411623050115