Investigating Sampling Impacts on an LLM-Based AI Scoring Approach: Prediction Accuracy and Fairness
Abstract
Keywords
Details
Primary Language
English
Subjects
Modelling
Journal Section
Research Article
Authors
Mo Zhang *
0000-0003-2689-2089
United States
Matthew Johnson
0000-0003-3157-4165
United States
Chunyi Ruan
0009-0009-3073-229X
United States
Publication Date
December 30, 2024
Submission Date
October 4, 2024
Acceptance Date
November 12, 2024
Published in Issue
Year: 2024, Volume: 15, Number: Special Issue
Cited By
AI-feedback in education: user experience analysis
Педагогика и просвещение
https://doi.org/10.7256/2454-0676.2025.3.75129
Answer-based and reference-based BERT models for automatic scoring of Turkish short answers: The decisive role of task complexity
International Journal of Assessment Tools in Education
https://doi.org/10.21449/ijate.1687429