Research Article

Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT

Volume: 12 Number: 1 February 20, 2025

Abstract

This study explores the effectiveness of ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading essays by English as a Foreign Language (EFL) learners. The corpus consists of 50 essays of various types, including analysis, compare and contrast, descriptive, narrative, and opinion essays, written by 10 EFL learners at the B2 level. Human raters and ChatGPT (GPT-4o mini version) scored the essays using the International English Language Testing System (IELTS) Task 2 Writing band descriptors. Within a quantitative approach, Wilcoxon signed-rank and Spearman correlation tests were employed to compare the two sets of scores, revealing a significant difference between the two scoring methods, with human raters assigning higher scores than ChatGPT. Significant differences of varying magnitude were also evident for each essay type, suggesting that genre was not a parameter affecting the agreement between human raters and ChatGPT. It is therefore argued that, while ChatGPT shows promise as an AES tool, the observed disparities indicate that it has not yet reached sufficient proficiency for practical use. The study emphasizes the need for improvements in AI language models to capture the nuanced nature of essay evaluation in EFL contexts.

Ethical Statement

Sivas Cumhuriyet University, Educational Sciences Ethics Committee, 24.05.2024-431192.

Details

Primary Language

English

Subjects

Measurement and Evaluation in Education (Other)

Journal Section

Research Article

Early Pub Date

January 9, 2025

Publication Date

February 20, 2025

Submission Date

July 18, 2024

Acceptance Date

October 7, 2024

Published in Issue

Year 2025 Volume: 12 Number: 1

APA
Uyar, A. C., & Büyükahıska, D. (2025). Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT. International Journal of Assessment Tools in Education, 12(1), 20-32. https://doi.org/10.21449/ijate.1517994
