Research Article

Assessing second-language academic writing: AI vs. Human raters

Abstract

The quality of writing in a second language (L2) is one of the indicators of the proficiency college students must demonstrate to be eligible for departmental studies. Although software programs such as the Intelligent Essay Assessor and IntelliMetric have been introduced to evaluate second-language writing quality, the overall assessment of writing proficiency is still largely carried out by trained human raters. The question that needs to be addressed today is whether generative artificial intelligence (AI) built on large language models (LLMs) could facilitate, and possibly replace, human raters in the burdensome task of assessing student-written academic work. For this purpose, first-year college students (n=43) were given a paragraph-writing task, which was evaluated against the same writing criteria by the generative pre-trained transformer ChatGPT-3.5 and by five human raters. The scores assigned by the five human raters showed statistically significant positive correlations ranging from low to high. A slight-to-fair but significant level of agreement was observed between the scores assigned by ChatGPT-3.5 and those of two of the human raters. The findings suggest that reliable results can be obtained when the scores of an application and multiple human raters are considered together, and that ChatGPT may potentially assist human raters in assessing L2 college writing.
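
The abstract reports two kinds of statistics: correlations among the five human raters and chance-corrected agreement between ChatGPT-3.5 and each human rater. As a minimal sketch of how such an analysis is typically computed, the Python snippet below uses Spearman correlation and quadratic-weighted Cohen's kappa on invented rubric scores; the paper's actual data, scale, and choice of coefficients are not reproduced here and may differ.

# Hypothetical sketch of an inter-rater analysis; all scores are
# randomly generated stand-ins for illustration only.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(7)
n_essays = 43  # matches the sample size reported in the abstract

# Assumed ordinal rubric scores (8-20), one array per rater.
human_raters = {f"R{i}": rng.integers(8, 21, size=n_essays) for i in range(1, 6)}
chatgpt = rng.integers(8, 21, size=n_essays)

# Pairwise correlations among the five human raters.
for (a, sa), (b, sb) in combinations(human_raters.items(), 2):
    rho, p = spearmanr(sa, sb)
    print(f"{a} vs {b}: rho = {rho:.2f} (p = {p:.3f})")

# Agreement between ChatGPT and each human rater; quadratic weights
# give partial credit for near-misses on an ordinal scale.
for name, scores in human_raters.items():
    kappa = cohen_kappa_score(chatgpt, scores, weights="quadratic")
    print(f"ChatGPT vs {name}: weighted kappa = {kappa:.2f}")

Under the widely used Landis and Koch benchmarks, kappa values of 0.00-0.20 count as "slight" agreement and 0.21-0.40 as "fair", which is the range the abstract reports for ChatGPT-human agreement.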

Details

Primary Language

English

Subjects

Instructional Technologies

Journal Section

Research Article

Publication Date

December 31, 2023

Submission Date

August 2, 2023

Acceptance Date

October 21, 2023

Published in Issue

Year 2023 Volume: 6 Number: 4

APA
Geckin, V., Kızıltaş, E., & Çınar, Ç. (2023). Assessing second-language academic writing: AI vs. Human raters. Journal of Educational Technology and Online Learning, 6(4), 1096-1108. https://doi.org/10.31681/jetol.1336599
