Research Article

Statistical and qualitative analysis of ChatGPT and human raters in preservice teachers’ writing assessment

Volume: 13 Number: 1 January 2, 2026
Abstract

Teachers spend a significant amount of time providing feedback. This study compared expert and ChatGPT assessments of, and feedback on, written texts to determine whether AI is suitable for assessing writing skills, which are time-consuming to score and give feedback on. Three experts and ChatGPT graded 14 Turkish undergraduate students’ assignments using a rubric covering content, language use, vocabulary, organization, and mechanics, and justified their decisions. The study employed a qualitative design based on document review and triangulation. In addition, an intraclass correlation coefficient was used to assess the consistency between ChatGPT’s and the experts’ scores. All feedback was qualitatively analyzed to identify the strengths and weaknesses of the experts and ChatGPT and the similarities between them. Experts and ChatGPT showed weak to moderate consistency on the writing subscales, while good reliability was found for the total score. Experts excelled in ‘explanatory feedback’, ‘interpretation’, and ‘experience’, while ChatGPT excelled in ‘automation and continuity’ and ‘data processing capacity’. Experts’ weaknesses included ‘limited time and energy’ and ‘comparison bias’, while ChatGPT’s weaknesses were ‘ambiguous expressions’ and ‘repetition’. The study also found that both the experts and ChatGPT preferred to provide constructive and supportive feedback.

Keywords

Ethical Statement

Bayburt University, 4.11.2024-238376

Details

Primary Language

English

Subjects

Measurement and Evaluation in Education (Other)

Journal Section

Research Article

Publication Date

January 2, 2026

Submission Date

April 17, 2025

Acceptance Date

November 9, 2025

Published in Issue

Year 2026 Volume: 13 Number: 1

APA
Gülden, B., Bilge, H., & Kanık Uysal, P. (2026). Statistical and qualitative analysis of ChatGPT and human raters in preservice teachers’ writing assessment. International Journal of Assessment Tools in Education, 13(1), 248-269. https://doi.org/10.21449/ijate.1678002
