Research Article

Can AI Assess Writing Skills Like a Human? A Reliability Analysis

Volume: 18 Issue: 4, 28 October 2025

Abstract

This study investigates the reliability and consistency of a custom GPT-based scoring system in comparison to trained human raters, focusing on B1-level opinion paragraphs written by English preparatory students. Addressing the limited evidence on how AI scoring systems align with human evaluations in foreign language contexts, the study provides insights into both the strengths and limitations of automated writing assessment. A total of 175 student writings were evaluated twice by human raters and twice by the AI system using an analytic rubric. Findings indicate excellent agreement among human raters and high consistency across AI-generated scores, but only moderate alignment between human and AI evaluations, with the AI showing a tendency to assign higher scores and to overlook off-topic content. These results suggest that while AI scoring systems offer efficiency and consistency, they still lack the interpretive depth of human judgment. The study highlights the potential of AI as a complementary tool in writing assessment, with practical implications for language testing policy and classroom pedagogy.

Details

Primary Language

English

Subjects

Educational Technology and Computing

Section

Research Article

Publication Date

28 October 2025

Submission Date

12 June 2025

Acceptance Date

22 October 2025

Published Issue

Year 2025, Volume: 18, Issue: 4

How to Cite

APA
Ataseven, H., Çokluk Bökeoğlu, Ö., & Taşdemir, F. (2025). Can AI Assess Writing Skills Like a Human? A Reliability Analysis. Journal of Theoretical Educational Sciences, 18(4), 757-775. https://doi.org/10.30831/akukeg.1718511