Research Article

Reliability of Human Expert and AI Raters in Translation Assessment

Volume: 22 Issue: 6, 17 December 2025

Abstract

Although AI-based assessment systems offer new opportunities in education, their consistency with human judgment in measuring complex cognitive skills such as translation remains debatable. This study examines inter-rater reliability between a domain expert and two AI raters (ChatGPT-5 and Gemini 1.5 Pro) in evaluating C2-level Turkish translations. Using a convergent mixed-methods design, translations from 14 students were scored with a 5-point analytic rubric. Krippendorff's alpha indicated low overall agreement (α = .392), with agreement especially weak on the "Semantic Accuracy" criterion (α = .288). Qualitative analysis identified three key points of divergence: task fidelity, perception of error severity, and variability in criterion interpretation. The findings show that the AI models are partially consistent with the expert on formal accuracy but diverge systematically on semantic nuance, style, and contextual appropriateness. The expert adopted a "task-oriented" approach, while the AI models were more "form-focused" (Gemini) or "surface coherence-oriented" (ChatGPT). Although AI systems are useful auxiliary tools in translation assessment, they cannot replace expert judgment.
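The agreement statistic reported above, Krippendorff's alpha, is defined as α = 1 − D_o / D_e, where D_o is the disagreement observed among scores given to the same item and D_e is the disagreement expected if scores were paired at random. The abstract does not state which level of measurement or which software was used, so the following is only a minimal sketch under assumptions: the function name, the choice of an interval (squared-difference) metric, and the toy scores are all hypothetical and are not taken from the study.

```python
from itertools import permutations
import math


def krippendorff_alpha(units, metric="interval"):
    """Krippendorff's alpha for a list of units.

    units: one inner list per rated item, holding the scores the raters
    gave that item (None marks a missing score).
    metric: "interval" (squared difference) or "nominal" (match / mismatch).
    """
    if metric == "interval":
        def delta(a, b):
            return (a - b) ** 2
    elif metric == "nominal":
        def delta(a, b):
            return float(a != b)
    else:
        raise ValueError("metric must be 'interval' or 'nominal'")

    # Only items scored by at least two raters can contribute pairs.
    pairable = [[v for v in u if v is not None] for u in units]
    pairable = [u for u in pairable if len(u) >= 2]
    n = sum(len(u) for u in pairable)
    if n < 2:
        raise ValueError("need at least one item with two or more scores")

    # Observed disagreement: ordered pairs of scores within the same item.
    d_o = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: ordered pairs of scores pooled across all items.
    pooled = [v for u in pairable for v in u]
    d_e = sum(delta(a, b) for a, b in permutations(pooled, 2)) / (n * (n - 1))

    if d_e == 0:
        return math.nan  # every score is identical, so alpha is undefined
    return 1.0 - d_o / d_e


# Hypothetical scores for four translations: [expert, model A, model B].
# These numbers are illustrative only and do not come from the study.
toy_scores = [[4, 5, 4], [2, 4, 3], [5, 5, 5], [3, 2, 4]]
print(round(krippendorff_alpha(toy_scores), 3))
```

An alpha of 1 means perfect agreement and a value near 0 means agreement no better than chance, which is why values such as .392 and .288 are read as low. For a 5-point rubric an ordinal metric may be the more natural choice; the interval metric is used here only to keep the sketch short.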

Keywords

Artificial intelligence, inter-rater reliability, teaching Turkish as a foreign language, translation assessment

Supporting Institution

Çanakkale Onsekiz Mart Üniversitesi

Project Number

2025-YÖNP-2114

Ethical Statement

The study was conducted under approval decision no. 21/114, dated 24.10.2025, of the Social Sciences and Humanities Ethics Committee of a state university.

Cite

APA
Uzun, Y. (2025). Reliability of Human Expert and AI Raters in Translation Assessment. OPUS Journal of Society Research, 22(6), 1305-1317. https://doi.org/10.26466/opusjsr.1821518
AMA
1. Uzun Y. Reliability of Human Expert and AI Raters in Translation Assessment. OPUS TAD. 2025;22(6):1305-1317. doi:10.26466/opusjsr.1821518
Chicago
Uzun, Yasemin. 2025. “Reliability of Human Expert and AI Raters in Translation Assessment”. OPUS Journal of Society Research 22 (6): 1305-17. https://doi.org/10.26466/opusjsr.1821518.
EndNote
Uzun Y (01 December 2025) Reliability of Human Expert and AI Raters in Translation Assessment. OPUS Journal of Society Research 22 6 1305–1317.
IEEE
[1] Y. Uzun, “Reliability of Human Expert and AI Raters in Translation Assessment”, OPUS TAD, vol. 22, no. 6, pp. 1305–1317, Dec. 2025, doi: 10.26466/opusjsr.1821518.
ISNAD
Uzun, Yasemin. “Reliability of Human Expert and AI Raters in Translation Assessment”. OPUS Journal of Society Research 22/6 (01 December 2025): 1305-1317. https://doi.org/10.26466/opusjsr.1821518.
JAMA
1. Uzun Y. Reliability of Human Expert and AI Raters in Translation Assessment. OPUS TAD. 2025;22:1305–1317.
MLA
Uzun, Yasemin. “Reliability of Human Expert and AI Raters in Translation Assessment”. OPUS Journal of Society Research, vol. 22, no. 6, December 2025, pp. 1305-17, doi:10.26466/opusjsr.1821518.
Vancouver
1. Yasemin Uzun. Reliability of Human Expert and AI Raters in Translation Assessment. OPUS TAD. 01 December 2025;22(6):1305-17. doi:10.26466/opusjsr.1821518