Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility

Canan Deveci

doi:10.19171/uefad.1728485

Abstract

This study investigates how the intelligibility of Turkish learners of English as a foreign language (EFL) is assessed by human raters and by artificial intelligence (AI), specifically ChatGPT-4, across two speaking tasks: a controlled read-aloud passage and a spontaneous picture-description task. Drawing on intelligibility-focused pronunciation research, the study explores how task type, rater type, and pronunciation features (segmental and suprasegmental) affect intelligibility ratings. Thirty intermediate-level Turkish learners of English completed both tasks, and their recordings were evaluated by three native English-speaking human raters and an AI model. Quantitative results showed that spontaneous speech received higher intelligibility scores than the read-aloud task, despite containing more segmental errors. Suprasegmental features such as rhythm, stress, and phrasing played a greater role than segmental features in determining intelligibility across tasks. While AI ratings closely matched human judgments in most cases, discrepancies emerged, particularly in samples where prosodic nuances were critical. Qualitative analysis further revealed that both rater types frequently flagged vowel distortions, stress misplacement, and a lack of rhythmic cohesion as common intelligibility detractors. The findings underscore the importance of integrating suprasegmental instruction into EFL pronunciation pedagogy and highlight the potential role of AI tools in supporting intelligibility assessment. Nevertheless, human judgment remains crucial for assessing natural, prosody-rich speech. By providing empirical evidence from a Turkish EFL context and linking L2 pronunciation research with emerging technology, this study contributes to applied linguistics.

Ethical Statement

Ethics committee approval for the research was obtained from the Atatürk University Social and Human Sciences Ethics Committee (meeting no. E.88656144-000-2500199482).

A Comparison of Human and Artificial Intelligence Evaluations of the Speech Intelligibility of Turkish Learners of English as a Foreign Language

Abstract

This study examines how the English speech intelligibility of Turkish learners of English as a foreign language is evaluated by human raters and by artificial intelligence (ChatGPT-4) across two speaking tasks: a controlled read-aloud task and a free picture-description task. Building on intelligibility-focused pronunciation research, the study investigates how task type, rater type, and pronunciation features (segmental and suprasegmental) affect intelligibility scores. Thirty Turkish learners with intermediate English proficiency completed both tasks, and their recordings were evaluated by three native English-speaking raters and by ChatGPT-4, an artificial intelligence model. Quantitative findings showed that the free picture-description task received higher intelligibility scores than the read-aloud task, even though it contained more segmental errors. Suprasegmental features, particularly rhythm, stress, and phrasing, were found to be more influential in determining intelligibility. AI ratings were close to human evaluations in most cases, but differences emerged, especially in samples where prosodic detail was critical. Qualitative analysis revealed that both rater types frequently flagged vowel distortions, misplaced stress, and a lack of rhythmic cohesion as factors that reduce intelligibility. The findings emphasize that suprasegmental features should be addressed more thoroughly in English language teaching and draw attention to the potential role of AI tools in intelligibility assessment. Nevertheless, human judgment remains indispensable, especially in evaluating free speech, which is richer in prosody.
This study contributes to applied linguistics by bringing second language pronunciation research together with emerging technologies, and it offers empirical evidence from the context of teaching English as a foreign language.

Ethical Statement

Ethics committee approval for the research was obtained at the meeting of the Atatürk University Social and Human Sciences Ethics Committee numbered E.88656144-000-2500199482.

Details

Primary Language

English

Subjects

English As A Second Language, Applied Linguistics and Educational Linguistics

Journal Section

Research Article

Authors

Canan Deveci*
0000-0001-5894-3974
United States

Publication Date

April 30, 2026

Submission Date

June 30, 2025

Acceptance Date

September 26, 2025

Published in Issue

Year 2026 Volume: 39 Number: 1

DOI

https://doi.org/10.19171/uefad.1728485

IZ

https://izlik.org/JA25AY83JE

Cite

APA

Deveci, C. (2026). Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility. Journal of Uludag University Faculty of Education, 39(1), 33-49. https://doi.org/10.19171/uefad.1728485

AMA

1. Deveci C. Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility. Journal of Uludag University Faculty of Education. 2026;39(1):33-49. doi:10.19171/uefad.1728485

Chicago

Deveci, Canan. 2026. “Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility”. Journal of Uludag University Faculty of Education 39 (1): 33-49. https://doi.org/10.19171/uefad.1728485.

EndNote

Deveci C (April 1, 2026) Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility. Journal of Uludag University Faculty of Education 39 1 33–49.

IEEE

[1] C. Deveci, “Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility”, Journal of Uludag University Faculty of Education, vol. 39, no. 1, pp. 33–49, Apr. 2026, doi: 10.19171/uefad.1728485.

ISNAD

Deveci, Canan. “Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility”. Journal of Uludag University Faculty of Education 39/1 (April 1, 2026): 33-49. https://doi.org/10.19171/uefad.1728485.

JAMA

1. Deveci C. Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility. Journal of Uludag University Faculty of Education. 2026;39:33–49.

MLA

Deveci, Canan. “Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility”. Journal of Uludag University Faculty of Education, vol. 39, no. 1, Apr. 2026, pp. 33-49, doi:10.19171/uefad.1728485.

Vancouver

1. Canan Deveci. Comparing Human and AI Judgments of Turkish EFL Learners’ Intelligibility. Journal of Uludag University Faculty of Education. 2026 Apr. 1;39(1):33-49. doi:10.19171/uefad.1728485