Research Article

Statistical and qualitative analysis of ChatGPT and human raters in preservice teachers’ writing assessment

Volume: 13 Number: 1 January 2, 2026
Abstract

Teachers spend a significant amount of time providing feedback. This study compared expert and ChatGPT assessments of, and feedback on, written texts to determine whether AI is suitable for assessing writing skills, which are time-consuming to score and give feedback on. Three experts and ChatGPT graded 14 Turkish undergraduate students’ assignments using a rubric covering content, language use, vocabulary, organization, and mechanics, and justified their decisions. The study employed a qualitative design based on document review and triangulation. In addition, an intraclass correlation coefficient was used to assess the consistency between ChatGPT’s and the experts’ scores. All feedback was qualitatively analyzed to identify the strengths and weaknesses of the experts and ChatGPT and the similarities between them. Experts and ChatGPT showed weak to moderate consistency on the writing subscales, while good reliability was found for the total score. Experts excelled in ‘explanatory feedback’, ‘interpretation’, and ‘experience’, while ChatGPT excelled in ‘automation and continuity’ and ‘data processing capacity’. Experts’ weaknesses included ‘limited time and energy’ and ‘comparison bias’, while ChatGPT’s weaknesses were ‘ambiguous expressions’ and ‘repetition’. The study also found that both the experts and ChatGPT preferred to provide constructive and supportive feedback.

Keywords

Ethical Statement

Bayburt University, 4.11.2024-238376

Details

Primary Language

English

Subjects

Measurement and Evaluation in Education (Other)

Journal Section

Research Article

Publication Date

January 2, 2026

Submission Date

April 17, 2025

Acceptance Date

November 9, 2025

Published in Issue

Year 2026 Volume: 13 Number: 1

APA
Gülden, B., Bilge, H., & Kanık Uysal, P. (2026). Statistical and qualitative analysis of ChatGPT and human raters in preservice teachers’ writing assessment. International Journal of Assessment Tools in Education, 13(1), 248-269. https://doi.org/10.21449/ijate.1678002
