Statistical and qualitative analysis of ChatGPT and human raters in preservice teachers’ writing assessment
Year 2026, Volume: 13, Issue: 1, pp. 248-269, 02.01.2026
Bahadır Gülden, Huzeyfe Bilge, Pınar Kanık Uysal
Abstract
Teachers spend a significant amount of time providing feedback. This study compared expert and ChatGPT assessments of, and feedback on, written texts to determine whether AI is suitable for assessing writing skills, a task in which scoring and giving feedback are time-consuming. Three experts and ChatGPT graded the assignments of 14 Turkish undergraduate students using a rubric covering content, language use, vocabulary, organization, and mechanics, and justified their decisions. The study used a qualitative design involving document review and triangulation. In addition, the intraclass correlation coefficient (ICC) was used to assess the consistency between ChatGPT's and the experts' scores. All feedback was analyzed qualitatively to identify the strengths and weaknesses of the experts and ChatGPT and the similarities between them. Consistency between the experts and ChatGPT was weak to moderate on the writing subscales, whereas reliability for the total score was good. The experts excelled in ‘explanatory feedback’, ‘interpretation’, and ‘experience’, while ChatGPT excelled in ‘automation and continuity’ and ‘data processing capacity’. The experts' weaknesses included ‘limited time and energy’ and ‘comparison bias’, while ChatGPT's weaknesses were ‘ambiguous expressions’ and ‘repetition’. The study also found that both the experts and ChatGPT preferred to provide constructive and supportive feedback.
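The abstract reports intraclass correlation coefficients for each rubric subscale and for the total score, but it does not state which ICC form was used. The code below is a minimal pure-Python sketch of how such coefficients can be computed. It assumes a subjects-by-raters matrix, with one row per student text and one column per rater (for example, three experts plus ChatGPT). It uses the Shrout and Fleiss two-way forms: ICC(2,·) for absolute agreement and ICC(3,·) for consistency.

The helper names `icc_two_way`, `add_total`, and `interpret_icc` are hypothetical. The interpretation bands and all toy scores are illustrative assumptions, not the study's procedure or data.

```python
from typing import Dict, List, Sequence


def icc_two_way(scores: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Two-way ICCs for an n-subjects x k-raters matrix (assumed layout).

    ICC(2,1) / ICC(2,k): absolute agreement, single / average rater.
    ICC(3,1) / ICC(3,k): consistency, single / average rater.
    Raises ZeroDivisionError if every subject has the same row mean.
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix, >= 2 subjects and >= 2 raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for a two-way ANOVA without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return {
        "ICC(2,1)": (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n),
        "ICC(2,k)": (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n),
        "ICC(3,1)": (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error),
        "ICC(3,k)": (ms_rows - ms_error) / ms_rows,
    }


def add_total(ratings: Dict[str, List[List[float]]]) -> List[List[float]]:
    """Sum subscale matrices cell by cell into a total-score matrix."""
    mats = list(ratings.values())
    n, k = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(k)] for i in range(n)]


def interpret_icc(icc: float) -> str:
    """Map an ICC to a verbal label using commonly used bands (an assumption)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Invented toy scores (NOT the study's data): 4 texts x 2 raters per subscale.
    toy = {
        "content": [[4, 3], [3, 3], [5, 4], [2, 2]],
        "mechanics": [[3, 3], [4, 4], [3, 4], [4, 5]],
    }
    toy["total"] = add_total(toy)
    for name, matrix in toy.items():
        icc = icc_two_way(matrix)["ICC(2,1)"]
        print(f"{name:10s} ICC(2,1) = {icc:.2f} ({interpret_icc(icc)})")
```

This sketch sums the subscale matrices before computing the ICC, which is one way to obtain a total-score coefficient. Rater disagreements on different subscales can partly offset each other in that sum. As a result, a total-score ICC may come out higher than the subscale ICCs. That is consistent with, but does not by itself explain, the pattern reported above.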
Ethical Statement
Bayburt University, 4.11.2024-238376