Automated Assessment of Students' Critical Writing Skills with ChatGPT
Year: 2025, Volume: 6, Issue: 2, Pages: 343-357
Serdar Tekin, Şeyhmus Aydoğdu
Abstract
Critical writing, a subskill of critical thinking, is a crucial skill for students to develop during their education. Because it draws on higher-order cognitive skills such as analysis and evaluation, it is typically assessed through open-ended questions, which are demanding to score. Automated essay scoring (AES) tools can help overcome the difficulties of evaluating such questions. This study investigates the reliability of ChatGPT 3.5 as an AES tool for evaluating critical writing. It examines differences in average scores between a human rater and ChatGPT across several critical writing criteria, using 59 essays written by tertiary-level students majoring in teaching English as a foreign language. Inter-rater reliability was estimated with intraclass correlation coefficients, and differences in average scores between the raters were tested with repeated-measures ANOVA. The findings indicate that ChatGPT demonstrates low reliability as an AES tool for assessing critical writing skills, suggesting that its current role is that of a supplementary tool rather than a replacement for human raters. ChatGPT also tended to give higher scores than the human rater. The discussion relates these results to the existing literature and proposes avenues for future research into leveraging ChatGPT's potential to support the development of critical writing skills.
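The reliability analysis described above rests on two standard procedures: intraclass correlation coefficients for agreement between the human rater and ChatGPT, and repeated-measures ANOVA for differences in mean scores. The Python sketch below illustrates how such statistics can be computed with the pingouin library; it is not the authors' analysis script, and the column names and toy scores are hypothetical.

```python
# Illustrative sketch only (not the authors' analysis code): ICC and
# repeated-measures ANOVA for paired human vs. ChatGPT scores using pingouin.
# Column names and the toy values below are hypothetical.
import pandas as pd
import pingouin as pg

# Long-format data: one row per essay-rater pair.
scores = pd.DataFrame({
    "essay": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "rater": ["human", "chatgpt"] * 5,
    "score": [70, 82, 65, 75, 80, 85, 55, 68, 74, 79],
})

# Intraclass correlation coefficients (ICC1..ICC3k) across the two raters.
icc = pg.intraclass_corr(data=scores, targets="essay", raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Repeated-measures ANOVA: do mean scores differ between the two raters?
anova = pg.rm_anova(data=scores, dv="score", within="rater", subject="essay")
print(anova)
```

With only two raters the ANOVA reduces to a paired comparison, but the same calls extend directly to designs with additional automated or human raters.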
Ethical Statement
The research was carried out with the approval of the Nevşehir Hacı Bektaş Veli University Ethics Commission (decision no. 2023.11.269, dated 27.09.2023).