Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT

Ahmet Can Uyar; Dilek Büyükahıska

doi:10.21449/ijate.1517994

Research Article

Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT

Year 2025, Volume: 12 Issue: 1, 20 - 32, 20.02.2025

Ahmet Can Uyar , Dilek Büyükahıska

https://doi.org/10.21449/ijate.1517994

Cited By: 1

Abstract

This study explores the effectiveness of using ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading English as a Foreign Language (EFL) learners’ essays. The corpus consists of 50 essays representing various types including analysis, compare and contrast, descriptive, narrative, and opinion essays written by 10 EFL learners at the B2 level. Human raters and ChatGPT (4o mini version) scored the essays using the International English Language Testing System (IELTS) TASK 2 Writing band descriptors. Adopting a quantitative approach, the Wilcoxon signed-rank tests and Spearman correlation tests were employed to compare the scores generated, revealing a significant difference between the two methods of scoring, with human raters assigning higher scores than ChatGPT. Similarly, significant differences with varying degrees were also evident for each of the various types of essays, suggesting that the genre of the essays was not a parameter affecting the agreement between human raters and ChatGPT. After all, it was discussed that while ChatGPT shows promise as an AES tool, the observed disparities suggest that it has not reached sufficient proficiency for practical use. The study emphasizes the need for improvements in AI language models to meet the nuanced nature of essay evaluation in EFL contexts.

Keywords

Automated essay scoring , Artificial intelligence , ChatGPT , Foreign language writing , Writing evaluation

Ethical Statement

Sivas Cumhuriyet University, Educational Sciences Ethics Committee, 24.05.2024-431192.

References

Almusharraf, N., & Alotaibi, H. (2022). An error-analysis study from an EFL writing context: Human and automated essay scoring approaches. Technology, Knowledge and Learning, 28, 1015-1031. https://doi.org/10.1007/s10758-022-09592-z
Attali, Y. (2013). Validity and reliability of automated essay scoring. In M.D. Shermis & J.C. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 181-198). Routledge.
Bui, N.M., & Barrot, J.S. (2024). ChatGPT as an automated essay scoring tool in the writing classrooms: how it compares with human scoring. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12891-w
Chen, H., & Pan, J. (2022). Computer or human: a comparative study of automated evaluation scoring and instructors’ feedback on Chinese college students’ English writing. Asian-Pacific Journal of Second and Foreign Language Education, 7(34), 1 20. https://doi.org/10.1186/s40862-022-00171-4
Coghlan, D., & Brydon-Miller, M. (2014). The SAGE encyclopedia of action research. SAGE.
Creswell, J.W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. SAGE Publications.
Davies A. (2008). Assessing academic English language proficiency: 40+ years of U.K. language tests. In Fox J., Wesche M., Bayliss D., Cheng L., Turner C.E., Doe C. (Eds.), Language testing reconsidered (pp. 73–86). University of Ottawa Press.
Guo, K., & Wang, D. (2024). To resist it or to embrace it? Examining ChatGPT’s potential to support teacher feedback in EFL writing. Education and Information Technologies, 29, 8435–8463. https://doi.org/10.1007/s10639-023-12146-0
Huang, S.J. (2014). Automated versus human scoring: A case study in an EFL context. Electronic Journal of Foreign Language Teaching, 11(1), 149-164.
IELTS. (2019). Guide for educational institutions, governments, professional bodies and commercial organisations. Cambridge Assessment English, The British Council, IDP Australia. https://www.ielts.org/-/media/publications/guide-for-institutions/ielts-guide-for-institutions-2015-uk.ashx
IELTS. (2023). IELTS Task 2 Writing band descriptors (Public version). https://takeielts.britishcouncil.org/sites/default/files/ielts_writing_band_descriptors.pdf
Larson-Hall, J. (2012). How to run statistical analyses. In A. Mackey & S. M. Gass (Eds.), Research methods in second language acquisition: A practical guide (pp. 245-274). Wiley-Blackwell.
Manap, M.R., Ramli, N.F., & Kassim, A.A.M. (2019). Web 2.0 automated essay scoring application and human ESL essay assessment: A comparison study. European Journal of English Language Teaching, 5(1), 146-162. https://doi.org/10.5281/zenodo.3461784
Mason, O., & Grove-Stephenson, I. (2002). Automated free text marking with paperless school. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference (pp. 216–222). Loughborough: Loughborough University.
Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2, 1 13. https://doi.org/10.1016/j.rmal.2023.100050
Page, E. (1966). The imminence of ... grading essays by computer. Phi Delta Kappan, 47(5), 238–243.
Parker, J.L., Becker, K., & Carroca, C. (2023). ChatGPT for automated writing evaluation in scholarly writing instruction. Journal of Nursing Education, 62(12), 721 727. https://doi.org/10.3928/01484834-20231006-02
Pearson, W.S. (2022). Student Engagement with Teacher Written Feedback on Rehearsal Essays Undertaken in Preparation for IELTS. Sage Open, 12(1). https://doi.org/10.1177/21582440221079842
Wang, J., & Bai, L. (2021). Unveiling the scoring validity of two Chinese automated writing evaluation systems: A quantitative study. International Journal of English Linguistics, 11(2), 68-84. https://doi.org/10.5539/0jel.v11n2p68
Willard, C.A. (2020). Statistical methods: An introduction to basic statistical concepts and analysis. Routledge.
Yancey, K.P., Laflair, G., Verardi, A., & Burstein, J. (2023). Rating short L2 essays on the CEFR scale with GPT-4. In E. Kochmar, J. Burstein, A. Horbach, R. Laarmann-Quante, N. Madnani, A. Tack, V. Yaneva, Z. Yuan, & T. Zesch (Eds.), Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 576-584). Retrieved October 2, 2024, from https://aclanthology.org/2023.bea-1.49
Zribi, R., & Smaoui, C. (2021). Automated versus human essay scoring: A comparative study. International Journal of Information Technology and Language Studies, 5(1), 62-71.

Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT

Year 2025, Volume: 12 Issue: 1, 20 - 32, 20.02.2025

Ahmet Can Uyar , Dilek Büyükahıska

https://doi.org/10.21449/ijate.1517994

Cited By: 1

Abstract

Keywords

Automated essay scoring , Artificial intelligence , ChatGPT , Foreign language writing , Writing evaluation

Ethical Statement

Sivas Cumhuriyet University, Educational Sciences Ethics Committee, 24.05.2024-431192.

References

Almusharraf, N., & Alotaibi, H. (2022). An error-analysis study from an EFL writing context: Human and automated essay scoring approaches. Technology, Knowledge and Learning, 28, 1015-1031. https://doi.org/10.1007/s10758-022-09592-z
Attali, Y. (2013). Validity and reliability of automated essay scoring. In M.D. Shermis & J.C. Burstein (Eds.), Handbook of automated essay evaluation: Current applications and new directions (pp. 181-198). Routledge.
Bui, N.M., & Barrot, J.S. (2024). ChatGPT as an automated essay scoring tool in the writing classrooms: how it compares with human scoring. Education and Information Technologies. https://doi.org/10.1007/s10639-024-12891-w
Chen, H., & Pan, J. (2022). Computer or human: a comparative study of automated evaluation scoring and instructors’ feedback on Chinese college students’ English writing. Asian-Pacific Journal of Second and Foreign Language Education, 7(34), 1 20. https://doi.org/10.1186/s40862-022-00171-4
Coghlan, D., & Brydon-Miller, M. (2014). The SAGE encyclopedia of action research. SAGE.
Creswell, J.W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches. SAGE Publications.
Davies A. (2008). Assessing academic English language proficiency: 40+ years of U.K. language tests. In Fox J., Wesche M., Bayliss D., Cheng L., Turner C.E., Doe C. (Eds.), Language testing reconsidered (pp. 73–86). University of Ottawa Press.
Guo, K., & Wang, D. (2024). To resist it or to embrace it? Examining ChatGPT’s potential to support teacher feedback in EFL writing. Education and Information Technologies, 29, 8435–8463. https://doi.org/10.1007/s10639-023-12146-0
Huang, S.J. (2014). Automated versus human scoring: A case study in an EFL context. Electronic Journal of Foreign Language Teaching, 11(1), 149-164.
IELTS. (2019). Guide for educational institutions, governments, professional bodies and commercial organisations. Cambridge Assessment English, The British Council, IDP Australia. https://www.ielts.org/-/media/publications/guide-for-institutions/ielts-guide-for-institutions-2015-uk.ashx
IELTS. (2023). IELTS Task 2 Writing band descriptors (Public version). https://takeielts.britishcouncil.org/sites/default/files/ielts_writing_band_descriptors.pdf
Larson-Hall, J. (2012). How to run statistical analyses. In A. Mackey & S. M. Gass (Eds.), Research methods in second language acquisition: A practical guide (pp. 245-274). Wiley-Blackwell.
Manap, M.R., Ramli, N.F., & Kassim, A.A.M. (2019). Web 2.0 automated essay scoring application and human ESL essay assessment: A comparison study. European Journal of English Language Teaching, 5(1), 146-162. https://doi.org/10.5281/zenodo.3461784
Mason, O., & Grove-Stephenson, I. (2002). Automated free text marking with paperless school. In M. Danson (Ed.), Proceedings of the Sixth International Computer Assisted Assessment Conference (pp. 216–222). Loughborough: Loughborough University.
Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2, 1 13. https://doi.org/10.1016/j.rmal.2023.100050
Page, E. (1966). The imminence of ... grading essays by computer. Phi Delta Kappan, 47(5), 238–243.
Parker, J.L., Becker, K., & Carroca, C. (2023). ChatGPT for automated writing evaluation in scholarly writing instruction. Journal of Nursing Education, 62(12), 721 727. https://doi.org/10.3928/01484834-20231006-02
Pearson, W.S. (2022). Student Engagement with Teacher Written Feedback on Rehearsal Essays Undertaken in Preparation for IELTS. Sage Open, 12(1). https://doi.org/10.1177/21582440221079842
Wang, J., & Bai, L. (2021). Unveiling the scoring validity of two Chinese automated writing evaluation systems: A quantitative study. International Journal of English Linguistics, 11(2), 68-84. https://doi.org/10.5539/0jel.v11n2p68
Willard, C.A. (2020). Statistical methods: An introduction to basic statistical concepts and analysis. Routledge.
Yancey, K.P., Laflair, G., Verardi, A., & Burstein, J. (2023). Rating short L2 essays on the CEFR scale with GPT-4. In E. Kochmar, J. Burstein, A. Horbach, R. Laarmann-Quante, N. Madnani, A. Tack, V. Yaneva, Z. Yuan, & T. Zesch (Eds.), Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (pp. 576-584). Retrieved October 2, 2024, from https://aclanthology.org/2023.bea-1.49
Zribi, R., & Smaoui, C. (2021). Automated versus human essay scoring: A comparative study. International Journal of Information Technology and Language Studies, 5(1), 62-71.

There are 22 citations in total.

Details

Primary Language	English
Subjects	Measurement and Evaluation in Education (Other)
Journal Section	Articles
Authors	Ahmet Can Uyar 0000-0003-2438-9877 Dilek Büyükahıska 0000-0001-5074-7805
Early Pub Date	January 9, 2025
Publication Date	February 20, 2025
Submission Date	July 18, 2024
Acceptance Date	October 7, 2024
Published in Issue	Year 2025 Volume: 12 Issue: 1

Cite

APA	Uyar, A. C., & Büyükahıska, D. (2025). Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT. International Journal of Assessment Tools in Education, 12(1), 20-32. https://doi.org/10.21449/ijate.1517994

Cited By

ChatGPT as a Stable and Fair Tool for Automated Essay Scoring

Education Sciences

https://doi.org/10.3390/educsci15080946

Article Files

Full Text

23823 23825 23824