Research Article

Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT

Year 2025, Volume: 12 Issue: 1, 20-32
https://doi.org/10.21449/ijate.1517994

Abstract

This study explores the effectiveness of ChatGPT, an Artificial Intelligence (AI) language model, as an Automated Essay Scoring (AES) tool for grading essays written by English as a Foreign Language (EFL) learners. The corpus consists of 50 essays of various types (analysis, compare and contrast, descriptive, narrative, and opinion) written by 10 EFL learners at the B2 level. Human raters and ChatGPT (4o mini version) scored the essays using the International English Language Testing System (IELTS) Task 2 Writing band descriptors. Adopting a quantitative approach, the study compared the two sets of scores with Wilcoxon signed-rank tests and Spearman correlation tests, which revealed a significant difference between the two scoring methods, with human raters assigning higher scores than ChatGPT. Significant differences of varying degrees were also evident for each essay type, suggesting that essay genre was not a factor affecting the agreement between human raters and ChatGPT. The study concludes that while ChatGPT shows promise as an AES tool, the observed disparities suggest that it has not yet reached sufficient proficiency for practical use. It emphasizes the need for improvements in AI language models to meet the nuanced nature of essay evaluation in EFL contexts.
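
The paired comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' analysis code: the band scores below are invented purely for illustration, the function names are hypothetical, and the Wilcoxon p-value uses a plain normal approximation (zero differences dropped, no tie or continuity correction), which a statistics package would refine.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z, approximate two-sided p) for paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return w_plus, w_minus, z, p


# Hypothetical IELTS-style band scores for eight essays (not data from the study).
human_scores = [6.5, 6.0, 7.0, 6.5, 5.5, 6.0, 7.0, 6.5]
chatgpt_scores = [6.0, 5.5, 6.0, 6.5, 5.0, 5.5, 6.5, 5.5]

w_plus, w_minus, z, p = wilcoxon_signed_rank(human_scores, chatgpt_scores)
rho = spearman_rho(human_scores, chatgpt_scores)
print(f"W+ = {w_plus}, W- = {w_minus}, z = {z:.2f}, p ~ {p:.3f}")
print(f"Spearman rho = {rho:.2f}")
```

The two statistics answer different questions: the signed-rank test asks whether one rater scores systematically higher than the other, while the rank correlation asks whether both raters order the essays similarly. A pair of raters can therefore correlate strongly and still differ significantly in level.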

Ethics Statement

Sivas Cumhuriyet University, Educational Sciences Ethics Committee, 24.05.2024-431192.



Details

Primary Language English
Subjects Measurement and Evaluation in Education (Other)
Section Articles
Authors

Ahmet Can Uyar 0000-0003-2438-9877

Dilek Büyükahıska 0000-0001-5074-7805

Early Publication Date January 9, 2025
Publication Date
Submission Date July 18, 2024
Acceptance Date October 7, 2024
Published in Issue Year 2025 Volume: 12 Issue: 1

Cite

APA Uyar, A. C., & Büyükahıska, D. (2025). Artificial intelligence as an automated essay scoring tool: A focus on ChatGPT. International Journal of Assessment Tools in Education, 12(1), 20-32. https://doi.org/10.21449/ijate.1517994
