ChatGPT-3.5 as an automatic scoring system and feedback provider in IELTS exams
Year 2025, Volume: 12, Issue: 1, 62–77
Xinming Chen, Ziqian Zhou, Malila Prado
Abstract
This study explores the efficacy of ChatGPT-3.5, an AI chatbot, as an Automatic Essay Scoring (AES) system and feedback provider for IELTS essay preparation. It examines how closely the scores given by ChatGPT-3.5 align with those assigned by official IELTS examiners in order to establish its reliability as an AES system. It also identifies the strategies ChatGPT-3.5 employs when revising essays according to the four IELTS rubric criteria: task achievement, coherence and cohesion, lexical resource, and grammatical range and accuracy. The essays were pre-rated samples taken from an official IELTS preparatory book, which served as a control measure to ensure objectivity. The findings indicate a discrepancy, with ChatGPT-3.5 typically assigning lower scores than the official raters. However, ChatGPT-3.5 shows a robust capability to revise essays across all four criteria. In addition, its effectiveness as a feedback provider may be attributable to the essay type and its widely accepted rubrics. The study contributes to the understanding of AI tools in second language writing and suggests that future studies evaluate the capacity and effectiveness of such tools in pedagogical applications.
Supporting Institution
This research was partly funded by the BNU-HKBU United International College Startup Grant UICR0700033-22.