Research Article

The use of ChatGPT in assessment

Year 2024, Volume 11, Issue 3, pp. 608-621, 09.09.2024
https://doi.org/10.21449/ijate.1379647

Abstract

Interest in ChatGPT has surged, leading people to explore its use in a range of tasks. However, before it is allowed to replace humans, its capabilities should be investigated. Because ChatGPT has potential for use in testing and assessment, this study investigates questions generated by ChatGPT by comparing them with those written by a course instructor. To this end, 36 junior students took a 40-item multiple-choice practice test in which 20 items were generated by ChatGPT and 20 were written by the course instructor. Results indicate an acceptable degree of consistency between ChatGPT and the course instructor. Post-hoc analyses point to consistency between the instructor and the chatbot in item difficulty, yet the chatbot's items were weaker in item discrimination power and in distractor analysis. This indicates that ChatGPT can potentially generate multiple-choice exams similar to those written by a course instructor.
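The abstract compares the two item sets on difficulty, discrimination power, and distractor performance but does not state the formulas used. As a minimal sketch assuming the common classical test theory definitions (difficulty p as the proportion of examinees answering an item correctly, discrimination D as the difference in p between the upper and lower scoring groups, conventionally 27% each, and distractor analysis as the frequency with which each option is chosen), these indices could be computed as follows. The function `item_analysis`, the `ItemStats` class, and the toy responses are hypothetical illustrations, not the author's procedure or data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch assuming classical test theory definitions;
# not the procedure reported in the study.


@dataclass
class ItemStats:
    difficulty: float              # p: proportion of all examinees answering correctly
    discrimination: float          # D = p_upper - p_lower
    option_counts: dict[str, int]  # how often each option was chosen (distractor analysis)


def item_analysis(responses: list[list[str]], key: list[str],
                  group_fraction: float = 0.27) -> list[ItemStats]:
    """Classical item analysis for a multiple-choice test.

    responses: one list of chosen options per examinee, aligned with `key`.
    key: the correct option for each item.
    group_fraction: share of examinees in each of the upper and lower groups;
        values above 0.5 would make the two groups overlap.
    """
    n = len(responses)
    # Total score per examinee = number of items answered correctly.
    totals = [sum(a == k for a, k in zip(ans, key)) for ans in responses]
    # Rank examinees from highest to lowest total score (ties keep input order).
    order = sorted(range(n), key=lambda s: totals[s], reverse=True)
    g = max(1, round(group_fraction * n))
    upper, lower = order[:g], order[-g:]

    stats = []
    for i, k in enumerate(key):
        correct = [ans[i] == k for ans in responses]
        p = sum(correct) / n
        p_upper = sum(correct[s] for s in upper) / g
        p_lower = sum(correct[s] for s in lower) / g
        counts = dict(Counter(ans[i] for ans in responses))
        stats.append(ItemStats(p, p_upper - p_lower, counts))
    return stats


# Hypothetical toy data: 4 examinees, 2 items (not data from the study).
key = ["A", "B"]
responses = [["A", "B"], ["A", "C"], ["A", "B"], ["D", "C"]]
for i, s in enumerate(item_analysis(responses, key, group_fraction=0.5), 1):
    print(f"item {i}: p={s.difficulty:.2f}, D={s.discrimination:.2f}, options={s.option_counts}")
# item 1: p=0.75, D=0.50, options={'A': 3, 'D': 1}
# item 2: p=0.50, D=1.00, options={'B': 2, 'C': 2}
```

With 36 examinees, the default `group_fraction=0.27` would give upper and lower groups of round(9.72) = 10 examinees each. Under this convention, items with low or negative D, and distractors that almost no examinee selects, are the usual candidates for revision.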

The article lists 28 references in total.

Details

Primary Language: English
Subjects: Classroom Measurement Practices
Section: Articles
Authors

Mehmet Kanık (ORCID: 0000-0002-1737-7678)

Early View Date: 27 August 2024
Publication Date: 9 September 2024
Submission Date: 22 October 2023
Acceptance Date: 12 August 2024
Published in Issue: Year 2024, Volume 11, Issue 3

How to Cite

APA: Kanık, M. (2024). The use of ChatGPT in assessment. International Journal of Assessment Tools in Education, 11(3), 608-621. https://doi.org/10.21449/ijate.1379647
