Research Article
BibTex RIS Cite

Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması

Year 2023, , 202 - 210, 30.08.2023
https://doi.org/10.57135/jier.1296255

Abstract

Bu çalışmada ChatGPT yapay zeka teknolojisinin eğitim alanında destekleyici unsur olarak kullanımına yönelik bir araştırma yürütülmüştür. ChatGPT’nin çoktan seçmeli test maddelerini yanıtlama ve bu maddelerin madde güçlük düzeylerini sınıflama performansı incelenmiştir. 20 maddeden oluşan beş seçenekli çoktan seçmeli test maddesine 4930 öğrencinin verdiği yanıtlara göre madde güçlük düzeyleri belirlenmiştir. Bu güçlük düzeyleri ile ChatGPT’nin ve uzmanların yaptığı sınıflandırmalar arasındaki ilişkiler incelenmiştir. Elde edilen bulgulara göre ChatGPT’nin çoktan seçmeli maddelere doğru yanıt verme performansının yüksek düzeyde olmadığı (%55) görülmüştür. Ancak madde güçlük düzeylerini sınıflandırma konusunda ChatGPT; gerçek madde güçlük düzeyleri ile 0.748, uzman görüşleri ile 0.870 korelasyon göstermiştir. Bu sonuçlara göre deneme uygulamasının yapılamadığı veya uzman görüşlerine başvurulamadığı durumlarda ChatGPT'den test geliştirme aşamalarında destek alınabileceği düşünülmektedir. Geniş ölçekli sınavlarda da uzman gözetiminde ChatGPT benzeri yapay zeka teknolojilerinden faydalanılabilir.

References

  • Anıl, D. (2002). Deneme uygulamasının yapılamadıgı durumlarda madde ve test parametrelerinin klasik ve örtük özellikler test teorilerine göre kestirilmesi. Yayımlanmamış doktora tezi, Hacettepe Üniversitesi Sosyal Bilimler Estitüsü, Ankara.
  • Baykul, Y., & Sezer, S. (1993). Deneme yapılamayan durumlarda madde güçlük ve ayırıcılık gücü indekslerinin ve bunlara bağlı test istatiklerinin kestirilmesi. Eğitim ve Bilim, 17(83)
  • Baykul, Y. (2015). Eğitimde ve psikolojide ölçme: Klasik test teorisi ve uygulaması. Ankara: Pegem Akademi.
  • Bozkurt, A., Xiao, J., Lambert, S., Crompton, H., Koseoglu, S., Farrow, R., Bond, M., Nerantzi, C., Honeychurch, S., Bali, M., Dron, J., Mir, K., Stewart, B., Costello, E., Mason, J., Stracke, C., Romero-Hall, E., Koutropoulos, A., . . . Jandrić, P. (2023). Speculative futures on ChatGPT and Generative Artificial Intelligence (AI): A collective reflection Pazurek, A., from the educational landscape. Asian Journal of Distance Education, 18(1), 53-130. https://www.asianjde.com/ojs/index.php/AsianJDE/article/view/709
  • Choi, J. H., Hickman, K. E., Monahan, A. B. & Schwarcz, D. (2023). ChatGPT Goes to Law School. Minnesota Legal Studies Research Paper No. 23-03.
  • CNN (2023). ChatGPT Passes Exams from Law and Business Schools. Available online: https://edition.cnn.com/2023/01/26/tech/chatgpt-passes-exams (accessed on 10 March 2023).
  • Crocker, L. & Algina, J. (1986). Introduction to Classical and Modern Test Theory. USA:Harcourt Brace Javanovich College Publishers.
  • Deng, J., & Lin, Y. (2022). The benefits and challenges of ChatGPT: An overview. Frontiers in Computing and Intelligent Systems, 2(2), 81-83. https://doi.org/10.54097/fcis.v2i2.4465
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (Vol. 7, p. 429). New York: McGraw-hill.
  • Frieder, S., Pinchetti, L., Griffiths, R. R., Salvatori, T., Lukasiewicz, T., Petersen, P. C., ... & Berner, J. (2023). Mathematical capabilities of chatgpt. arXiv preprint arXiv:2301.13867.
  • Güler, N., İlhan, M., & Taşdelen-Teker, G. (2021). Çoktan seçmeli maddelerde uzmanlarca öngörülen ve ampirik olarak hesaplanan güçlük indekslerinin karşılaştırılması. Journal of Computer and Education Research, 9(18), 1022-1036. DOI: 10.18009/jcer.1000934
  • Impara, J. C., & Plake, B. S. (1998). Teachers' ability to estimate item difficulty: A test of the assumptions in the Angoff standard setting method. Journal of Educational Measurement, 35(1), 69-81.
  • Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
  • Khademi, A. (2023). Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance. arXiv preprint arXiv:2304.05372.
  • Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., ... & Tseng, V. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS digital health, 2(2), e0000198.
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159-174.
  • Lo, C. K. (2023). What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature. Education Sciences, 13(4), 410.
  • Lorge, I., & Diamon, L. K. (1954). The value of information to good and poor judges of item difficulty. Educational and Psychological Measurement, 14(1), 29–33. https://doi.org/10.1177/001316445401400103
  • OpenAI (2023). Introducing OpenAI. Erişim tarihi:08.05.2023. Erişim adresi: https://openai.com/blog/introducing-openai
  • Quereshi, M. Y., & Fisher, T. L. (1977). Logical versus empirical estimates of item difficulty. Educational and Psychologıcal Measurement, 37(1), 91–100. https://doi.org/10.1177/001316447703700110
  • Ryznar, M. (2023). Exams in the Time of ChatGPT. Washington and Lee Law Review Online, 80(5), 305.
  • Sezer, S. (1992). Ön deneme yapılamayan durumlarda madde güçlük ve ayırıcılık gücü indekslerinin ve bunlara bağlı test istatistiklerinin kestirilmesi. Yayımlanmamış doktora tezi, Hacettepe Üniversitesi Sosyal Bilimler Estitüsü, Ankara.
  • Shakarian, P., Koyyalamudi, A., Ngu, N., & Mareedu, L. (2023). An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP). arXiv preprint arXiv:2302.13814.
  • Sok, S., & Heng, K. (2023). ChatGPT for education and research: A review of benefits and risks. Available at SSRN 4378735.
  • Tinkelman, S. (1947). Difficulty prediction of test items. Teachers College Contributions to Education, 941, 55.
  • Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, New Jersey: Wiley Walter, S. D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in medicine, 17(1), 101-110.
  • Zhai, X. (2023). Chatgpt for next generation science learning. XRDS: Crossroads, The ACM Magazine for Students, 29(3), 42-46.
Year 2023, , 202 - 210, 30.08.2023
https://doi.org/10.57135/jier.1296255

Abstract

References

  • Anıl, D. (2002). Deneme uygulamasının yapılamadıgı durumlarda madde ve test parametrelerinin klasik ve örtük özellikler test teorilerine göre kestirilmesi. Yayımlanmamış doktora tezi, Hacettepe Üniversitesi Sosyal Bilimler Estitüsü, Ankara.
  • Baykul, Y., & Sezer, S. (1993). Deneme yapılamayan durumlarda madde güçlük ve ayırıcılık gücü indekslerinin ve bunlara bağlı test istatiklerinin kestirilmesi. Eğitim ve Bilim, 17(83)
  • Baykul, Y. (2015). Eğitimde ve psikolojide ölçme: Klasik test teorisi ve uygulaması. Ankara: Pegem Akademi.
  • Bozkurt, A., Xiao, J., Lambert, S., Crompton, H., Koseoglu, S., Farrow, R., Bond, M., Nerantzi, C., Honeychurch, S., Bali, M., Dron, J., Mir, K., Stewart, B., Costello, E., Mason, J., Stracke, C., Romero-Hall, E., Koutropoulos, A., . . . Jandrić, P. (2023). Speculative futures on ChatGPT and Generative Artificial Intelligence (AI): A collective reflection Pazurek, A., from the educational landscape. Asian Journal of Distance Education, 18(1), 53-130. https://www.asianjde.com/ojs/index.php/AsianJDE/article/view/709
  • Choi, J. H., Hickman, K. E., Monahan, A. B. & Schwarcz, D. (2023). ChatGPT Goes to Law School. Minnesota Legal Studies Research Paper No. 23-03.
  • CNN (2023). ChatGPT Passes Exams from Law and Business Schools. Available online: https://edition.cnn.com/2023/01/26/tech/chatgpt-passes-exams (accessed on 10 March 2023).
  • Crocker, L. & Algina, J. (1986). Introduction to Classical and Modern Test Theory. USA:Harcourt Brace Javanovich College Publishers.
  • Deng, J., & Lin, Y. (2022). The benefits and challenges of ChatGPT: An overview. Frontiers in Computing and Intelligent Systems, 2(2), 81-83. https://doi.org/10.54097/fcis.v2i2.4465
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2012). How to design and evaluate research in education (Vol. 7, p. 429). New York: McGraw-hill.
  • Frieder, S., Pinchetti, L., Griffiths, R. R., Salvatori, T., Lukasiewicz, T., Petersen, P. C., ... & Berner, J. (2023). Mathematical capabilities of chatgpt. arXiv preprint arXiv:2301.13867.
  • Güler, N., İlhan, M., & Taşdelen-Teker, G. (2021). Çoktan seçmeli maddelerde uzmanlarca öngörülen ve ampirik olarak hesaplanan güçlük indekslerinin karşılaştırılması. Journal of Computer and Education Research, 9(18), 1022-1036. DOI: 10.18009/jcer.1000934
  • Impara, J. C., & Plake, B. S. (1998). Teachers' ability to estimate item difficulty: A test of the assumptions in the Angoff standard setting method. Journal of Educational Measurement, 35(1), 69-81.
  • Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... & Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274.
  • Khademi, A. (2023). Can ChatGPT and Bard Generate Aligned Assessment Items? A Reliability Analysis against Human Performance. arXiv preprint arXiv:2304.05372.
  • Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., ... & Tseng, V. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS digital health, 2(2), e0000198.
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 159-174.
  • Lo, C. K. (2023). What Is the Impact of ChatGPT on Education? A Rapid Review of the Literature. Education Sciences, 13(4), 410.
  • Lorge, I., & Diamon, L. K. (1954). The value of information to good and poor judges of item difficulty. Educational and Psychological Measurement, 14(1), 29–33. https://doi.org/10.1177/001316445401400103
  • OpenAI (2023). Introducing OpenAI. Erişim tarihi:08.05.2023. Erişim adresi: https://openai.com/blog/introducing-openai
  • Quereshi, M. Y., & Fisher, T. L. (1977). Logical versus empirical estimates of item difficulty. Educational and Psychologıcal Measurement, 37(1), 91–100. https://doi.org/10.1177/001316447703700110
  • Ryznar, M. (2023). Exams in the Time of ChatGPT. Washington and Lee Law Review Online, 80(5), 305.
  • Sezer, S. (1992). Ön deneme yapılamayan durumlarda madde güçlük ve ayırıcılık gücü indekslerinin ve bunlara bağlı test istatistiklerinin kestirilmesi. Yayımlanmamış doktora tezi, Hacettepe Üniversitesi Sosyal Bilimler Estitüsü, Ankara.
  • Shakarian, P., Koyyalamudi, A., Ngu, N., & Mareedu, L. (2023). An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP). arXiv preprint arXiv:2302.13814.
  • Sok, S., & Heng, K. (2023). ChatGPT for education and research: A review of benefits and risks. Available at SSRN 4378735.
  • Tinkelman, S. (1947). Difficulty prediction of test items. Teachers College Contributions to Education, 941, 55.
  • Urbina, S. (2014). Essentials of psychological testing (2nd ed.). Hoboken, New Jersey: Wiley Walter, S. D., Eliasziw, M., & Donner, A. (1998). Sample size and optimal designs for reliability studies. Statistics in medicine, 17(1), 101-110.
  • Zhai, X. (2023). Chatgpt for next generation science learning. XRDS: Crossroads, The ACM Magazine for Students, 29(3), 42-46.
There are 27 citations in total.

Details

Primary Language Turkish
Subjects Other Fields of Education
Journal Section Eğitim Bilimleri
Authors

Erdem Boduroğlu 0000-0001-8318-4914

Oğuz Koç 0000-0002-8656-6069

Mahmut Sami Yiğiter 0000-0002-2896-0201

Publication Date August 30, 2023
Submission Date May 12, 2023
Acceptance Date July 25, 2023
Published in Issue Year 2023

Cite

APA Boduroğlu, E., Koç, O., & Yiğiter, M. S. (2023). Madde Güçlüklerinin Tahmin Edilmesinde Uzman Görüşleri ve ChatGPT Performansının Karşılaştırılması. Disiplinlerarası Eğitim Araştırmaları Dergisi, 7(15), 202-210. https://doi.org/10.57135/jier.1296255

The Aim of The Journal

The Journal of Interdisciplinary Educational Researches (JIER) published by the Interdisciplinary Educational and Research Association (JIER)A) is an internationally eminent journal.

JIER, a nonprofit, nonprofit NGO, is concerned with improving the education system within the context of its corporate objectives and social responsibility policies. JIER, has the potential to solve educational problems and has a strong gratification for the contributions of qualified scientific researchers.

JIER has the purpose of serving the construction of an education system that can win the knowledge and skills that each individual should have firstly in our country and then in the world. In addition, JIER serves to disseminate the academic work that contributes to the professional development of teachers and academicians, offering concrete solutions to the problems of all levels of education, from preschool education to higher education.

JIER has the priority to contribute to more qualified school practices. Creating and managing content within this context will help to advance towards the goal of being a "focus magazine" and "magazine school", and will also form the basis for a holistic view of educational issues. It also acts as an intermediary in the production of common mind for sustainable development and education