Performance of Chat Generative Pretrained Transformer and Bard on the Questions Asked in the Dental Specialty Entrance Examination in Turkey Regarding Bloom’s Revised Taxonomy

Rana Turunç Oğuzman; Zeliha Zuhal Yurdabakan

doi:10.5152/CRDS.2024.23261

Research Article

Year 2024, Volume 34, Issue 1, pp. 25-34, 18.01.2024


Cited By: 3

Abstract

Objective: This study aimed to compare the performance of two large language models (LLMs), Chat Generative Pretrained Transformer (ChatGPT; GPT-3.5) and Bard, on multiple-choice questions from the dental specialty entrance examination (DUS).
Methods: DUS questions on prosthodontics and on oral and dentomaxillofacial radiology from examinations held up to 2021, excluding questions with visual content, were entered into the LLMs as prompts. The LLMs were then asked to choose the correct answer and to specify the question’s Bloom’s taxonomy level. After data collection, the following were evaluated: the LLMs’ ability to recognize Bloom’s taxonomy levels, their correct response rates across subheadings, the agreement between the two LLMs on correct and incorrect answers, and the effect of Bloom’s taxonomy level on correct response rates. Data were analyzed using McNemar, chi-square, and Fisher–Freeman–Halton exact tests; Yates’ continuity correction and the kappa coefficient of agreement were also calculated. Statistical significance was set at P < .05.
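The paired comparison and agreement analysis described in the Methods can be illustrated with a short Python sketch. It is not the authors’ code and rests on assumptions: each question is scored correct or incorrect for both models and summarised in a 2x2 table (a = both correct, b = ChatGPT correct only, c = Bard correct only, d = both incorrect); McNemar’s test uses the common continuity correction of 1 on |b - c|; and the kappa statistic is taken to be Cohen’s kappa for two raters. The function names `mcnemar_cc` and `cohens_kappa_2x2` and the demo counts are hypothetical.

```python
import math


def mcnemar_cc(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar test for two models answering the same items.

    b: items model 1 answered correctly and model 2 incorrectly
    c: items model 1 answered incorrectly and model 2 correctly
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    # Subtract 1 from |b - c|, clamped at 0 so that b == c gives a statistic of 0.
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def cohens_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters giving binary (correct/incorrect) outcomes.

    a: both correct, b: model 1 correct only, c: model 2 correct only, d: both incorrect
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from each model's marginal correct and incorrect rates.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if p_expected == 1:
        return math.nan  # both models give one constant outcome: kappa is 0/0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical counts for 100 items, NOT data from the study.
    a, b, c, d = 40, 8, 12, 40
    stat, p = mcnemar_cc(b, c)
    print(f"McNemar chi2 = {stat:.3f}, p = {p:.3f}")
    print(f"kappa = {cohens_kappa_2x2(a, b, c, d):.3f}")
```

McNemar’s test looks only at the discordant cells b and c, so it asks whether one model is correct more often than the other; kappa uses all four cells and asks whether the two models agree beyond what their individual correct rates would produce by chance. Because these are different questions, two models can show significant agreement while their correct answer rates do not differ significantly.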
Results: The only significant difference was found among ChatGPT’s correct answer rates across the subheadings of oral and dentomaxillofacial radiology (P = .042). For all prosthodontic questions, ChatGPT and Bard achieved correct answer rates of 35.7% and 38.9%, respectively, while both LLMs achieved a correct answer rate of 52.8% on oral and dentomaxillofacial radiology questions. In addition, ChatGPT and Bard showed statistically significant agreement on correct and incorrect answers. Bloom’s taxonomy level did not significantly affect correct response rates.
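The subheading comparison reported in the Results asks whether one model’s correct answer rate differs between groups of questions. A minimal sketch of one way to test this for two groups, a 2x2 chi-square test with Yates’ continuity correction, follows. The source does not state which test produced P = .042, so the table layout, the function name `chi2_yates_2x2`, and the counts are hypothetical.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test with Yates' continuity correction on a 2x2 table.

    Assumed (hypothetical) layout: rows are two subheadings, columns are the
    correct and incorrect answer counts of a single model:
        [[a, b],
         [c, d]]
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Yates' correction subtracts n/2 from |ad - bc|, clamped at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    stat = n * diff**2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study:
    # subheading A: 20 correct, 10 incorrect; subheading B: 10 correct, 20 incorrect.
    stat, p = chi2_yates_2x2(20, 10, 10, 20)
    print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

The p-value uses the identity P(χ² > x) = erfc(√(x/2)) for one degree of freedom, so only the standard library is needed. When expected cell counts are small, or when more than two subheadings are compared, an exact test such as Fisher’s (or its r x c extension, the Fisher–Freeman–Halton test listed in the Methods) is usually preferred over the chi-square approximation.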
Conclusion: ChatGPT and Bard did not perform reliably on DUS questions. However, given the rapid development of these LLMs, this performance gap will probably close soon, and these LLMs can be integrated into dental education as interactive tools.
Keywords: ChatGPT, Bard, artificial intelligence, large language models, dental education, multiple-choice questioning

ÖZ (Abstract)
Objective: This study aimed to compare the performance of two large language models (LLMs), ChatGPT (GPT-3.5) and Bard, on the multiple-choice questions of the Dental Specialty Training Entrance Examination (DUS).
Methods: DUS questions up to 2021 related to prosthodontics and to oral and dentomaxillofacial radiology, excluding questions with visual content, were posed to the LLMs. The LLMs were then asked to choose the correct answer and to state the Bloom’s taxonomy level. After data collection, the LLMs’ ability to identify Bloom’s taxonomy levels, their correct response rates in different subheadings, the agreement between the LLMs on correct and incorrect answers, and the effect of Bloom’s taxonomy level on correct response rates were evaluated. Data were analyzed using McNemar, chi-square, and Fisher–Freeman–Halton exact tests, and Yates’ continuity correction and the kappa agreement level were calculated (P < .05).
Results: The only significant difference in ChatGPT’s correct answer rates was observed between the oral and dentomaxillofacial radiology subheadings (P = .042). For all prosthodontic questions, ChatGPT and Bard answered correctly at rates of 35.7% and 38.9%, respectively, while both LLMs answered 52.8% of oral and dentomaxillofacial radiology questions correctly. In addition, a statistically significant agreement was found between ChatGPT and Bard regarding correct and incorrect answers. Bloom’s taxonomy level did not significantly affect correct response rates.
Conclusion: ChatGPT and Bard did not show reliable performance on DUS questions; however, given the rapid developments in LLMs, the performance gaps will probably close soon, and these LLMs can be integrated into dental education as an interactive tool.
Keywords: ChatGPT, Bard, artificial intelligence, large language models, dental education, multiple-choice question



Details

Primary Language	English
Subjects	Prosthodontics
Journal Section	Research Articles
Authors	Rana Turunç Oğuzman, Zeliha Zuhal Yurdabakan
Publication Date	January 18, 2024
Submission Date	August 24, 2023
Published in Issue	Year 2024 Volume: 34 Issue: 1

Cite

AMA	Turunç Oğuzman R, Yurdabakan ZZ. Performance of Chat Generative Pretrained Transformer and Bard on the Questions Asked in the Dental Specialty Entrance Examination in Turkey Regarding Bloom’s Revised Taxonomy. Curr Res Dent Sci. January 2024;34(1):25-34. doi:10.5152/CRDS.2024.23261

Cited By

Evaluation of the performance of ChatGPT‐4 and ChatGPT‐4o as a learning tool in endodontics

International Endodontic Journal

https://doi.org/10.1111/iej.14217

Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and gemini advanced achieve comparable results to humans?

BMC Medical Education

https://doi.org/10.1186/s12909-024-06389-9

Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis

International Dental Journal

https://doi.org/10.1016/j.identj.2024.10.014


Current Research in Dental Sciences is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
