Research Article

Comparative Evaluation of ChatGPT-5 and Gemini 2.5 Pro in Answering Oral and Maxillofacial Surgery Questions from Dentistry Specialization Exams: A Cross-Sectional Study

Year 2025, Volume: 4 Issue: 3, 59 - 65, 23.09.2025

Abstract

Objective: Large language models (LLMs) such as ChatGPT-5 (OpenAI) and Gemini 2.5 Pro (Google DeepMind) are increasingly being applied in medicine and dentistry, yet their reliability on high-stakes specialty examinations remains unclear. This study compared the performance of ChatGPT-5 and Gemini 2.5 Pro in answering oral and maxillofacial surgery (OMFS) questions from the Dentistry Specialization Exam (DSE) in Türkiye.
Methods: A total of 128 OMFS questions from 13 DSEs (2012–2021) were presented to both models in Turkish under identical conditions. Responses were compared with the official answer keys, and correct and incorrect answers were tabulated. Statistical analysis used Fisher's Exact Test, with p<0.05 considered significant.
Results: ChatGPT-5 answered 119 questions correctly (93.0%) and 9 incorrectly (7.0%), while Gemini 2.5 Pro answered 124 correctly (96.9%) and 4 incorrectly (3.1%). Although Gemini 2.5 Pro showed slightly higher accuracy, the difference was not statistically significant (p>0.05). Both models achieved 100% accuracy in several exam years, but both performed worse in 2018 and 2019. Four questions were answered incorrectly by both models, while Gemini 2.5 Pro correctly answered five questions that ChatGPT-5 missed.
Conclusion: Both ChatGPT-5 and Gemini 2.5 Pro demonstrated high accuracy on OMFS questions from the DSE, a substantial improvement over earlier LLMs. Although Gemini 2.5 Pro performed slightly better, the difference was not significant. These findings suggest that current LLMs may serve as supplementary tools for postgraduate exam preparation in dentistry, though limitations in nuanced clinical reasoning and exam-specific logic persist.
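The reported comparison reduces to a 2×2 table (9/128 vs. 4/128 incorrect answers) tested with Fisher's Exact Test. As an illustrative sanity check of the "p>0.05" result, the two-sided test can be computed with the Python standard library alone; the function name `fisher_exact_two_sided` is ours, not from the study:

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Enumerates all tables with the same margins and sums the point
    probabilities that do not exceed the observed one (the usual
    "sum of small p-values" definition).
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals (questions per model)
    c1 = a + c                     # first-column total (all incorrect answers)
    n = r1 + r2                    # grand total

    denom = comb(n, c1)

    def pmf(k):
        # Hypergeometric probability of k first-column items landing in row 1.
        return comb(r1, k) * comb(r2, c1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, c1 - r2), min(c1, r1)
    # Small tolerance guards against float round-off in the comparison.
    return sum(pmf(k) for k in range(lo, hi + 1)
               if pmf(k) <= p_obs * (1 + 1e-9))

# Rows = models, columns = (incorrect, correct), as reported in the study.
table = [[9, 119],    # ChatGPT-5:      9 incorrect, 119 correct
         [4, 124]]    # Gemini 2.5 Pro: 4 incorrect, 124 correct
p = fisher_exact_two_sided(table)
print(f"two-sided p = {p:.3f}")   # well above 0.05: not significant
```

With these counts the two-sided p-value lands roughly in the 0.2–0.3 range, consistent with the paper's conclusion that the 93.0% vs. 96.9% accuracy difference is not statistically significant.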

References

  • 1. Feng S, Shen Y. ChatGPT and the future of medical education. Acad Med. 2023;98(8):867–8.
  • 2. Dashti M, Ghasemi S, Ghadimi N, Hefzi D, Karimian A, Zare N, et al. Performance of ChatGPT 3.5 and 4 on US dental examinations: the INBDE, ADAT, and DAT. Imaging Sci Dent. 2024;54(3):271.
  • 3. Extance A. ChatGPT has entered the classroom: how LLMs could transform education. Nature. 2023;623(7987):474–7.
  • 4. Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in dentistry: a comprehensive review. Cureus. 2023;15:e38317.
  • 5. Chakravorty S, Aulakh BK, Shil M, Nepale M, Puthenkandathil R, Syed W. Role of Artificial Intelligence (AI) in dentistry: a literature review. J Pharm Bioallied Sci. 2024;16(1):14-6.
  • 6. Dashti M, Londono J, Ghasemi S, Khurshid Z, Khosraviani F, Moghaddasi N, et al. Attitudes, knowledge, and perceptions of dentists and dental students toward artificial intelligence: a systematic review. J Taibah Univ Med Sci. 2024;19:327-37.
  • 7. Taşsöker M, Çelik M. Diş hekimliği öğrencilerinde mezuniyet sonrası kariyer ve uzmanlık motivasyonu. Selcuk Dent J. 2019;6(4):108-11. Turkish.
  • 8. Şahin EG. Diş hekimliğinde uzmanlık sınavında sorulmuş endodonti sorularının retrospektif analizi. Turkiye Klinikleri J Dental Sci. 2024;30(1):107-14. Turkish.
  • 9. Ekici Ö, Çalışkan İ. Retrospective analysis of oral and maxillofacial surgery questions asked in the Dentistry Specialization Training Entrance Exam. Turkiye Klinikleri J Dental Sci. 2024;30(4):564-71.
  • 10. Aşık A, Kuru E. Diş Hekimliğinde Uzmanlık Eğitim Giriş Sınavında Sorulan Çocuk Diş Hekimliği Sorularına ChatGPT'nin Verdiği Cevapların Analizi: Kesitsel Araştırma. Turkiye Klinikleri J Dental Sci. 2025;31(3):401-6. Turkish.
  • 11. Bilgin Avşar D, Ertan AA. Diş Hekimliğinde Uzmanlık Sınavında Sorulan Protetik Diş Tedavisi Sorularının ChatGPT-3.5 ve Gemini Tarafından Cevaplanma Performanslarının Karşılaştırmalı Olarak İncelenmesi: Kesitsel Araştırma. Turkiye Klinikleri J Dental Sci. 2024;30(4):668-73. Turkish.
  • 12. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
  • 13. Ohta K, Ohta S. The performance of GPT-3.5, GPT-4, and Bard on the Japanese national dentist examination: a comparison study. Cureus. 2023;15(12):e50369.
  • 14. Torres-Zegarra BC, Rios-Garcia W, Ñaña-Cordova AM, Arteaga-Cisneros KF, Chalco XCB, Ordoñez MAB, et al. Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. J Educ Eval Health Prof. 2023;20:30.
  • 15. Thibaut G, Dabbagh A, Liverneaux P. Does Google's Bard chatbot perform better than ChatGPT on the European hand surgery exam? Int Orthop. 2024;48(1):151–8.
  • 16. Eggmann F, Weiger R, Zitzmann NU, Blatz MB. Implications of large language models such as ChatGPT for dental medicine. J Esthet Restor Dent. 2023;35(7):1098–102.
  • 17. Zheng S, Huang J, Chang KC-C. Why does ChatGPT fall short in providing truthful answers? ICBINB Workshop at NeurIPS 2023 [Internet]. Available from: http://arxiv.org/abs/2304.10513.
  • 18. Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ. Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr. 2023;7(2):10.
  • 19. Revilla-León M, Barmak BA, Sailer I, Kois JC, Att W. Performance of an artificial intelligence-based chatbot (ChatGPT) answering the European Certification in Implant Dentistry exam. Int J Prosthodont. 2024;37(2):221-4.
  • 20. Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, et al. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Med Educ. 2024;24(1):1372.
  • 21. Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, et al. Performance of ChatGPT across different versions in medical licensing examinations worldwide: systematic review and meta-analysis. J Med Internet Res. 2024;26:e60807.

ChatGPT-5 ve Gemini 2.5 Pro’nun Diş Hekimliğinde Uzmanlık Sınavı’nda Yer Alan Ağız, Diş ve Çene Cerrahisi Sorularındaki Performanslarının Karşılaştırılması: Kesitsel Çalışma


Abstract

Objective: Artificial-intelligence-based large language models, notably ChatGPT-5 (OpenAI) and Gemini 2.5 Pro (Google DeepMind), are used increasingly in medicine and dentistry, yet their reliability on high-stakes specialty examinations remains uncertain. This study compared the two models' performance on oral and maxillofacial surgery questions from the Dentistry Specialization Exam (DUS) administered in Türkiye.
Methods: A total of 128 oral and maxillofacial surgery questions from 13 exams held between 2012 and 2021 were posed to each model in their original Turkish wording under identical conditions. Responses were checked against the official answer keys, and correct and incorrect answers were recorded. Fisher's Exact Test was used for statistical analysis, with significance set at p<0.05.
Results: ChatGPT-5 gave 119 correct (93.0%) and 9 incorrect (7.0%) answers; Gemini 2.5 Pro gave 124 correct (96.9%) and 4 incorrect (3.1%). Although Gemini 2.5 Pro's accuracy was higher, the difference was not statistically significant (p>0.05). Both models reached 100% accuracy in some years, but reduced performance was observed in 2018 and 2019. Four questions were answered incorrectly by both models, and Gemini 2.5 Pro correctly answered five questions that ChatGPT-5 missed.
Conclusion: ChatGPT-5 and Gemini 2.5 Pro achieved high accuracy on DUS oral and maxillofacial surgery questions, a marked improvement over earlier-generation models. The findings suggest these models can be used as supplementary tools in preparing for dentistry specialization exams, although limitations in clinical reasoning and exam-specific logic persist.


Details

Primary Language English
Subjects Surgery (Other)
Journal Section Research Articles
Authors

Ezgi Yüceer Çetiner 0000-0003-4393-9440

Early Pub Date September 23, 2025
Publication Date September 23, 2025
Submission Date September 1, 2025
Acceptance Date September 10, 2025
Published in Issue Year 2025 Volume: 4 Issue: 3

Cite

Vancouver Yüceer Çetiner E. Comparative Evaluation of ChatGPT-5 and Gemini 2.5 Pro in Answering Oral and Maxillofacial Surgery Questions from Dentistry Specialization Exams: A Cross-Sectional Study. EJOMS. 2025;4(3):59-65.

© 2024 by Association of Oral and Maxillofacial Surgery Society. EJOMS is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence.