Research Article

Comparative Evaluation of ChatGPT-5 and Gemini 2.5 Pro in Answering Oral and Maxillofacial Surgery Questions from Dentistry Specialization Exams: A Cross-Sectional Study

Year 2025, Volume: 4 Issue: 3, 59 - 65, 23.09.2025

Abstract

Objective: Large language models (LLMs) such as ChatGPT-5 (OpenAI) and Gemini 2.5 Pro (Google DeepMind) are increasingly being applied in medicine and dentistry, yet their reliability on high-stakes specialty examinations remains unclear. This study compared the performance of ChatGPT-5 and Gemini 2.5 Pro in answering oral and maxillofacial surgery (OMFS) questions from the Dentistry Specialization Exam (DSE) in Türkiye.
Methods: A total of 128 OMFS questions from 13 DSEs (2012–2021) were presented to both models in Turkish under identical conditions. Responses were compared with the official answer keys, and correct and incorrect answers were tabulated. Statistical analysis used Fisher's Exact Test, with p<0.05 considered significant.
Results: ChatGPT-5 answered 119 questions correctly (93.0%) and 9 incorrectly (7.0%), while Gemini 2.5 Pro answered 124 correctly (96.9%) and 4 incorrectly (3.1%). Although Gemini 2.5 Pro showed slightly higher accuracy, the difference was not statistically significant (p>0.05). Both models achieved 100% accuracy in several exam years, but both performed worse in 2018 and 2019. Four questions were answered incorrectly by both models, while Gemini 2.5 Pro correctly answered five questions that ChatGPT-5 missed.
Conclusion: Both ChatGPT-5 and Gemini 2.5 Pro demonstrated high accuracy on OMFS questions from the DSE, a substantial improvement over earlier LLMs. Although Gemini 2.5 Pro performed slightly better, the difference was not significant. These findings suggest that current LLMs may serve as supplementary tools for postgraduate exam preparation in dentistry, though limitations in nuanced clinical reasoning and exam-specific logic persist.
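The reported comparison reduces to a 2×2 table (9/128 vs. 4/128 incorrect answers) tested with Fisher's Exact Test. As an illustrative sanity check of the "p>0.05" result, the two-sided test can be computed with the Python standard library alone; the function name `fisher_exact_two_sided` is ours, not from the study:

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Enumerates all tables with the same margins and sums the point
    probabilities that do not exceed the observed one (the usual
    "sum of small p-values" definition).
    """
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d          # row totals (questions per model)
    c1 = a + c                     # first-column total (all incorrect answers)
    n = r1 + r2                    # grand total

    denom = comb(n, c1)

    def pmf(k):
        # Hypergeometric probability of k first-column items landing in row 1.
        return comb(r1, k) * comb(r2, c1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, c1 - r2), min(c1, r1)
    # Small tolerance guards against float round-off in the comparison.
    return sum(pmf(k) for k in range(lo, hi + 1)
               if pmf(k) <= p_obs * (1 + 1e-9))

# Rows = models, columns = (incorrect, correct), as reported in the study.
table = [[9, 119],    # ChatGPT-5:      9 incorrect, 119 correct
         [4, 124]]    # Gemini 2.5 Pro: 4 incorrect, 124 correct
p = fisher_exact_two_sided(table)
print(f"two-sided p = {p:.3f}")   # well above 0.05: not significant
```

With these counts the two-sided p-value lands roughly in the 0.2–0.3 range, consistent with the paper's conclusion that the 93.0% vs. 96.9% accuracy difference is not statistically significant.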

References

  • 1. Feng S, Shen Y. ChatGPT and the future of medical education. Acad Med. 2023;98(8):867–8.
  • 2. Dashti M, Ghasemi S, Ghadimi N, Hefzi D, Karimian A, Zare N, et al. Performance of ChatGPT 3.5 and 4 on US dental examinations: the INBDE, ADAT, and DAT. Imaging Sci Dent. 2024;54(3):271.
  • 3. Extance A. ChatGPT has entered the classroom: how LLMs could transform education. Nature. 2023;623(7987):474–7.
  • 4. Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in dentistry: a comprehensive review. Cureus. 2023;15:e38317.
  • 5. Chakravorty S, Aulakh BK, Shil M, Nepale M, Puthenkandathil R, Syed W. Role of Artificial Intelligence (AI) in dentistry: a literature review. J Pharm Bioallied Sci. 2024;16(1):14-6.
  • 6. Dashti M, Londono J, Ghasemi S, Khurshid Z, Khosraviani F, Moghaddasi N, et al. Attitudes, knowledge, and perceptions of dentists and dental students toward artificial intelligence: a systematic review. J Taibah Univ Med Sci. 2024;19:327-37.
  • 7. Taşsöker M, Çelik M. Diş hekimliği öğrencilerinde mezuniyet sonrası kariyer ve uzmanlık motivasyonu. Selcuk Dent J. 2019;6(4):108-11. Turkish.
  • 8. Şahin EG. Diş hekimliğinde uzmanlık sınavında sorulmuş endodonti sorularının retrospektif analizi. Turkiye Klinikleri J Dental Sci. 2024;30(1):107-14. Turkish.
  • 9. Ekici Ö, Çalışkan İ. Retrospective analysis of oral and maxillofacial surgery questions asked in the Dentistry Specialization Training Entrance Exam. Turkiye Klinikleri J Dental Sci. 2024;30(4):564-71.
  • 10. Aşık A, Kuru E. Diş Hekimliğinde Uzmanlık Eğitim Giriş Sınavında Sorulan Çocuk Diş Hekimliği Sorularına ChatGPT'nin Verdiği Cevapların Analizi: Kesitsel Araştırma. Turkiye Klinikleri J Dental Sci. 2025;31(3):401-6. Turkish.
  • 11. Bilgin Avşar D, Ertan AA. Diş Hekimliğinde Uzmanlık Sınavında Sorulan Protetik Diş Tedavisi Sorularının ChatGPT-3.5 ve Gemini Tarafından Cevaplanma Performanslarının Karşılaştırmalı Olarak İncelenmesi: Kesitsel Araştırma. Turkiye Klinikleri J Dental Sci. 2024;30(4):668-73. Turkish.
  • 12. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
  • 13. Ohta K, Ohta S. The performance of GPT-3.5, GPT-4, and Bard on the Japanese national dentist examination: a comparison study. Cureus. 2023;15(12):e50369.
  • 14. Torres-Zegarra BC, Rios-Garcia W, Ñaña-Cordova AM, Arteaga-Cisneros KF, Chalco XCB, Ordoñez MAB, et al. Performance of ChatGPT, Bard, Claude, and Bing on the Peruvian National Licensing Medical Examination: a cross-sectional study. J Educ Eval Health Prof. 2023;20:30.
  • 15. Thibaut G, Dabbagh A, Liverneaux P. Does Google's Bard chatbot perform better than ChatGPT on the European hand surgery exam? Int Orthop. 2024;48(1):151–8.
  • 16. Eggmann F, Weiger R, Zitzmann NU, Blatz MB. Implications of large language models such as ChatGPT for dental medicine. J Esthet Restor Dent. 2023;35(7):1098–102.
  • 17. Zheng S, Huang J, Chang KC-C. Why does ChatGPT fall short in providing truthful answers? ICBINB Workshop at NeurIPS 2023 [Internet]. Available from: http://arxiv.org/abs/2304.10513.
  • 18. Hopkins AM, Logan JM, Kichenadasse G, Sorich MJ. Artificial intelligence chatbots will revolutionize how cancer patients access information: ChatGPT represents a paradigm-shift. JNCI Cancer Spectr. 2023;7(2):10.
  • 19. Revilla-León M, Barmak BA, Sailer I, Kois JC, Att W. Performance of an artificial intelligence-based chatbot (ChatGPT) answering the European Certification in Implant Dentistry exam. Int J Prosthodont. 2024;37(2):221-4.
  • 20. Chen Y, Huang X, Yang F, Lin H, Lin H, Zheng Z, et al. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Med Educ. 2024;24(1):1372.
  • 21. Liu M, Okuhara T, Chang X, Shirabe R, Nishiie Y, Okada H, et al. Performance of ChatGPT across different versions in medical licensing examinations worldwide: systematic review and meta-analysis. J Med Internet Res. 2024;26:e60807.

ChatGPT-5 ve Gemini 2.5 Pro’nun Diş Hekimliğinde Uzmanlık Sınavı’nda Yer Alan Ağız, Diş ve Çene Cerrahisi Sorularındaki Performanslarının Karşılaştırılması: Kesitsel Çalışma


Abstract

Objective: Artificial-intelligence-based large language models, notably ChatGPT-5 (OpenAI) and Gemini 2.5 Pro (Google DeepMind), are used increasingly in medicine and dentistry, yet their reliability on high-stakes specialty examinations remains uncertain. This study compared the two models' performance on oral and maxillofacial surgery questions from the Dentistry Specialization Exam (DUS) administered in Türkiye.
Methods: A total of 128 oral and maxillofacial surgery questions from 13 exams held between 2012 and 2021 were posed to each model in their original Turkish wording under identical conditions. Responses were checked against the official answer keys, and correct and incorrect answers were recorded. Fisher's Exact Test was used for statistical analysis, with significance set at p<0.05.
Results: ChatGPT-5 gave 119 correct (93.0%) and 9 incorrect (7.0%) answers; Gemini 2.5 Pro gave 124 correct (96.9%) and 4 incorrect (3.1%). Although Gemini 2.5 Pro's accuracy was higher, the difference was not statistically significant (p>0.05). Both models reached 100% accuracy in some years, but reduced performance was observed in 2018 and 2019. Four questions were answered incorrectly by both models, and Gemini 2.5 Pro correctly answered five questions that ChatGPT-5 missed.
Conclusion: ChatGPT-5 and Gemini 2.5 Pro achieved high accuracy on DUS oral and maxillofacial surgery questions, a marked improvement over earlier-generation models. The findings suggest these models can be used as supplementary tools in preparing for dentistry specialization exams, although limitations in clinical reasoning and exam-specific logic persist.


Details

Primary Language English
Subjects Surgery (Other)
Journal Section Research Articles
Authors

Ezgi Yüceer Çetiner 0000-0003-4393-9440

Early Pub Date September 23, 2025
Publication Date September 23, 2025
Submission Date September 1, 2025
Acceptance Date September 10, 2025
Published in Issue Year 2025 Volume: 4 Issue: 3

Cite

Vancouver Yüceer Çetiner E. Comparative Evaluation of ChatGPT-5 and Gemini 2.5 Pro in Answering Oral and Maxillofacial Surgery Questions from Dentistry Specialization Exams: A Cross-Sectional Study. EJOMS. 2025;4(3):59-65.

© 2024 by Association of Oral and Maxillofacial Surgery Society. EJOMS is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence.