Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024)

Hayriye Yasemin Yay Kuşçu; Zuhal Görüş

doi:10.38053/acmj.1789931

Abstract

Aims: This study aimed to comparatively evaluate the performance of five contemporary large language models (LLMs) on prosthodontics questions of the dentistry specialization examination (DUS) between 2014 and 2024.

Methods: A total of 167 prosthodontics questions from the DUS were analyzed. The questions were administered to five different LLMs: ChatGPT-5 (OpenAI Inc., USA), Claude 4 (Anthropic, USA), Gemini 1.5 Pro (Google LLC, USA), DeepSeek-V2 (DeepSeek AI, China), and Perplexity Pro (Perplexity AI, USA). The models’ responses were compared with the official answer keys provided by the Student Selection and Placement Center (OSYM), coded as correct or incorrect, and accuracy percentages were calculated. Statistical analyses included the Friedman test, correlation analysis, and frequency distributions. Subsection analyses were also performed to evaluate model performance across different content areas.

Results: DeepSeek-V2 achieved the highest overall accuracy rate (70.06%). Perplexity Pro (53.89%) and Gemini 1.5 Pro (51.50%) demonstrated moderate performance, ChatGPT-5 (49.10%) performed close to human levels, and Claude 4 had the lowest accuracy (32.34%). Subsection analyses revealed high accuracy in standardized knowledge areas such as implantology and temporomandibular joint (TMJ) disorders (66.7–100%), whereas notable decreases were observed in occlusion and morphology questions (9.1–53.9%). Correlation analyses indicated significant relationships between certain models.

Conclusion: The findings demonstrate heterogeneous performance of LLMs on DUS prosthodontics questions. While these models may serve as supplementary tools for exam preparation and dental education, their variable accuracy and potential for generating misinformation suggest they should not be used independently. Under expert supervision, LLMs may enhance dental education.
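The accuracy percentages reported in the abstract can be related back to the 167 questions with simple arithmetic. The sketch below is illustrative only: the abstract does not state per-model counts of correct answers, so the counts produced here are inferred by rounding, and the helper names `implied_correct` and `accuracy_pct` are hypothetical.

```python
# Reported overall accuracy (%) per model, copied from the abstract.
REPORTED_ACCURACY = {
    "DeepSeek-V2": 70.06,
    "Perplexity Pro": 53.89,
    "Gemini 1.5 Pro": 51.50,
    "ChatGPT-5": 49.10,
    "Claude 4": 32.34,
}

# Total number of prosthodontics questions analyzed (from the abstract).
TOTAL_QUESTIONS = 167


def implied_correct(reported_pct, total=TOTAL_QUESTIONS):
    """Nearest whole number of correct answers for a reported percentage.

    Assumption: this is an inference, not a figure stated in the abstract.
    """
    return round(reported_pct / 100 * total)


def accuracy_pct(correct, total=TOTAL_QUESTIONS):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


if __name__ == "__main__":
    for model, pct in REPORTED_ACCURACY.items():
        n = implied_correct(pct)
        print(f"{model}: {n}/{TOTAL_QUESTIONS} = {accuracy_pct(n):.2f}% "
              f"(reported {pct:.2f}%)")
```

Under this rounding assumption the implied counts are 117, 90, 86, 82, and 54 correct answers out of 167, and each count reproduces the reported percentage to two decimals.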

Details

Primary Language

English

Subjects

Information Systems (Other), Prosthodontics

Journal Section

Research Article

Authors

Hayriye Yasemin Yay Kuşçu *
0000-0002-0805-1510
Türkiye

Zuhal Görüş
0000-0003-1114-3333
Türkiye

Publication Date

October 26, 2025

Submission Date

September 23, 2025

Acceptance Date

October 15, 2025

Published in Issue

Year 2025 Volume: 7 Number: 6

DOI

https://doi.org/10.38053/acmj.1789931

IZ

https://izlik.org/JA49ES68WK

Cite

APA

Yay Kuşçu, H. Y., & Görüş, Z. (2025). Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Current Medical Journal, 7(6), 893-899. https://doi.org/10.38053/acmj.1789931

AMA

1. Yay Kuşçu HY, Görüş Z. Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Curr Med J. 2025;7(6):893-899. doi:10.38053/acmj.1789931

Chicago

Yay Kuşçu, Hayriye Yasemin, and Zuhal Görüş. 2025. “Performance of Large Language Models on Prosthodontics Questions of the Dentistry Specialization Examination: A Comparative Analysis (2014–2024)”. Anatolian Current Medical Journal 7 (6): 893-99. https://doi.org/10.38053/acmj.1789931.

EndNote

Yay Kuşçu HY, Görüş Z (October 1, 2025) Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Current Medical Journal 7 6 893–899.

IEEE

[1] H. Y. Yay Kuşçu and Z. Görüş, “Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024)”, Anatolian Curr Med J, vol. 7, no. 6, pp. 893–899, Oct. 2025, doi: 10.38053/acmj.1789931.

ISNAD

Yay Kuşçu, Hayriye Yasemin - Görüş, Zuhal. “Performance of Large Language Models on Prosthodontics Questions of the Dentistry Specialization Examination: A Comparative Analysis (2014–2024)”. Anatolian Current Medical Journal 7/6 (October 1, 2025): 893-899. https://doi.org/10.38053/acmj.1789931.

JAMA

1. Yay Kuşçu HY, Görüş Z. Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Curr Med J. 2025;7:893–899.

MLA

Yay Kuşçu, Hayriye Yasemin, and Zuhal Görüş. “Performance of Large Language Models on Prosthodontics Questions of the Dentistry Specialization Examination: A Comparative Analysis (2014–2024)”. Anatolian Current Medical Journal, vol. 7, no. 6, Oct. 2025, pp. 893-9, doi:10.38053/acmj.1789931.

Vancouver

1. Hayriye Yasemin Yay Kuşçu, Zuhal Görüş. Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Curr Med J. 2025 Oct. 1;7(6):893-9. doi:10.38053/acmj.1789931

Cited By

A comparative analysis of the performance of large language models in the dentistry specialty examination

Scientific Reports

https://doi.org/10.1038/s41598-026-37800-8

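Evaluation of the Accuracy and Consistency of Artificial Intelligence-Based Chatbots on Intraoral Maxillofacial Prosthesis Questions [İntraoral Maksillofasiyal Protez Sorularında Yapay Zeka Tabanlı Sohbet Robotlarının Doğruluk ve Tutarlılığının Değerlendirilmesi]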

Akdeniz Diş Hekimliği Dergisi

https://doi.org/10.62268/add.1835285