Research Article

Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024)

Year 2025, Volume: 7 Issue: 6, 893 - 899, 26.10.2025
https://doi.org/10.38053/acmj.1789931

Abstract

Aims: This study aimed to comparatively evaluate the performance of five contemporary large language models (LLMs) on prosthodontics questions of the dentistry specialization examination (DUS) between 2014 and 2024.
Methods: A total of 167 prosthodontics questions from the DUS were analyzed. The questions were administered to five different LLMs: ChatGPT-5 (OpenAI Inc., USA), Claude 4 (Anthropic, USA), Gemini 1.5 Pro (Google LLC, USA), DeepSeek-V2 (DeepSeek AI, China), and Perplexity Pro (Perplexity AI, USA). The models’ responses were compared with the official answer keys provided by the Student Selection and Placement Center (OSYM), coded as correct or incorrect, and accuracy percentages were calculated. Statistical analyses included the Friedman test, correlation analysis, and frequency distributions. Subsection analyses were also performed to evaluate model performance across different content areas.
Results: DeepSeek-V2 achieved the highest overall accuracy rate (70.06%). Perplexity Pro (53.89%) and Gemini 1.5 Pro (51.50%) demonstrated moderate performance, ChatGPT-5 (49.10%) performed close to human levels, while Claude 4 had the lowest accuracy (32.34%). Subsection analyses revealed high accuracy in standardized knowledge areas such as implantology and temporomandibular joint (TMJ) disorders (66.7–100%), whereas notable decreases were observed in occlusion and morphology questions (9.1–53.9%). Correlation analyses indicated significant relationships between certain models.
Conclusion: The findings demonstrate heterogeneous performance of LLMs on DUS prosthodontics questions. While these models may serve as supplementary tools for exam preparation and dental education, their variable accuracy and potential for generating misinformation suggest they should not be used independently. Under expert supervision, LLMs may enhance dental education.
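The Methods section describes a simple scoring pipeline: code each model response against the official OSYM answer key as correct or incorrect, compute per-model accuracy percentages, and compare the five related samples with a Friedman test. A minimal, stdlib-only sketch of that pipeline is below; all answer data are simulated for illustration (only the five model names come from the study), and the Friedman statistic here omits the tie correction that statistics packages normally apply.

```python
# Sketch of the scoring pipeline described in Methods. All answer data are
# simulated; only the five model names come from the study.

def mid_ranks(values):
    """1-based ranks, with tied values sharing their average (mid-)rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # positions i..j would hold ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

# Official answer key and each model's responses (simulated).
answer_key = ["A", "C", "B", "D", "A", "E"]
responses = {
    "ChatGPT-5":      ["A", "C", "B", "A", "B", "E"],
    "Claude 4":       ["B", "C", "B", "D", "A", "D"],
    "Gemini 1.5 Pro": ["A", "B", "B", "D", "C", "E"],
    "DeepSeek-V2":    ["A", "C", "B", "D", "A", "C"],
    "Perplexity Pro": ["A", "C", "C", "C", "A", "E"],
}

# Code each response 1 (correct) or 0 (incorrect) against the key.
scores = {m: [int(a == k) for a, k in zip(r, answer_key)]
          for m, r in responses.items()}
accuracy = {m: 100 * sum(s) / len(s) for m, s in scores.items()}

# Friedman statistic over related samples (same questions, k models):
# rank the k scores within each question, sum ranks per model, then
#   chi2 = 12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1).
# (No tie correction applied in this sketch.)
models = list(scores)
n, k = len(answer_key), len(models)
rank_sums = [0.0] * k
for q in range(n):
    for j, r in enumerate(mid_ranks([scores[m][q] for m in models])):
        rank_sums[j] += r
chi2 = 12 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3 * n * (k + 1)
```

Under the null hypothesis of equal model performance, the statistic is compared against a chi-squared distribution with k − 1 degrees of freedom; in practice one would use an implementation with tie correction, since binary correct/incorrect scores produce many ties.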

References

  • Aura-Tormos JI, Llacer-Martinez M, Torres-Osca I. Educational applications of ChatGPT in university-based dental education. A systematic review. Eur J Dent Educ. 2025. doi:10.1111/eje.70011
  • Huang Y, Gomaa A, Semrau S, et al. Benchmarking ChatGPT-4 on a radiation oncology in-training exam and Red Journal Gray Zone cases: potentials and challenges for AI-assisted medical education and decision making in radiation oncology. Front Oncol. 2023;13:1265024. doi:10.3389/fonc.2023.1265024
  • Ali K, Barhom N, Marino FT, Duggal M. The thrills and chills of ChatGPT: implications for assessments in undergraduate dental education. 2023. doi:10.20944/preprints202302.0513.v1
  • Uribe SE, Maldupa I, Kavadella A, et al. Artificial intelligence chatbots and large language models in dental education: worldwide survey of educators. Eur J Dent Educ. 2024;28(4):865-876. doi:10.1111/eje.13009
  • Sallam M, Al-Salahat K. Below average ChatGPT performance in medical microbiology exam compared to university students. Front Educ. 2023;8:1333415. doi:10.3389/feduc.2023.1333415
  • Chau RCW, Thu KM, Yu OY, et al. Evaluation of Chatbot responses to text-based multiple-choice questions in prosthodontic and restorative dentistry. Dent J. 2025;13(7):279. doi:10.3390/dj13070279
  • Özyemişci N, Bal BT, Güngör MB, Öztürk EK, Canvar A, Nemli SK. Evaluation of information provided by artificial intelligence chatbots on extraoral maxillofacial prostheses. J Prosthet Dent. 2025;25:3913. doi:10.1016/j.prosdent.2025.08.028
  • Sismanoglu S, Capan BS. Performance of artificial intelligence on Turkish dental specialization exam: can ChatGPT-4.0 and Gemini Advanced achieve comparable results to humans? BMC Med Educ. 2025;25(1):214. doi:10.1186/s12909-024-06389-9
  • Aşık A, Kuru E. Analysis of ChatGPT’s answers to pedodontics questions asked in the dentistry specialization training entrance exam: cross-sectional study. Turkiye Klin J Dent Sci. 2025;31(3):401-406. doi:10.5336/dentalsci.2024-107488
  • Bilgin Avşar D, Ertan AA. A comparative study of ChatGPT-3.5 and Gemini’s performance of answering the prosthetic dentistry questions in dentistry specialty exam: cross-sectional study. Turkiye Klin J Dent Sci. 2024;30(4):668-673. doi:10.5336/dentalsci.2024-104610
  • Fujimoto M, Kuroda H, Katayama T, et al. Evaluating large language models in dental anesthesiology: a comparative analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam. Cureus. 2024;16(9):e70302. doi:10.7759/cureus.70302
  • Kavadella A, Dias da Silva MA, Kaklamanos EG, Stamatopoulos V, Giannakopoulos K. Evaluation of ChatGPT’s real-life implementation in undergraduate dental education: mixed methods study. JMIR Med Educ. 2024;10:e51344. doi:10.2196/51344
  • Topdağı B, Kavaz T. Assessment of information quality in contemporary artificial intelligence systems for digital smile design: a comparative analysis. J Prosthet Dent. 2025;134(4):1279.e1-1279.e8. doi:10.1016/j.prosdent.2025.06.030

Büyük dil modellerinin diş hekimliği uzmanlık sınavı protetik diş tedavisi sorularındaki performansı: 2014–2024 karşılaştırmalı analizi

There are 13 citations in total.

Details

Primary Language English
Subjects Information Systems (Other), Prosthodontics
Journal Section Research Article
Authors

Hayriye Yasemin Yay Kuşçu 0000-0002-0805-1510

Zuhal Görüş 0000-0003-1114-3333

Submission Date September 23, 2025
Acceptance Date October 15, 2025
Publication Date October 26, 2025
Published in Issue Year 2025 Volume: 7 Issue: 6

Cite

AMA Yay Kuşçu HY, Görüş Z. Performance of large language models on prosthodontics questions of the dentistry specialization examination: a comparative analysis (2014–2024). Anatolian Curr Med J / ACMJ / acmj. October 2025;7(6):893-899. doi:10.38053/acmj.1789931

TR DİZİN ULAKBİM and International Indexes (1b)

Interuniversity Board (UAK) Equivalency: article published in a ULAKBİM TR Index journal [10 POINTS], and article published in another internationally indexed journal (excluding 1a, 1b, 1c; category 1d) [5 POINTS].

Note: Our journal is not indexed in WOS and therefore does not have a Q (quartile) classification.

You can download the Council of Higher Education (CoHE) [Yüksek Öğretim Kurulu (YÖK)] criteria and decisions about predatory/questionable journals, the author's clarification text, and the journal charge policy from your browser: https://dergipark.org.tr/tr/journal/3449/file/4924/show

Journal Indexes and Platforms: 

TR Dizin ULAKBİM, Google Scholar, Crossref, Worldcat (OCLC), DRJI, EuroPub, OpenAIRE, Turkiye Citation Index, Turk Medline, ROAD, ICI World of Journal's, Index Copernicus, ASOS Index, General Impact Factor, Scilit.




Journal articles are evaluated by double-blind peer review.

All articles published in this journal are licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).