Evaluation of the performance of artificial intelligence platforms in answering and generating new questions in prosthetic dentistry specialization

Aliye İpek Kuşçu; Gökalp Çınarer; Süha Kuşçu

doi:10.38053/acmj.1848512

Abstract

Aims: This study aimed to evaluate large language models (LLMs) not only in answering Dentistry Specialty Examination (DUS) questions but also in generating new DUS-format questions, with expert validation of their educational and clinical quality.

Methods: A total of 130 official DUS questions published between 2012 and 2021 were used to assess the answering performance of four LLMs (ChatGPT, Gemini, DeepSeek, and Grok). In addition, each model generated 20 new multiple-choice questions (n=80), which expert prosthodontists independently evaluated for content accuracy, clinical relevance, discriminative capacity, and conformity with DUS standards. Expert-approved questions were then re-answered by all models to enable cross-model performance analysis. Model performance was compared using descriptive statistics, one-sample proportion tests against chance level (p₀=0.20), and inter-model comparisons with Cochran’s Q and McNemar tests.

Results: ChatGPT achieved the highest overall accuracy on historical DUS questions (81.3%), followed by Gemini (72.8%), DeepSeek (70.3%), and Grok (68.8%). On expert-validated AI-generated questions, overall accuracy ranged from 71.3% to 78.8% across models, with no statistically significant inter-model difference (Q=3.82, p=0.28). All models performed significantly above chance level (p<0.001). Importantly, question-generation quality and answering performance were not consistently aligned across models.

Conclusion: Although LLMs perform significantly above chance on DUS-style questions, both their answering accuracy and the educational validity of the questions they generate require expert supervision. LLMs should be considered supportive tools rather than autonomous agents in high-stakes dental education and assessment contexts.
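The statistical comparisons named in the Methods can be illustrated with a minimal Python sketch. It rests on several assumptions: the counts `n_correct` and `n_questions` are hypothetical and are not taken from the study; an exact one-sided binomial test stands in for the one-sample proportion test, whose exact variant the abstract does not specify; and the degrees of freedom for Cochran's Q are assumed to be 3 (four models minus one). The function names are illustrative only.

```python
import math


def binom_tail_p(k: int, n: int, p0: float) -> float:
    """One-sided exact p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(
        math.comb(n, i) * p0**i * (1 - p0) ** (n - i)
        for i in range(k, n + 1)
    )


def chi2_sf_df3(x: float) -> float:
    """Survival function of the chi-square distribution with 3 degrees
    of freedom, using its closed form (no external libraries needed)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical counts for illustration only (NOT data from the study).
n_questions = 130
n_correct = 100
p_chance = 0.20  # chance level for five-option multiple-choice items

p_value = binom_tail_p(n_correct, n_questions, p_chance)
print(f"accuracy = {n_correct / n_questions:.3f}, one-sided p = {p_value:.3g}")

# Cochran's Q as reported in the abstract; df = 3 is an assumption
# (number of models minus one).
p_q = chi2_sf_df3(3.82)
print(f"Cochran's Q = 3.82, df = 3, p = {p_q:.2f}")
```

Under the assumed three degrees of freedom, the second calculation gives a p-value close to the reported 0.28, which agrees with the abstract's finding of no significant inter-model difference.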

Details

Primary Language

English

Subjects

Prosthodontics

Journal Section

Research Article

Authors

Aliye İpek Kuşçu*
0000-0002-6696-1256
Türkiye

Gökalp Çınarer
0000-0003-0818-6746
Türkiye

Süha Kuşçu
0000-0002-0805-5888
Türkiye

Publication Date

May 22, 2026

Submission Date

December 24, 2025

Acceptance Date

March 23, 2026

Published in Issue

Year 2026 Volume: 8 Number: 3

DOI

https://doi.org/10.38053/acmj.1848512

IZ

https://izlik.org/JA36ZU52UK

Cite

APA

Kuşçu, A. İ., Çınarer, G., & Kuşçu, S. (2026). Evaluation of the performance of artificial intelligence platforms in answering and generating new questions in prosthetic dentistry specialization. Anatolian Current Medical Journal, 8(3), 409-416. https://doi.org/10.38053/acmj.1848512
