Research Article

Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs

Volume: 8 Number: 3 May 22, 2026

Abstract

Aims: Evidence comparing multiple contemporary large language model (LLM)-based chatbots in prosthetic dentistry using multidimensional outcome measures remains limited. This study compared the responses generated by ChatGPT, Gemini, Copilot and DeepSeek to frequently asked questions (FAQs) related to prosthetic dentistry in terms of accuracy, clinical reliability and readability.
Methods: Thirty-nine FAQs obtained from publicly available patient education resources were equally distributed across fixed, removable and implant-supported prosthesis categories (n=13 each). Questions were submitted in Turkish on the same day under standardized conditions to ChatGPT, Gemini, Copilot and DeepSeek, all of which were accessed through their publicly available web interfaces. The responses, generated in Turkish, were independently scored by three prosthodontists using five-point Likert scales to assess accuracy and clinical reliability. Readability was assessed using the Ateşman and Bezirci-Yılmaz formulas. Inter-rater agreement was analyzed using the intraclass correlation coefficient (ICC). Repeated-measures comparisons were performed using the Friedman test, followed by Bonferroni-adjusted pairwise Wilcoxon signed-rank tests. Effect sizes were reported using Kendall’s W.
Results: Inter-rater agreement was high for accuracy (ICC=0.86) and clinical reliability (ICC=0.83). Significant inter-system differences were observed in accuracy, clinical reliability and readability outcomes (all p<0.001; Kendall’s W=0.31-0.46). ChatGPT demonstrated the highest accuracy and the most favorable readability values, whereas Gemini showed the highest clinical reliability scores. Copilot and DeepSeek generally performed worse. Implant-related questions yielded significantly lower accuracy and reliability scores than fixed and removable prosthesis questions (p<0.05).
Conclusion: LLM-based chatbots demonstrated heterogeneous performance in answering questions related to prosthetic dentistry. Although some systems may assist preliminary patient education, meaningful differences in clinical reliability and readability indicate that chatbot outputs should be interpreted cautiously and reviewed by dental professionals, particularly for implant-related topics.
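The readability and effect-size measures named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes the Ateşman score in its commonly cited form, 198.825 - 40.175 x (syllables per word) - 2.610 x (words per sentence), with Turkish syllables counted as vowels, and it computes the Friedman statistic without tie correction before converting it to Kendall's W as chi2 / (n(k - 1)). The function names and the splitting rules for words and sentences are illustrative assumptions; the Bezirci-Yılmaz formula, the ICC and the pairwise Wilcoxon tests are not covered.

```python
import re

# Assumption: one vowel = one syllable in Turkish (circumflexed vowels included).
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜâîûÂÎÛ")


def atesman_score(text: str) -> float:
    """Ateşman readability score (commonly cited coefficients, assumed here)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(1 for w in words for ch in w if ch in TURKISH_VOWELS)
    return (198.825
            - 40.175 * (syllables / len(words))
            - 2.610 * (len(words) / len(sentences)))


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_kendall_w(rows):
    """rows: one list per question, holding one score per chatbot.

    Returns (chi2, W): the Friedman statistic without tie correction
    and Kendall's W = chi2 / (n * (k - 1)).
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, chi2 / (n * (k - 1))
```

In this sketch, each row passed to friedman_kendall_w would hold the four chatbots' scores for one question. W ranges from 0 (no consistent ordering of the systems across questions) to 1 (the same ordering for every question). Because Likert scores tie often, a tie-corrected implementation such as the one in a statistics library would be preferable for real data.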

Supporting Institution

The authors received no financial support for the conduct or publication of this research.

Ethical Statement

This article does not require ethics committee approval, as it does not involve human or animal studies.

Details

Primary Language

English

Subjects

Dental Public Health

Journal Section

Research Article

Publication Date

May 22, 2026

Submission Date

March 9, 2026

Acceptance Date

May 2, 2026

Published in Issue

Year 2026 Volume: 8 Number: 3

APA
Tartuk, B. K., & Altıntaş, E. (2026). Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs. Anatolian Current Medical Journal, 8(3), 513-523. https://izlik.org/JA49FE48DR
AMA
1. Tartuk BK, Altıntaş E. Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs. Anatolian Curr Med J / ACMJ / acmj. 2026;8(3):513-523. https://izlik.org/JA49FE48DR
Chicago
Tartuk, Bülent Kadir, and Eyyüp Altıntaş. 2026. “Evaluation of Accuracy, Clinical Reliability and Readability of LLM-Based Chatbot Responses in Prosthetic Dentistry FAQs”. Anatolian Current Medical Journal 8 (3): 513-23. https://izlik.org/JA49FE48DR.
EndNote
Tartuk BK, Altıntaş E (May 1, 2026) Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs. Anatolian Current Medical Journal 8 3 513–523.
IEEE
[1] B. K. Tartuk and E. Altıntaş, “Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs”, Anatolian Curr Med J / ACMJ / acmj, vol. 8, no. 3, pp. 513–523, May 2026, [Online]. Available: https://izlik.org/JA49FE48DR
ISNAD
Tartuk, Bülent Kadir - Altıntaş, Eyyüp. “Evaluation of Accuracy, Clinical Reliability and Readability of LLM-Based Chatbot Responses in Prosthetic Dentistry FAQs”. Anatolian Current Medical Journal 8/3 (May 1, 2026): 513-523. https://izlik.org/JA49FE48DR.
JAMA
1. Tartuk BK, Altıntaş E. Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs. Anatolian Curr Med J / ACMJ / acmj. 2026;8:513–523.
MLA
Tartuk, Bülent Kadir, and Eyyüp Altıntaş. “Evaluation of Accuracy, Clinical Reliability and Readability of LLM-Based Chatbot Responses in Prosthetic Dentistry FAQs”. Anatolian Current Medical Journal, vol. 8, no. 3, May 2026, pp. 513-23, https://izlik.org/JA49FE48DR.
Vancouver
1. Bülent Kadir Tartuk, Eyyüp Altıntaş. Evaluation of accuracy, clinical reliability and readability of LLM-based chatbot responses in prosthetic dentistry FAQs. Anatolian Curr Med J / ACMJ / acmj [Internet]. 2026 May 1;8(3):513-23. Available from: https://izlik.org/JA49FE48DR

 

TR DİZİN ULAKBİM and International Indexes (1b)
 

Interuniversity Board (UAK) Equivalency: Article published in a Ulakbim TR Index journal [10 POINTS], and article published in another (excluding 1a, b, c) internationally indexed journal (1d) [5 POINTS].

Note: Our journal is not WOS indexed and therefore is not classified as Q.

The Council of Higher Education (CoHG) [Yüksek Öğretim Kurumu (YÖK)] criteria decisions about predatory/questionable journals, the author's clarification text and the journal's charge policy can be downloaded from: https://dergipark.org.tr/tr/journal/3449/file/4924/show

 

Journal Indexes and Platforms: 

TR Dizin ULAKBİM, Google Scholar, Crossref, Worldcat (OCLC), DRJI, EuroPub, OpenAIRE, Turkiye Citation Index, Turk Medline, ROAD, ICI World of Journals, Index Copernicus, ASOS Index, General Impact Factor, Scilit.


 

Journal articles undergo double-blind peer review.

 

All articles published in this journal are licensed under a Creative Commons Attribution 4.0 International License (CC BY NC ND)