Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions

Erhan Dilber; Umut Baran Sönmez; Ali Vasfi Ağlarcı; Kübra Yıldız Domaniç

doi:10.62243/edr.1910763

Abstract

Aim: To compare the accuracy and same-day response consistency of four free-tier AI chatbots on single-best-answer multiple-choice questions (MCQs) about complete dentures.

Material and method: Twenty-five English MCQs (options A–E, one correct option each) on complete denture prosthodontics were used. Four chatbots (ChatGPT, Claude, Gemini, and Grok) were tested through their official web interfaces under default free-tier settings. Each question was asked in a new chat so that no prior context was carried over, using a standardized instruction that required the output of only one option letter (“Answer: ”).

Results: Fleiss’ kappa showed significant within-day agreement for all chatbots (p<0.001), indicating that responses were consistent beyond chance across the day (ChatGPT κ=0.625; Claude κ=0.785; Gemini κ=0.813; Grok κ=0.693). Overall correct answer rates were 71% for Gemini, 67% for ChatGPT, 63% for Claude, and 53% for Grok. Correct response rates did not differ across morning, noon, and evening for ChatGPT (p=0.607), Claude (p=0.779), or Grok (p=0.846), whereas Gemini showed a significant time-of-day effect (p=0.039), with higher accuracy in the evening (80%) than in the morning (68%) or at noon (64%).

Conclusion: All four free-tier chatbots demonstrated significant same-day response consistency, with Gemini and Claude showing the highest agreement. Accuracy was numerically highest for Gemini and lowest for Grok, although between-tool differences were not statistically significant within the same time windows. These findings suggest that both accuracy and temporal stability should be considered when using free-tier chatbots for complete denture MCQ-based learning.
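For readers unfamiliar with the agreement statistic reported in the abstract, the following is a minimal sketch of how Fleiss' kappa can be computed for repeated chatbot answers. It assumes that the three same-day sessions (morning, noon, evening) are treated as the "raters" and each question as a "subject"; the abstract does not spell out the exact setup, so this is an illustration rather than the authors' procedure. The function name `fleiss_kappa` and the example answers are hypothetical and are not study data.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists, e.g. one inner list per question holding the
    option letter given in each session (assumed layout, not study data).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    total = n_subjects * n_raters
    p_expected = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_expected == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical answers to five questions in three same-day sessions.
example = [
    ["A", "A", "A"],
    ["B", "B", "C"],
    ["D", "D", "D"],
    ["E", "A", "E"],
    ["C", "C", "C"],
]
print(round(fleiss_kappa(example), 3))  # 0.659
```

A value of 1 would mean identical answers in every session, while values near 0 indicate agreement no better than chance. Significance testing (the p-values in the abstract) is not covered by this sketch.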

Details

Primary Language

English

Subjects

Prosthodontics

Journal Section

Research Article

Authors

Erhan Dilber
0009-0003-2209-1070
Türkiye

Umut Baran Sönmez
0009-0000-3689-9802
Türkiye

Ali Vasfi Ağlarcı
0000-0002-9010-4537
Türkiye

Kübra Yıldız Domaniç*
0000-0002-8271-8870
Türkiye

Publication Date

April 30, 2026

Submission Date

March 16, 2026

Acceptance Date

March 20, 2026

Published in Issue

Year 2026 Volume: 4 Number: 1

DOI

https://doi.org/10.62243/edr.1910763

IZ

https://izlik.org/JA55FZ39FL

Cite

APA

Dilber, E., Sönmez, U. B., Ağlarcı, A. V., & Yıldız Domaniç, K. (2026). Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. Eurasian Dental Research, 4(1), 7-12. https://doi.org/10.62243/edr.1910763

AMA

1. Dilber E, Sönmez UB, Ağlarcı AV, Yıldız Domaniç K. Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. EDR. 2026;4(1):7-12. doi:10.62243/edr.1910763

Chicago

Dilber, Erhan, Umut Baran Sönmez, Ali Vasfi Ağlarcı, and Kübra Yıldız Domaniç. 2026. “Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions”. Eurasian Dental Research 4 (1): 7-12. https://doi.org/10.62243/edr.1910763.

EndNote

Dilber E, Sönmez UB, Ağlarcı AV, Yıldız Domaniç K (April 1, 2026) Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. Eurasian Dental Research 4 1 7–12.

IEEE

[1] E. Dilber, U. B. Sönmez, A. V. Ağlarcı, and K. Yıldız Domaniç, “Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions”, EDR, vol. 4, no. 1, pp. 7–12, Apr. 2026, doi: 10.62243/edr.1910763.

ISNAD

Dilber, Erhan - Sönmez, Umut Baran - Ağlarcı, Ali Vasfi - Yıldız Domaniç, Kübra. “Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions”. Eurasian Dental Research 4/1 (April 1, 2026): 7-12. https://doi.org/10.62243/edr.1910763.

JAMA

1. Dilber E, Sönmez UB, Ağlarcı AV, Yıldız Domaniç K. Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. EDR. 2026;4:7–12.

MLA

Dilber, Erhan, et al. “Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions”. Eurasian Dental Research, vol. 4, no. 1, Apr. 2026, pp. 7-12, doi:10.62243/edr.1910763.

Vancouver

1. Erhan Dilber, Umut Baran Sönmez, Ali Vasfi Ağlarcı, Kübra Yıldız Domaniç. Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. EDR. 2026 Apr. 1;4(1):7-12. doi:10.62243/edr.1910763