Research Article

Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions

Volume: 4 Number: 1 April 30, 2026

Abstract

Aim: To compare the accuracy and same-day response consistency of four free-tier AI chatbots on single-best-answer complete denture MCQs.

Material and method: Twenty-five English MCQs (A–E, single correct option) on complete denture prosthodontics were used. Four chatbots were tested via official web interfaces under default free-tier settings: ChatGPT, Claude, Gemini, and Grok. Each question was asked in a new chat to ensure zero prior context, using a standardized instruction requiring the output of only one option letter (“Answer: ”).

Results: Fleiss’ kappa showed significant within-day agreement for all chatbots (p<0.001), indicating non-random temporal consistency (ChatGPT κ=0.625; Claude κ=0.785; Gemini κ=0.813; Grok κ=0.693). Overall correct answer rates were 71% for Gemini, 67% for ChatGPT, 63% for Claude, and 53% for Grok. Correct response rates did not differ across morning/noon/evening for ChatGPT (p=0.607), Claude (p=0.779), or Grok (p=0.846), whereas Gemini showed a significant time-of-day effect (p=0.039), with higher evening accuracy (80%) than morning (68%) and noon (64%).

Conclusion: All four free-tier chatbots demonstrated significant same-day response consistency, with Gemini and Claude showing the highest agreement. Accuracy was numerically highest for Gemini and lowest for Grok, although between-tool differences were not statistically significant within the same time windows. These findings suggest that both accuracy and temporal stability should be considered when using free-tier chatbots for complete denture MCQ-based learning.
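For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Fleiss' kappa and a per-session correct-answer rate can be computed for one chatbot. It is not the authors' analysis code. It assumes that each question was asked once per session (morning, noon, evening) and treats the sessions as the "raters"; the function names `fleiss_kappa` and `accuracy`, the toy responses, and the toy answer key are hypothetical and are not data from the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories="ABCDE"):
    """Fleiss' kappa for repeated answers to the same questions.

    ratings: one list per question, holding the option letter returned in
    each session (assumed here: morning, noon, evening).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every question needs the same number (>= 2) of answers")

    # counts[i][j] = number of sessions that chose category j for question i
    counts = [[Counter(r)[c] for c in categories] for r in ratings]

    # Observed agreement per question, then averaged over questions
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items

    # Chance agreement from the overall share of each option letter
    total = n_items * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:  # every answer is the same letter: kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def accuracy(answers, key):
    """Share of answers in one session that match the answer key."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)


# Hypothetical toy data: 4 questions x 3 sessions (not from the study)
toy = [
    ["A", "A", "A"],
    ["B", "B", "C"],
    ["C", "C", "C"],
    ["D", "A", "D"],
]
toy_key = ["A", "B", "C", "D"]

print(round(fleiss_kappa(toy), 3))  # 0.538
for s, name in enumerate(["morning", "noon", "evening"]):
    print(name, accuracy([row[s] for row in toy], toy_key))
```

In the toy data the chatbot changes its answer on two of the four questions, which gives κ = 7/13 ≈ 0.54. Perfectly repeated answers spread over more than one option letter would give κ = 1.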

Details

Primary Language

English

Subjects

Prosthodontics

Journal Section

Research Article

Publication Date

April 30, 2026

Submission Date

March 16, 2026

Acceptance Date

March 20, 2026

Published in Issue

Year 2026 Volume: 4 Number: 1

APA
Dilber, E., Sönmez, U. B., Ağlarcı, A. V., & Yıldız Domaniç, K. (2026). Accuracy and Same-Day Response Consistency of AI Chatbots on Complete Denture Multiple-Choice Questions. Eurasian Dental Research, 4(1), 7-12. https://doi.org/10.62243/edr.1910763