Research Article

ChatGPT Vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses

Year 2025, Volume: 52 Issue: 2, 71 - 78, 31.08.2025
https://doi.org/10.52037/eads.2025.0011

Abstract

Purpose: This study aimed to determine the accuracy and repeatability of the responses of different large language models to questions regarding implant-supported prostheses and assess the impact of pre-prompt utilization and the time of day.
Materials & Methods: Twelve open-ended questions related to implant-supported prostheses were generated, and their content validity was verified by a specialist. The questions were then posed to two LLMs, ChatGPT-4.0 and Google Gemini, at three times of day (morning, afternoon, evening), with and without a pre-prompt. Two expert prosthodontists evaluated the responses with a holistic rubric. Concordance between the graders' scores, and between the repeated responses of ChatGPT and Gemini, was calculated with the Brennan and Prediger coefficient, Cohen kappa, Fleiss kappa, and Krippendorff alpha. Kruskal-Wallis, Mann-Whitney U, independent t-test, and ANOVA analyses were used to compare the responses obtained in the implementations.
Results: The accuracy of ChatGPT and Google Gemini was 34.7% and 17.4%, respectively. Using a pre-prompt significantly increased accuracy in Gemini (p = 0.026). No significant difference was found according to the time of day (morning, afternoon, evening) or between weekly implementations. In addition, inter-rater reliability and repeatability showed high levels of consistency.
Conclusion: The use of a pre-prompt positively affected accuracy and repeatability in both ChatGPT and Google Gemini. However, LLMs can still produce hallucinations. LLMs may therefore assist clinicians, but clinicians should be aware of these limitations.
Keywords: Chatbot, ChatGPT, Prostheses and Implant.
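
The agreement statistics named in the Materials & Methods can be illustrated with a minimal sketch. It computes Cohen's kappa and the Brennan and Prediger coefficient for two raters. The scores below are hypothetical, and the three-level rubric (0 = incorrect, 1 = partially correct, 2 = correct) is an assumption made only for illustration; the abstract does not state the rubric levels, and this is not the authors' analysis code.

```python
from collections import Counter


def observed_agreement(r1, r2):
    """Proportion of items on which the two raters chose the same category."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa: chance agreement estimated from each rater's marginals."""
    n = len(r1)
    po = observed_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    pe = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if pe == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (po - pe) / (1 - pe)


def brennan_prediger(r1, r2, n_categories):
    """Brennan and Prediger coefficient: chance agreement fixed at 1/q."""
    po = observed_agreement(r1, r2)
    pe = 1 / n_categories
    return (po - pe) / (1 - pe)


# Hypothetical scores for 12 questions on an assumed 3-level rubric.
rater_a = [2, 2, 2, 2, 1, 0, 2, 2, 1, 0, 2, 2]
rater_b = [2, 2, 2, 2, 0, 0, 2, 1, 1, 0, 2, 2]

if __name__ == "__main__":
    print("Observed agreement:", round(observed_agreement(rater_a, rater_b), 3))
    print("Cohen kappa:", round(cohen_kappa(rater_a, rater_b), 3))
    print("Brennan-Prediger:", round(brennan_prediger(rater_a, rater_b, 3), 3))
```

The two coefficients differ only in how chance agreement is estimated. Cohen's kappa uses the raters' observed category frequencies, whereas the Brennan and Prediger coefficient assumes all categories are equally likely, so the two can diverge when scores cluster in one category.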

Ethical Statement

None

Supporting Institution

None

Thanks

None

Details

Primary Language: English
Subjects: Prosthodontics
Journal Section: Original Research Articles
Authors:

Deniz Yılmaz (ORCID: 0000-0003-4570-9067)

Emine Dilara Çolpak (ORCID: 0000-0002-5334-2421)

Early Pub Date: August 30, 2025
Publication Date: August 31, 2025
Submission Date: April 18, 2025
Acceptance Date: June 17, 2025
Published in Issue: Year 2025, Volume: 52, Issue: 2

Cite

Vancouver: Yılmaz D, Çolpak ED. ChatGPT Vs. Google Gemini: Assessment of Performance Regarding the Accuracy and Repeatability of Responses to Questions in Implant-Supported Prostheses. EADS. 2025;52(2):71-8.