Research Article

AI-Assisted Knowledge Assessment: Comparison of ChatGPT and Gemini on Undescended Testicle in Children

Volume: 5 Number: 3 September 23, 2025

Abstract

Aim: This study aimed to evaluate the accuracy and completeness of the answers given by ChatGPT-4 and Google Gemini to questions about undescended testis (UDT). Because these AI tools can produce information that appears accurate but is incorrect, caution is warranted in medical applications. Methods: The researchers independently created 20 questions and submitted the same questions to both ChatGPT-4 and Google Gemini. A pediatrician and a pediatric surgeon evaluated the responses using the Johnson et al. scale (accuracy rated from 1 to 6 and completeness from 1 to 3). Responses that lacked content received a score of 0. Statistical analyses were performed using R Software (version 4.3.1) to assess differences in accuracy and consistency between the tools. Results: Both chatbots answered all questions. ChatGPT achieved a median accuracy score of 5.5 and a mean of 5.35, while Google Gemini achieved a median of 6 and a mean of 5.5. Completeness was similar, with ChatGPT scoring a median of 3 and Google Gemini showing comparable performance. Conclusion: ChatGPT and Google Gemini showed comparable accuracy and completeness; however, inconsistencies between accuracy and completeness suggest that these AI tools require refinement. Regular updates are essential to improve the reliability of AI-generated medical information on UDT and to ensure up-to-date, accurate responses.

Details

Primary Language

English

Subjects

Clinical Sciences (Other)

Journal Section

Research Article

Publication Date

September 23, 2025

Submission Date

July 10, 2025

Acceptance Date

August 11, 2025
