Artificial Intelligence in Pediatric Urology: Accuracy and Consistency of ChatGPT's Responses on Hypospadias

Emre Kandemir; Mehmet Sarıkaya

Research Article

Year 2025, Volume: 15 Issue: 5, 1 - 5

Abstract

Aim: This study evaluated the accuracy and reproducibility of ChatGPT (GPT-4-turbo) responses to frequently asked questions about hypospadias, a common congenital urological condition. As artificial intelligence (AI) becomes increasingly integrated into patient education, its reliability in delivering sensitive, clinically relevant information warrants empirical investigation.
Materials and Methods: Frequently asked questions about hypospadias were compiled from pediatric urology association websites, public health portals, and social media platforms. Questions were classified into five categories: general information, diagnosis, treatment, follow-up, and guideline-based recommendations. After duplicate, vague, and subjective questions were excluded, 97 unique questions were entered into ChatGPT. Two pediatric urologists independently rated each answer on a four-point scale (1 = completely correct, 2 = correct but insufficient, 3 = partially misleading, 4 = completely incorrect), and each question was entered again on a separate device to assess reproducibility.
Results: Of the 97 responses, 87.6% were graded as completely correct, 7.2% as correct but insufficient, 4.1% as partially misleading, and 1.0% as completely incorrect. The highest rate of accurate answers was observed in the diagnosis and follow-up categories (90.0% each), while treatment-related questions showed slightly lower accuracy (86.7%). Guideline-based questions were answered correctly in 87.5% of cases. Overall reproducibility across all categories was 91.7%, with the highest consistency in diagnostic responses.
Conclusions: ChatGPT answered patient-centered questions about hypospadias with high accuracy and reproducibility, particularly in the diagnosis and general information domains. However, variability in treatment-related content and limitations in referencing call for cautious interpretation. AI may serve as a supplementary educational tool in pediatric urology, but clinical oversight remains essential to ensure that the information patients receive is safe and reliable.
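The grade distribution reported in the Results can be checked with simple arithmetic. The sketch below is illustrative only: the abstract reports percentages, not counts, so the per-grade counts (85, 7, 4, and 1 of 97) are an assumption inferred from those percentages, and the names `grade_counts` and `grade_percentages` are hypothetical.

```python
# Counts per grade are NOT stated in the abstract; they are inferred
# (assumption) as the integers that reproduce the reported percentages.
TOTAL = 97

grade_counts = {
    "completely correct": 85,
    "correct but insufficient": 7,
    "partially misleading": 4,
    "completely incorrect": 1,
}


def grade_percentages(counts, total):
    """Return each grade's share of all responses, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


percentages = grade_percentages(grade_counts, TOTAL)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Under this assumption the four counts sum to 97 and reproduce the reported 87.6%, 7.2%, 4.1%, and 1.0%. The rounded percentages sum to 99.9% rather than 100% because of rounding.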

Keywords

Hypospadias, Artificial intelligence, ChatGPT, Pediatric urology

Ethical Statement

Because this study analyzed responses generated by an artificial intelligence model (ChatGPT) to publicly available, anonymized questions and involved no human participants, patient data, or identifiable personal information, ethical approval was not required under institutional and international research ethics guidelines.

Details

Primary Language	English
Subjects	Pediatric Urology
Journal Section	Original Research
Authors	Emre Kandemir 0000-0002-9601-8007 Mehmet Sarıkaya 0000-0003-2453-0893
Publication Date	September 27, 2025
Submission Date	July 13, 2025
Acceptance Date	September 2, 2025
Published in Issue	Year 2025 Volume: 15 Issue: 5

Cite

AMA	Kandemir E, Sarıkaya M. Artificial Intelligence in Pediatric Urology: Accuracy and Consistency of ChatGPT’s Responses on Hypospadias. J Contemp Med. 2025;15(5):1-5.
