How much can large language models of Artificial Intelligence inform patients about urodynamics? A comparative analysis

Çağrı Doğan; Mehmet Fatih Şahin

doi:10.38053/acmj.1836376

Abstract

Aims: To evaluate and compare the readability and informational quality of current large language models (LLMs) in providing patient information about urodynamics (UD) testing.

Methods: This cross-sectional study, conducted on October 1, 2025, analyzed five widely used LLMs: ChatGPT-5, Gemini 2.5 Pro, Grok 4, Deepseek v3.1, and Microsoft Copilot. The top 25 UD-related keywords searched on Google Trends (2004–2025), with six of them excluded, were entered into each chatbot using identical prompts. Outputs were independently evaluated with the Quality Analysis of Medical Artificial Intelligence (QAMAI) and DISCERN instruments to assess text quality and reliability, while the Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL) indices measured readability. Additionally, each LLM was asked to generate a visual depiction of a UD setting to assess the educational potential of artificial intelligence (AI)-based multimodal content.

Results: The evaluated LLMs differed significantly in readability and informational quality (p=0.001). Gemini achieved the highest FKRE score (49.0±8.4) and the lowest FKGL (9.4±1.3), indicating the best readability. Deepseek achieved the highest QAMAI (27.7±1.5) and DISCERN (71.5±6.4) scores, indicating the best quality and reliability. Copilot had lower readability and consistency scores than the other evaluated models. AI-generated visualizations of UD settings (from Gemini, GPT-5, Grok, Copilot, and DALL-E) effectively depicted the main components of the procedure.

Conclusion: LLMs show significant variability in the quality, accuracy, and readability of UD-related patient information. Deepseek delivered the most accurate and structured content, whereas Gemini provided the most understandable language. Continuous validation, guideline-based fine-tuning, and expert supervision are essential before AI chatbots can be reliably adopted in patient education and urology practice.
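The abstract reports readability with the Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL) indices. As an illustration only, the sketch below applies the standard Flesch formulas, FKRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) and FKGL = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59, to a piece of chatbot text. The naive sentence splitter and the vowel-group syllable counter (`count_syllables`, a hypothetical helper) are simplifying assumptions, not the calculator used in the study, which the abstract does not name; dedicated tools will give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("pressure"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word is assumed to have at least one syllable.
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FKRE, FKGL) for English text using the standard Flesch formulas."""
    # Naive sentence split on ., !, ? (assumption: abbreviations will confuse it).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)  # average words per sentence
    spw = syllables / len(words)       # average syllables per word
    fkre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fkre, fkgl
```

For example, `flesch_scores("The cat sat on the mat.")` gives FKRE ≈ 116.1 and FKGL ≈ −1.45, because every word has one syllable. Higher FKRE and lower FKGL both mean easier text, which is why a model with the highest FKRE and lowest FKGL is described as the most readable.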

To what extent can Artificial Intelligence-based large language models inform patients about urodynamics? A comparative analysis

Abstract

Aim: To evaluate and compare the readability and informational quality of current artificial intelligence (AI)-supported large language models (LLMs) in providing patient information about urodynamics (UD) testing.

Methods: In this cross-sectional study, conducted on October 1, 2025, five widely used LLMs (ChatGPT-5, Gemini 2.5 Pro, Grok 4, Deepseek v3.1, and Microsoft Copilot) were analyzed. The top 25 UD-related keywords searched on Google Trends (2004–2025), with six of them excluded, were entered into each chatbot using identical prompts. Outputs were independently evaluated with the Quality Analysis of Medical Artificial Intelligence (QAMAI) and DISCERN instruments to assess text quality and reliability, and the Flesch-Kincaid Reading Ease (FKRE) and Grade Level (FKGL) indices were used to assess readability. In addition, each LLM was asked to produce a visual depiction of a UD setting, and these outputs were examined to evaluate the educational potential of AI-based multimodal content.

Results: The models differed significantly in both readability and quality parameters (p=0.001). Gemini achieved the highest FKRE (49.0±8.4) and the lowest FKGL (9.4±1.3) scores, providing the best readability. Deepseek achieved the highest QAMAI (27.7±1.5) and DISCERN (71.5±6.4) scores, giving the best results for overall quality and reliability. In contrast, Copilot produced the outputs with the lowest readability and consistency. The UD-setting images generated by Gemini, GPT-5, Grok, Copilot, and DALL-E effectively depicted the main components of the procedure.

Conclusion: LLMs show considerable variability in the quality, accuracy, and readability of UD-related patient information texts. Deepseek produced the most accurate and structured content, whereas Gemini provided the most understandable language. Therefore, continuous validation, guideline-based adjustments, and expert supervision are required before AI chatbots can be used safely in patient education and urology practice.
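The QAMAI and DISCERN results are reported as mean±SD totals per model. A minimal sketch of that bookkeeping follows, assuming the item layouts of the published instruments (QAMAI: six items rated 1–5, total 6–30; DISCERN: 16 items rated 1–5, total 16–80). The abstract does not say how many raters scored each response or how their scores were combined, so `total_score` and `summarize` are hypothetical helpers and the ratings in the example are invented for illustration.

```python
from statistics import mean, stdev

# Assumed item counts for each instrument (each item rated on a 1-5 scale).
INSTRUMENT_ITEMS = {"QAMAI": 6, "DISCERN": 16}


def total_score(instrument: str, ratings: list[int]) -> int:
    """Sum one rater's item ratings for a single chatbot response."""
    n_items = INSTRUMENT_ITEMS[instrument]
    if len(ratings) != n_items:
        raise ValueError(f"{instrument} expects {n_items} ratings, got {len(ratings)}")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each item must be rated on a 1-5 scale")
    return sum(ratings)


def summarize(totals: list[float]) -> str:
    """Format per-response totals as 'mean±SD' using the sample standard deviation."""
    return f"{mean(totals):.1f}±{stdev(totals):.1f}"
```

For example, `total_score("QAMAI", [5, 5, 4, 5, 4, 5])` returns 28, and `summarize([28, 27, 28])` returns `"27.7±0.6"`. Using the sample (n−1) standard deviation is itself an assumption; the abstract does not specify which was used.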

Supporting Institution

None

Ethical Statement

Ethical approval was not required because the study did not involve human participants and used only information obtained online.

Details

Primary Language

English

Subjects

Urology

Journal Section

Research Article

Authors

Çağrı Doğan
ORCID: 0000-0001-9681-2473
Türkiye

Mehmet Fatih Şahin*
ORCID: 0000-0002-0926-3005
Türkiye

Publication Date

March 10, 2026

Submission Date

December 4, 2025

Acceptance Date

February 2, 2026

Published in Issue

Year 2026 Volume: 8 Number: 2

DOI

https://doi.org/10.38053/acmj.1836376

IZ

https://izlik.org/JA95LP68CP

Cite

APA

Doğan, Ç., & Şahin, M. F. (2026). How much can large language models of Artificial Intelligence inform patients about urodynamics? A comparative analysis. Anatolian Current Medical Journal, 8(2), 218-223. https://doi.org/10.38053/acmj.1836376
