A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o

Mustafa Azizoğlu; Sergey Klyuev

doi:10.37990/medr.1601528

Research Article

Year 2025, Volume: 7 Issue: 1, 201 - 205, 15.01.2025


Abstract

Aim: The rapid evolution of artificial intelligence (AI) is changing medicine, with tools such as ChatGPT and Google Gemini being used to support clinical decision-making. ChatGPT's advances, particularly with GPT-4, show promise in diagnostics and education. However, variable accuracy and limitations in complex scenarios call for further evaluation of these models in medical applications. This study aimed to assess the accuracy of, and agreement between, ChatGPT 4.o and Gemini AI in answering questions on bladder-related conditions, including neurogenic bladder, vesicoureteral reflux (VUR), and posterior urethral valve (PUV).
Material and Method: This study, conducted in October 2024, compared the accuracy of ChatGPT 4.o and Gemini AI on 51 questions about neurogenic bladder, VUR, and PUV. The questions were randomly selected from pediatric surgery and urology materials. The models' responses were evaluated with accuracy metrics and statistical analysis to assess their performance and agreement.
Results: ChatGPT 4.o and Gemini AI showed similar accuracy on the neurogenic bladder, VUR, and PUV questions, with correct response rates of 66.7% and 68.6%, respectively, and no statistically significant difference (p>0.05). Combined accuracy across all topics was 67.6%. Agreement between the two models was strong (κ=0.87).
Conclusion: ChatGPT 4.o and Gemini AI showed comparable accuracy across key bladder-related conditions, with no significant difference in performance.
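The figures in the Results can be cross-checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis. It assumes that the reported rates correspond to 34 and 35 correct answers out of 51 (counts inferred from the percentages, not stated in the abstract), and that κ was computed on per-question correct/incorrect outcomes. The 2x2 table passed to `cohen_kappa` is hypothetical: it is one split that happens to be consistent with the reported values, not the study's data.

```python
N_QUESTIONS = 51  # stated in the abstract

# ASSUMPTION: counts inferred from the reported 66.7% and 68.6%;
# the abstract gives percentages only.
CHATGPT_CORRECT = 34
GEMINI_CORRECT = 35


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def cohen_kappa(both_right: int, only_a: int, only_b: int, both_wrong: int) -> float:
    """Cohen's kappa for two raters with binary (right/wrong) outcomes."""
    n = both_right + only_a + only_b + both_wrong
    observed = (both_right + both_wrong) / n
    a_right = both_right + only_a
    b_right = both_right + only_b
    # Agreement expected by chance, from each rater's marginal rates.
    expected = (a_right * b_right + (n - a_right) * (n - b_right)) / n**2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print("ChatGPT:", pct(CHATGPT_CORRECT, N_QUESTIONS))
    print("Gemini:", pct(GEMINI_CORRECT, N_QUESTIONS))
    print("Combined:", pct(CHATGPT_CORRECT + GEMINI_CORRECT, 2 * N_QUESTIONS))
    # HYPOTHETICAL table: 33 both right, 1 ChatGPT only, 2 Gemini only, 15 both wrong.
    print("Kappa:", round(cohen_kappa(33, 1, 2, 15), 2))
```

Under these assumptions the three percentages and κ round to the values reported in the abstract. Other contingency tables, or a κ computed on the chosen answer options rather than on correctness, could also match the published figure.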

Keywords

ChatGPT, Gemini, artificial intelligence, bladder


Details

Primary Language	English
Subjects	Pediatric Urology
Section	Research Article
Authors	Mustafa Azizoğlu 0009-0000-3563-1230 Sergey Klyuev 0000-0002-3217-6874
Submission Date	December 14, 2024
Acceptance Date	January 10, 2025
Publication Date	January 15, 2025
Published in Issue	Year 2025 Volume: 7 Issue: 1

Cite

AMA	Azizoğlu M, Klyuev S. A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o. Med Records. January 2025;7(1):201-205. doi:10.37990/medr.1601528

