Research Article

A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o

Volume: 7, Issue: 1, 15 January 2025

Abstract

Aim: The rapid evolution of artificial intelligence (AI) has changed medical practice, with tools such as ChatGPT and Google Gemini supporting clinical decision-making. ChatGPT's advances, particularly with GPT-4, show promise in diagnostics and education. However, variable accuracy and limitations in complex scenarios call for further evaluation of these models in medical applications. This study aimed to assess the accuracy of, and agreement between, ChatGPT 4.o and Gemini AI in answering questions on bladder-related conditions: neurogenic bladder, vesicoureteral reflux (VUR), and posterior urethral valve (PUV).

Material and Method: This study, conducted in October 2024, compared the accuracy of ChatGPT 4.o and Gemini AI on 51 questions about neurogenic bladder, VUR, and PUV. The questions were randomly selected from pediatric surgery and urology materials. Responses were evaluated using accuracy metrics and statistical analysis of each model's performance and of the agreement between the two models.

Results: ChatGPT 4.o and Gemini AI showed similar accuracy across neurogenic bladder, VUR, and PUV questions, with correct response rates of 66.7% and 68.6%, respectively, and no statistically significant differences (p>0.05). Combined accuracy across all topics was 67.6%. Inter-rater agreement between the two models was strong (κ=0.87).

Conclusion: ChatGPT 4.o and Gemini AI achieved comparable accuracy across key bladder-related conditions, with no significant differences in performance.
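The figures in the Results can be related back to question counts with a little arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: that 66.7% and 68.6% of 51 questions correspond to 34 and 35 correct answers, and that κ is Cohen's kappa computed on per-question correct/incorrect outcomes. The 2x2 agreement table passed to `cohen_kappa` is a hypothetical split chosen because it is consistent with the reported values; the abstract does not report these cell counts, and the function names are illustrative.

```python
def accuracy_pct(correct, total):
    """Percentage of correct answers, rounded to one decimal place."""
    return round(100 * correct / total, 1)


def cohen_kappa(both_correct, only_a, only_b, both_wrong):
    """Cohen's kappa for two raters from a 2x2 correct/incorrect table."""
    n = both_correct + only_a + only_b + both_wrong
    p_observed = (both_correct + both_wrong) / n
    a_correct = both_correct + only_a
    b_correct = both_correct + only_b
    # Agreement expected by chance from each rater's marginal rates.
    p_expected = (a_correct * b_correct
                  + (n - a_correct) * (n - b_correct)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


TOTAL_QUESTIONS = 51   # from the abstract
CHATGPT_CORRECT = 34   # assumption: inferred from 66.7% of 51
GEMINI_CORRECT = 35    # assumption: inferred from 68.6% of 51

chatgpt_acc = accuracy_pct(CHATGPT_CORRECT, TOTAL_QUESTIONS)
gemini_acc = accuracy_pct(GEMINI_CORRECT, TOTAL_QUESTIONS)
combined_acc = accuracy_pct(CHATGPT_CORRECT + GEMINI_CORRECT,
                            2 * TOTAL_QUESTIONS)

# Hypothetical agreement table (not reported in the abstract):
# 33 both correct, 1 only ChatGPT, 2 only Gemini, 15 both wrong.
kappa = round(cohen_kappa(33, 1, 2, 15), 2)

print(chatgpt_acc, gemini_acc, combined_acc, kappa)
# prints: 66.7 68.6 67.6 0.87
```

Under these assumptions, the combined accuracy is simply the pooled rate over all 102 model responses (69 of 102), which matches the reported 67.6%.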

Details

Primary Language

English

Subjects

Pediatric Urology

Section

Research Article

Publication Date

15 January 2025

Submission Date

14 December 2024

Acceptance Date

10 January 2025

Published in Issue

Year 2025 Volume: 7 Issue: 1

Cite

APA
Azizoğlu, M., & Klyuev, S. (2025). A Comparative Study on the Question-Answering Proficiency of Artificial Intelligence Models in Bladder-Related Conditions: An Evaluation of Gemini and ChatGPT 4.o. Medical Records, 7(1), 201-205. https://doi.org/10.37990/medr.1601528