DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS
Abstract
Objective: This study aimed to compare the diagnostic performance of three large language model (LLM)-based artificial intelligence (AI) tools (ChatGPT-4, Gemini 2.0 Flash, and DeepSeek-V3) in supporting initial clinical decision-making using standardized clinical scenarios.
Methods: A total of 36 clinical scenarios were selected based on diagnostic algorithms from the Guide to Diagnostic Tests (7th ed.), representing five major clinical domains. For each scenario, only the first decision step of the relevant diagnostic algorithm was assessed. All questions were presented in Turkish and entered once, using identical prompts, into the publicly available free versions of the three models. Responses were evaluated using a three-point categorical accuracy system (completely correct, partially correct, incorrect).
Results: ChatGPT achieved the highest total score (40/72), followed by Gemini and DeepSeek (36/72 each); however, this difference was not statistically significant (p > 0.05). ChatGPT provided completely correct responses in 36.1% of scenarios, compared with 33.3% for Gemini and 22.2% for DeepSeek. ChatGPT and Gemini tended to answer the same scenarios completely correctly, although this overlap did not reach statistical significance (p > 0.05). Performance varied by category: Gemini performed best in electrolyte disorders and ChatGPT in infectious and systemic conditions, while DeepSeek matched the other models only in endocrinology and hematology.
Conclusion: While all models showed some diagnostic potential, none reached a level of accuracy sufficient to replace clinical judgment. However, when used for the initial step of diagnostic reasoning based on limited clinical information, these models may offer supportive value to clinicians, particularly when integrated into broader clinical decision-support systems.
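The scoring arithmetic in the Results can be checked with a short sketch. The abstract does not state the point weights; the 2/1/0 weighting below is an assumption inferred from the 72-point maximum over 36 scenarios. The function names are hypothetical, and the per-category counts are implied by the reported totals and percentages, not reported directly.

```python
# Sketch of the three-point scoring arithmetic.
# ASSUMPTION: weights of 2 / 1 / 0 are inferred from the 72-point maximum
# over 36 scenarios; the abstract does not state them explicitly.
WEIGHTS = {"completely_correct": 2, "partially_correct": 1, "incorrect": 0}
N_SCENARIOS = 36


def total_score(counts):
    """Sum weighted category counts for one model."""
    if sum(counts.values()) != N_SCENARIOS:
        raise ValueError("category counts must sum to the number of scenarios")
    return sum(WEIGHTS[category] * n for category, n in counts.items())


def infer_counts(reported_total, pct_completely_correct):
    """Back out category counts from a reported total and percentage.

    The result is implied by the arithmetic under the assumed weights;
    it is not a figure taken from the study.
    """
    full = round(pct_completely_correct / 100 * N_SCENARIOS)
    partial = reported_total - WEIGHTS["completely_correct"] * full
    incorrect = N_SCENARIOS - full - partial
    return {
        "completely_correct": full,
        "partially_correct": partial,
        "incorrect": incorrect,
    }


# Totals and percentages as reported in the Results.
reported = {
    "ChatGPT": (40, 36.1),
    "Gemini": (36, 33.3),
    "DeepSeek": (36, 22.2),
}

for model, (total, pct) in reported.items():
    counts = infer_counts(total, pct)
    # The inferred counts should reproduce the reported total.
    assert total_score(counts) == total
    print(model, counts)
```

Under this assumed weighting, Gemini and DeepSeek reach the same total by different routes: DeepSeek has fewer completely correct responses but more partially correct ones.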
Details
Primary Language
English
Subjects
Family Medicine
Section
Short Report
Authors
Umut Hilaloğlu
0009-0005-4669-5607
Türkiye
Early Pub Date
10 May 2026
Publication Date
1 June 2026
Submission Date
23 October 2025
Acceptance Date
23 March 2026
Published in Issue
Year 2026 Volume: 20 Issue: 2