DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS
Abstract
Objective: This study aimed to compare the diagnostic performance of three large language model (LLM)-based artificial intelligence (AI) tools, namely ChatGPT-4, Gemini 2.0 Flash, and DeepSeek-V3, in supporting initial clinical decision-making using standardized clinical scenarios.
Methods: A total of 36 clinical scenarios were selected based on diagnostic algorithms from the Guide to Diagnostic Tests (7th ed.), representing five major clinical domains. For each scenario, only the first decision step of the relevant diagnostic algorithm was assessed. All questions were presented in Turkish and entered once, using identical prompts, into the publicly available free versions of the three models. Responses were evaluated using a three-point categorical accuracy system (completely correct, partially correct, incorrect).
Results: ChatGPT achieved the highest total score (40/72), followed by Gemini and DeepSeek (36/72 each); however, this difference was not statistically significant (p > 0.05). ChatGPT provided completely correct responses in 36.1% of scenarios, compared with 33.3% for Gemini and 22.2% for DeepSeek. ChatGPT and Gemini tended to give completely correct responses to the same scenarios, although this overlap did not reach statistical significance (p > 0.05). Performance varied by category: Gemini performed best in electrolyte disorders and ChatGPT in infectious and systemic conditions, while DeepSeek matched the other models only in endocrinology and hematology.
Conclusion: While all models showed some diagnostic potential, none reached a level of accuracy sufficient to replace clinical judgment. However, when used for the initial step of diagnostic reasoning based on limited clinical information, these models may offer supportive value to clinicians, particularly when integrated into broader clinical decision-support systems.
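The totals and percentages reported in the Results can be cross-checked with a short calculation. The sketch below assumes the three-point system awards 2, 1, and 0 points for completely correct, partially correct, and incorrect responses. The abstract does not state these point values; they are inferred from the maximum score of 72 across 36 scenarios. The function name `implied_counts` and the dictionary layout are illustrative only.

```python
N_SCENARIOS = 36

# Assumed point values (not stated in the abstract; inferred from 72 = 36 x 2).
POINTS = {"completely correct": 2, "partially correct": 1, "incorrect": 0}

# Figures reported in the abstract: total score and share of completely correct responses.
REPORTED = {
    "ChatGPT": {"total": 40, "pct_complete": 36.1},
    "Gemini": {"total": 36, "pct_complete": 33.3},
    "DeepSeek": {"total": 36, "pct_complete": 22.2},
}


def implied_counts(total, pct_complete, n=N_SCENARIOS):
    """Back out the response counts implied by a total score and a percentage."""
    # Number of completely correct responses, rounded to the nearest scenario.
    complete = round(pct_complete / 100 * n)
    # Points left after the completely correct responses must come from partial ones.
    remaining = total - POINTS["completely correct"] * complete
    partial = remaining // POINTS["partially correct"]
    incorrect = n - complete - partial
    if partial < 0 or incorrect < 0:
        raise ValueError("reported figures are inconsistent with the assumed scoring")
    return {
        "completely correct": complete,
        "partially correct": partial,
        "incorrect": incorrect,
    }


if __name__ == "__main__":
    for model, figures in REPORTED.items():
        print(model, implied_counts(figures["total"], figures["pct_complete"]))
```

Under this assumed scoring, the reported figures are internally consistent and imply 13, 12, and 8 completely correct responses for ChatGPT, Gemini, and DeepSeek, respectively. The partially correct and incorrect counts the script prints are derived values, not figures reported by the author.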
Details
Primary Language: English
Subjects: Family Medicine
Journal Section: Short Report
Authors: Umut Hilaloğlu (ORCID 0009-0005-4669-5607), Türkiye
Early Pub Date: May 10, 2026
Publication Date: June 1, 2026
Submission Date: October 23, 2025
Acceptance Date: March 23, 2026
Published in Issue: Year 2026, Volume 20, Number 2