Short Report

DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS

Volume: 20 Number: 2 June 1, 2026

Abstract

Objective: This study aimed to compare the diagnostic performance of three large language model (LLM)-based artificial intelligence (AI) tools (ChatGPT-4, Gemini 2.0 Flash, and DeepSeek-V3) in supporting initial clinical decision-making using standardized clinical scenarios.

Methods: A total of 36 clinical scenarios were selected based on diagnostic algorithms from the Guide to Diagnostic Tests (7th ed.), representing five major clinical domains. For each scenario, only the first decision step of the relevant diagnostic algorithm was assessed. All questions were presented in Turkish and entered once, using identical prompts, into the publicly available free versions of the three models. Responses were evaluated using a three-point categorical accuracy system (completely correct, partially correct, incorrect).

Results: ChatGPT achieved the highest total score (40/72), followed by Gemini and DeepSeek (36/72 each); however, this difference was not statistically significant (p>0.05). ChatGPT provided completely correct responses in 36.1% of scenarios, compared with 33.3% for Gemini and 22.2% for DeepSeek. Overlapping patterns of fully correct responses were observed between ChatGPT and Gemini, although this did not reach statistical significance (p>0.05). Performance varied by category: Gemini excelled in electrolyte disorders, ChatGPT in infectious and systemic conditions, and DeepSeek showed parity only in endocrinology and hematology.
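
The reported totals and percentages can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes that the three-point system awards 2 points for a completely correct response, 1 for a partially correct response, and 0 for an incorrect one, which is consistent with the 72-point maximum for 36 scenarios but is not stated explicitly in the abstract. The function name `implied_counts` and the resulting breakdowns are hypothetical reconstructions, not figures reported by the authors.

```python
# Assumption (not stated in the abstract): completely correct = 2 points,
# partially correct = 1 point, incorrect = 0 points, giving a maximum of 72.
N_SCENARIOS = 36
POINTS_FULL, POINTS_PARTIAL = 2, 1

# Values taken from the Results paragraph.
reported = {
    "ChatGPT": {"total": 40, "pct_full": 36.1},
    "Gemini": {"total": 36, "pct_full": 33.3},
    "DeepSeek": {"total": 36, "pct_full": 22.2},
}


def implied_counts(total, pct_full, n=N_SCENARIOS):
    """Return (full, partial, incorrect) counts implied by the assumed scoring."""
    full = round(pct_full / 100 * n)  # number of completely correct responses
    partial = (total - POINTS_FULL * full) // POINTS_PARTIAL
    incorrect = n - full - partial
    return full, partial, incorrect


breakdown = {model: implied_counts(**r) for model, r in reported.items()}

for model, (full, partial, incorrect) in breakdown.items():
    print(f"{model}: full={full}, partial={partial}, incorrect={incorrect}")
```

Under this assumption, Gemini and DeepSeek reach the same total by different routes: DeepSeek's lower share of completely correct responses would be offset by a larger number of partially correct ones.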

Conclusion: While all models showed some diagnostic potential, none reached a level of accuracy sufficient to replace clinical judgment. However, when used for the initial step of diagnostic reasoning based on limited clinical information, these models may offer supportive value to clinicians, particularly when integrated into broader clinical decision-support systems.

Keywords

Supporting Institution

N/A

Ethical Statement

Because the study did not involve human or animal data, ethics committee approval was not required. The research was conducted in accordance with the principles of scientific integrity and the ethical standards of the Declaration of Helsinki.

Thanks

N/A

Details

Primary Language

English

Subjects

Family Medicine

Journal Section

Short Report

Early Pub Date

May 10, 2026

Publication Date

June 1, 2026

Submission Date

October 23, 2025

Acceptance Date

March 23, 2026

Published in Issue

Year 2026 Volume: 20 Number: 2

APA
Büyükdereli Atadağ, Y., Kalınkara Seyhan, T., Hilaloğlu, U., & Akbayram, H. T. (2026). DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS. Turkish Journal of Family Medicine and Primary Care, 20(2), 214-219. https://doi.org/10.21763/tjfmpc.1809491
Chicago
Büyükdereli Atadağ, Yıldız, Tuba Kalınkara Seyhan, Umut Hilaloğlu, and Hatice Tuba Akbayram. 2026. “DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS”. Turkish Journal of Family Medicine and Primary Care 20 (2): 214-19. https://doi.org/10.21763/tjfmpc.1809491.

Turkish Journal of Family Medicine and Primary Care © 2024 by Academy of Family Medicine Association is licensed under CC BY-NC-ND 4.0