Short Report

DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS

Volume: 20, Issue: 2, 1 June 2026

Abstract

Objective: This study aimed to compare the diagnostic performance of three large language model (LLM)-based artificial intelligence (AI) tools (ChatGPT-4, Gemini 2.0 Flash, and DeepSeek-V3) in supporting initial clinical decision-making using standardized clinical scenarios.

Methods: A total of 36 clinical scenarios were selected based on diagnostic algorithms from the Guide to Diagnostic Tests (7th ed.), representing five major clinical domains. For each scenario, only the first decision step of the relevant diagnostic algorithm was assessed. All questions were presented in Turkish and entered once, using identical prompts, into the publicly available free versions of the three models. Responses were evaluated using a three-point categorical accuracy system (completely correct, partially correct, incorrect).
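The three-point rating described above can be sketched as a small scoring helper. This is an illustrative sketch only: the abstract names the three categories but does not state their point values, so the 2/1/0 weights below are an assumption, chosen because 36 scenarios at 2 points each give the 72-point maximum used as the denominator in the Results. The names `Rating`, `N_SCENARIOS`, and `total_score` are hypothetical.

```python
from enum import IntEnum


class Rating(IntEnum):
    # Assumed point values; the abstract gives the categories, not the weights.
    INCORRECT = 0
    PARTIALLY_CORRECT = 1
    COMPLETELY_CORRECT = 2


N_SCENARIOS = 36  # number of clinical scenarios stated in the Methods


def total_score(ratings):
    """Sum the points for one model, given one rating per scenario."""
    if len(ratings) != N_SCENARIOS:
        raise ValueError(f"expected {N_SCENARIOS} ratings, got {len(ratings)}")
    return sum(int(r) for r in ratings)


# Highest attainable score under the assumed weights.
max_score = N_SCENARIOS * int(Rating.COMPLETELY_CORRECT)
```

Under these assumed weights, a model rated partially correct on every scenario would score 36 out of 72.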

Results: ChatGPT achieved the highest total score (40/72), followed by Gemini and DeepSeek (36/72 each); however, this difference was not statistically significant (p>0.05). ChatGPT provided completely correct responses in 36.1% of scenarios, compared with 33.3% for Gemini and 22.2% for DeepSeek. ChatGPT and Gemini showed overlapping patterns of completely correct responses, although this overlap did not reach statistical significance (p>0.05). Performance varied by category: Gemini performed best in electrolyte disorders and ChatGPT in infectious and systemic conditions, while DeepSeek matched the other models only in endocrinology and hematology.
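As a consistency check on these figures, the per-category response counts can be back-calculated from the reported totals and percentages. The sketch below assumes a 2/1/0 point scheme (completely correct, partially correct, incorrect), which the abstract does not state explicitly; the resulting counts are implied values under that assumption, not figures reported by the authors. The function name `implied_counts` is hypothetical.

```python
N_SCENARIOS = 36                         # from the Methods
POINTS_COMPLETE, POINTS_PARTIAL = 2, 1   # assumed weights, not stated in the abstract

# Reported in the Results: (total score out of 72, % completely correct)
reported = {
    "ChatGPT": (40, 36.1),
    "Gemini": (36, 33.3),
    "DeepSeek": (36, 22.2),
}


def implied_counts(total, pct_complete):
    """Return (complete, partial, incorrect) counts implied by the reported figures."""
    complete = round(pct_complete / 100 * N_SCENARIOS)
    partial = (total - POINTS_COMPLETE * complete) // POINTS_PARTIAL
    incorrect = N_SCENARIOS - complete - partial
    return complete, partial, incorrect


for model, (total, pct) in reported.items():
    print(model, implied_counts(total, pct))
# ChatGPT (13, 14, 9)
# Gemini (12, 12, 12)
# DeepSeek (8, 20, 8)
```

All implied counts are non-negative whole numbers summing to 36, so the reported totals and percentages are mutually consistent with this scoring scheme. Under the same assumption, DeepSeek reaches Gemini's total despite fewer completely correct answers because it has more partially correct ones.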

Conclusion: While all models showed some diagnostic potential, none reached a level of accuracy sufficient to replace clinical judgment. However, when used for the initial step of diagnostic reasoning based on limited clinical information, these models may offer supportive value to clinicians, particularly when integrated into broader clinical decision-support systems.

Keywords

Supporting Institution

None

Ethical Statement

Because the study, by its nature, did not use human or animal data, ethics committee approval was not required. The research was conducted in accordance with the principles of scientific integrity and the ethical standards of the Declaration of Helsinki.

Acknowledgments

None

Details

Primary Language

English

Subjects

Family Medicine

Section

Short Report

Early View Date

10 May 2026

Publication Date

1 June 2026

Submission Date

23 October 2025

Acceptance Date

23 March 2026

Published in Issue

Year 2026 Volume: 20 Issue: 2

Cite

APA
Büyükdereli Atadağ, Y., Kalınkara Seyhan, T., Hilaloğlu, U., & Akbayram, H. T. (2026). DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS. Turkish Journal of Family Medicine and Primary Care, 20(2), 214-219. https://doi.org/10.21763/tjfmpc.1809491
Chicago
Büyükdereli Atadağ, Yıldız, Tuba Kalınkara Seyhan, Umut Hilaloğlu, and Hatice Tuba Akbayram. 2026. “DIAGNOSTIC PERFORMANCE OF CHATGPT, GEMINI, AND DEEPSEEK IN CLINICAL DECISION SUPPORT: A COMPARATIVE ANALYSIS”. Turkish Journal of Family Medicine and Primary Care 20 (2): 214-19. https://doi.org/10.21763/tjfmpc.1809491.

Turkish Journal of Family Medicine and Primary Care © 2024 by Aile Hekimliği Akademisi Derneği is licensed under CC BY-NC-ND 4.0