Detection of Emergency Brain CT Pathologies with Artificial Intelligence Models: A Comparison of ChatGPT, Grok, and DeepSeek

Hamza Eren Güzel; Cemre Ozenbas

doi:10.52309/jaihs.1758538



Abstract (Turkish-language version)

Objective: The aim of this study was to compare the accuracy of three artificial intelligence language models (ChatGPT, Grok, and DeepSeek) in detecting pathologies commonly encountered in emergency brain CT reports. Materials and Methods: This retrospective study used the brain CT reports of 2000 patients older than 18 years, acquired in the emergency department of İzmir City Hospital between 2023 and 2024. Two experienced radiologists labeled the reports on the LabelStudio platform as "present/absent" for five main pathologies: intracerebral hematoma, subarachnoid hemorrhage, subdural hematoma, ischemic stroke (acute/subacute), and herniation due to mass effect. During labeling, a diagnosis was inferred from the report's interpretation even when no explicit diagnostic statement was present. Three large language models processed the same reports and produced a "present/absent" decision for each pathology. The model outputs were compared with the radiologist labels, which served as the gold standard, and F1 scores were calculated. Results: DeepSeek achieved the highest F1 scores for all pathologies (0.89–0.95), with ChatGPT second (0.88–0.93). Grok generally performed worse (0.83–0.90). The highest F1 scores were observed with DeepSeek for intracerebral hematoma and herniation due to mass effect. Conclusion: Artificial intelligence language models can detect acute pathologies in brain CT reports automatically and with high accuracy. DeepSeek and ChatGPT in particular could be integrated into hospital information systems as assistive tools that alert clinicians to critical findings in a timely manner.
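The comparison rests on the F1 score, which combines precision and recall measured against the radiologist labels. The standard definitions are given below, computed separately for each pathology. Treating "present" as the positive class is an assumption, since the abstract does not state it explicitly. Here TP counts reports that both the model and the radiologists mark as present, FP counts reports the model marks present but the radiologists do not, and FN counts reports the radiologists mark present but the model does not.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN},\\[4pt]
F_1 &= \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
```

True negatives do not appear in F1, so the score is not inflated by the reports in which a given pathology is absent, which is useful when most reports are negative for that finding.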


Detection of Acute Brain CT Pathologies Using AI Models: A Comparative Study of ChatGPT, Grok, and DeepSeek

Abstract

Objective: To compare the accuracy of three large language models (LLMs: ChatGPT, Grok, and DeepSeek) in detecting common acute pathologies in emergency brain CT reports. Materials and Methods: In this retrospective study, 2000 emergency brain CT reports from adult patients (>18 years) were annotated by two board-certified radiologists using the LabelStudio platform. Five pathologies were labeled as present/absent: intracerebral hemorrhage, subarachnoid hemorrhage, subdural hematoma, acute/subacute ischemic stroke, and herniation due to mass effect. When a pathology was not explicitly stated in a report, the radiologists inferred its presence from context. Each model classified the same reports as present/absent for every pathology, and its outputs were evaluated against the radiologist reference standard using F1 scores. Results: DeepSeek achieved the highest F1 scores across all pathologies (range: 0.89–0.95), followed by ChatGPT (0.88–0.93) and Grok (0.83–0.90). DeepSeek's highest F1 scores were obtained for intracerebral hemorrhage and herniation. Conclusion: LLMs show strong potential for detecting acute findings in emergency brain CT reports. DeepSeek and ChatGPT in particular could be integrated into hospital information systems to alert physicians to critical findings in real time.
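A minimal sketch of the evaluation step, assuming each model's output and the radiologist reference have already been reduced to one present/absent (1/0) label per pathology per report. The pathology keys, function names, and toy labels are hypothetical and are not the study's data or code; the sketch only shows how a per-pathology F1 score, and the per-model range reported above (minimum to maximum across the five pathologies), follow from paired labels.

```python
from typing import Dict, List, Tuple

# Hypothetical keys for the five pathologies labeled in the study.
PATHOLOGIES = [
    "intracerebral_hemorrhage",
    "subarachnoid_hemorrhage",
    "subdural_hematoma",
    "ischemic_stroke",
    "herniation",
]


def f1_score(reference: List[int], predicted: List[int]) -> float:
    """F1 for the positive class (1 = pathology present), vs. reference labels."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 1)
    fp = sum(1 for r, p in zip(reference, predicted) if r == 0 and p == 1)
    fn = sum(1 for r, p in zip(reference, predicted) if r == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # If neither side ever marks the pathology present, F1 is undefined;
    # returning 0.0 is an arbitrary convention chosen for this sketch.
    return 2 * tp / denom if denom else 0.0


def evaluate_model(
    reference: Dict[str, List[int]],
    predicted: Dict[str, List[int]],
) -> Tuple[Dict[str, float], Tuple[float, float]]:
    """Per-pathology F1 scores for one model plus their (min, max) range."""
    scores = {name: f1_score(reference[name], predicted[name]) for name in PATHOLOGIES}
    return scores, (min(scores.values()), max(scores.values()))


if __name__ == "__main__":
    # Toy labels for six hypothetical reports (not study data).
    ref = {name: [1, 0, 0, 1, 0, 0] for name in PATHOLOGIES}
    pred = {name: [1, 0, 1, 1, 0, 0] for name in PATHOLOGIES}
    pred["herniation"] = [1, 0, 0, 1, 0, 0]
    scores, (low, high) = evaluate_model(ref, pred)
    print(scores)
    print(f"F1 range: {low:.2f}-{high:.2f}")  # prints "F1 range: 0.80-1.00"
```

Running the same function once per model over identical reference labels is what makes the scores directly comparable across ChatGPT, Grok, and DeepSeek.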



Details

Primary Language

Turkish

Subjects

Natural Language Processing, Radiology and Organ Imaging

Journal Section

Research Article

Authors

Hamza Eren Güzel*
ORCID: 0000-0003-4228-0840
Türkiye

Cemre Ozenbas
ORCID: 0000-0002-6688-5003
Türkiye

Publication Date

December 30, 2025

Submission Date

August 5, 2025

Acceptance Date

December 5, 2025

Published in Issue

Year: 2025, Volume: 5, Number: 3

DOI

https://doi.org/10.52309/jaihs.1758538

IZ

https://izlik.org/JA93GM53SN

Cite

Vancouver

Güzel HE, Ozenbas C. Yapay Zeka Modelleri ile Acil Beyin BT Patolojilerinin Saptanması: ChatGPT, Grok ve DeepSeek Karşılaştırması. JAIHS. 2025 Dec 1;5(3):14-9. doi:10.52309/jaihs.1758538