Research Article

Gemini 2.5 Pro and ChatGPT-5 on the ATLS Exam: Accuracy, Consistency, and Comparison with Physicians

Volume: 6 Issue: 1, January 19, 2026

Abstract

Objective: This study evaluated the accuracy and consistency of two current large language models (LLMs), Gemini 2.5 Pro and ChatGPT-5, on the Advanced Trauma Life Support (ATLS) exam. It also compared these two artificial intelligence (AI) models with emergency medicine residents and examined their performance on different question types.

Materials and Methods: This observational study used the 2023 ATLS exam, which consists of 40 multiple-choice questions. Questions were categorized as either knowledge-based (testing basic knowledge directly) or scenario-based. Each question was administered six times to Gemini 2.5 Pro and to ChatGPT-5 to measure response consistency, and once to each of six emergency medicine residents. Accuracy rates for all examinees were calculated and compared.

Results: On the ATLS exam, Gemini 2.5 Pro achieved an overall accuracy of 95.8%, ChatGPT-5 achieved 92.9%, and the residents achieved 67.1%. The AI models performed significantly better than the residents (p < 0.001). No significant difference was found between Gemini and ChatGPT (p = 0.17). Both models were less accurate on scenario-based questions than on knowledge questions. The AI models' response consistency across repeated exams was moderate.

Conclusion: Both Gemini 2.5 Pro and ChatGPT-5 passed the ATLS exam with a higher success rate than the residents and with consistent performance. These findings demonstrate the significant potential of LLMs as tools for supporting trauma education, providing rapid access to information, and potentially aiding clinical decision support.
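The reported percentages can be sanity-checked with a little arithmetic. The sketch below assumes each group produced 40 × 6 = 240 graded responses and back-calculates the number of correct answers from the stated accuracy rates; these counts are inferred, not reported in the abstract. It then runs a Pearson chi-square test without continuity correction on the resulting 2×2 tables. The abstract does not name the statistical test the authors used, so the choice of test, the function names, and the constants here are all illustrative assumptions.

```python
import math

N_QUESTIONS = 40   # from the abstract
N_REPEATS = 6      # six runs per model; six residents, one attempt each
N_RESPONSES = N_QUESTIONS * N_REPEATS  # 240 graded responses per group (assumed)


def implied_correct(accuracy_pct: float, total: int = N_RESPONSES) -> int:
    """Back-calculate the number of correct responses from a reported percentage."""
    return round(accuracy_pct / 100 * total)


def pearson_chi_square_2x2(correct_a: int, correct_b: int, total: int = N_RESPONSES):
    """Pearson chi-square (no continuity correction) for two equal-sized groups.

    Treats every response as independent, which is a simplification:
    the same 40 questions are answered repeatedly.
    """
    observed = [
        [correct_a, total - correct_a],
        [correct_b, total - correct_b],
    ]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = 2 * total
    statistic = 0.0
    for row in observed:
        for obs, col_total in zip(row, column_totals):
            expected = total * col_total / grand_total
            statistic += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Implied counts (assumption: derived from the rounded percentages in the abstract).
gemini = implied_correct(95.8)
chatgpt = implied_correct(92.9)
residents = implied_correct(67.1)

_, p_models = pearson_chi_square_2x2(gemini, chatgpt)
_, p_gemini_vs_residents = pearson_chi_square_2x2(gemini, residents)

print(f"Implied correct answers: {gemini}, {chatgpt}, {residents} of {N_RESPONSES}")
print(f"Gemini vs ChatGPT: p = {p_models:.3f}")
print(f"Gemini vs residents: p = {p_gemini_vs_residents:.2e}")
```

Under these assumptions the model-versus-model comparison gives a p-value of about 0.17 and the model-versus-resident comparison falls well below 0.001, both in line with the figures in the abstract. This is a plausibility check only and should not be read as a reconstruction of the authors' analysis.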

Keywords

Details

Primary Language

English

Subjects

Emergency Medicine

Section

Research Article

Publication Date

January 19, 2026

Submission Date

August 22, 2025

Acceptance Date

October 6, 2025

Published in Issue

Year 2026 Volume: 6 Issue: 1
