Research Article

Gemini 2.5 Pro and ChatGPT-5 on the ATLS Exam: Accuracy, Consistency, and Comparison with Physicians

Volume: 6 Number: 1 January 19, 2026

Abstract

Objective: This study aimed to evaluate the accuracy and consistency of two current large language models (LLMs), Gemini 2.5 Pro and ChatGPT-5, on the Advanced Trauma Life Support (ATLS) exam. It also aimed to compare the two artificial intelligence (AI) models with emergency medicine residents and to examine their performance across question types. Materials and Methods: This observational study used the 2023 ATLS exam, which consists of 40 multiple-choice questions. Questions were categorized as either basic-knowledge or scenario-based. To measure response consistency, each question was administered six times to Gemini 2.5 Pro and ChatGPT-5, and once to each of six emergency medicine residents. Accuracy rates were calculated and compared for all examinees. Results: Gemini 2.5 Pro achieved an overall accuracy of 95.8%, ChatGPT-5 achieved 92.9%, and the residents achieved 67.1%. Both AI models performed significantly better than the residents (p < 0.001), and no significant difference was found between Gemini and ChatGPT (p = 0.17). Both models were less accurate on scenario-based questions than on knowledge questions, and their response consistency across repeated exams was moderate. Conclusion: Both Gemini 2.5 Pro and ChatGPT-5 passed the ATLS exam with higher success rates and more consistent performance than the residents. These findings highlight the potential of LLMs as tools for trauma education, rapid access to information, and, potentially, clinical decision support.


Details

Primary Language

English

Subjects

Emergency Medicine

Journal Section

Research Article

Publication Date

January 19, 2026

Submission Date

August 22, 2025

Acceptance Date

October 6, 2025

Published in Issue

Year 2026 Volume: 6 Number: 1