Abstract
Objective: To compare the diagnostic accuracy of two advanced large language models (LLMs), ChatGPT-o1 and DeepSeek-V3, in expert-validated simulated otorhinolaryngology cases, and to assess subspecialty-specific performance and inter-rater agreement relative to human specialists.
Methods: A cross-sectional diagnostic accuracy study was conducted using 70 expert-validated clinical vignettes spanning five ENT subspecialties. Two academic otolaryngologists and two LLMs independently evaluated each case. Both LLMs were run in deterministic mode (temperature = 0) with standardized, single-pass prompting in isolated sessions. Diagnostic accuracy, inter-rater agreement (Cohen’s κ), and subspecialty-specific performance were analyzed. A post hoc power analysis (Cohen’s h = 0.22; α = 0.05) assessed the study’s ability to detect moderate effect sizes.
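The Methods name Cohen’s κ and Cohen’s h but do not give their formulas. The following minimal Python sketch uses the standard definitions, κ = (p_o − p_e)/(1 − p_e) and h = 2·arcsin√p₁ − 2·arcsin√p₂. It assumes that each rater’s output is coded per case (for example 1 = correct diagnosis, 0 = incorrect). The per-case data are not reported in the abstract, so the vectors in the example are hypothetical, and the power calculation itself is not reproduced because the abstract does not state which test or design it assumed.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each labelled the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement p_o: share of cases on which both raters give the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement p_e: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a.keys() | freq_b.keys())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two proportions on the arcsine scale."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


if __name__ == "__main__":
    # Hypothetical per-case outcomes (1 = correct, 0 = incorrect); not the study's data.
    model_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    model_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
    print(round(cohens_kappa(model_a, model_b), 3))  # 0.375
    print(round(cohens_h(0.90, 0.80), 3))
```

Because κ corrects raw agreement for chance, two raters who are both usually correct can agree on most cases yet obtain only a modest κ. In the hypothetical example, p_o = 0.80 but κ = 0.375.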
Results: Both LLMs achieved a diagnostic accuracy of 90.0% (63/70), with no significant difference between them (p = 1.00) and substantial inter-model agreement (κ = 0.68). The human evaluators achieved accuracies of 97.1% and 92.9%, with fair inter-rater agreement (κ = 0.26). By subspecialty, performance was highest in otology and pediatric ENT (100%), followed by rhinology (92.3%), with greater variability in laryngology and head and neck surgery. Shared error patterns included overestimation of malignancy in high-risk patients. The post hoc power analysis indicated 78% power to detect moderate differences.
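The headline figures can be checked with simple arithmetic. The following Python sketch rests on three assumptions that the abstract does not state. First, the human raters’ counts (68/70 and 65/70) are back-calculated, because these are the only counts out of 70 that round to 97.1% and 92.9%. Second, the verbal κ labels are taken to follow the Landis and Koch bands. Third, the p = 1.00 comparison is modelled as an exact McNemar test on paired per-case outcomes. If both models score 63/70 on the same cases, the discordant counts must be equal (b = c), and the exact McNemar p-value is then 1.

```python
from math import comb


def accuracy_pct(correct: int, total: int) -> float:
    """Diagnostic accuracy as a percentage of cases answered correctly."""
    return 100 * correct / total


def landis_koch_band(kappa: float) -> str:
    """Verbal agreement band for a kappa value (Landis and Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pairs.

    b = cases only model A got right, c = cases only model B got right.
    Under H0 the discordant cases split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(accuracy_pct(63, 70))            # 90.0 (each LLM)
    print(round(accuracy_pct(68, 70), 1))  # 97.1 (count back-calculated from the reported %)
    print(round(accuracy_pct(65, 70), 1))  # 92.9 (count back-calculated from the reported %)
    print(landis_koch_band(0.68), landis_koch_band(0.26))  # substantial fair
    # Equal totals on the same cases force b == c, and any b == c gives p = 1.0.
    print(mcnemar_exact_p(2, 2))  # 1.0 (b = c = 2 is a hypothetical split)
```

Under these assumptions, the reported p = 1.00 follows directly from the identical 63/70 scores, whatever the actual number of discordant cases.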
Conclusion: In controlled, vignette-based evaluations, ChatGPT-o1 and DeepSeek-V3 demonstrated diagnostic accuracy approaching expert-level performance across simulated ENT scenarios, with substantial inter-model agreement and subspecialty-dependent variability. These findings highlight the potential of LLMs as diagnostic decision-support tools while underscoring the need for multimodal and real-world validation before clinical implementation.
Formal ethics committee approval was not required as this study involved only simulated clinical scenarios without real patient data or human subject involvement.
No financial support
| Primary Language | English |
|---|---|
| Subjects | Otorhinolaryngology |
| Journal Section | Research Article |
| Submission Date | December 21, 2025 |
| Acceptance Date | February 5, 2026 |
| Publication Date | March 26, 2026 |
| DOI | https://doi.org/10.65396/ejra.1846059 |
| IZ | https://izlik.org/JA37CX44UL |
| Published in Issue | 2026, Volume 9, Issue 1 |
Starting in 2020, all content published in the journal has been licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which allows third parties to share and adapt the content for non-commercial purposes, provided they give credit to the original work.
Content published before 2020 was licensed under traditional copyright, but the archive remains freely accessible.