Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study

Mehmet Yıldırım; Tulay Dilara Demiray; Arda Ayten; Esat Kivanc Kaya

doi:10.31832/smj.1842543

Abstract

Objective: Family members of patients admitted to the intensive care unit (ICU) frequently experience uncertainty, emotional distress, and significant informational needs. Communication gaps remain common in ICU practice due to time constraints and clinical complexity. Large language models (LLMs) may offer scalable support for patient–family information delivery; however, their performance in responding to real-world ICU family questions has not been systematically evaluated.
Methods: This evaluator-blinded, cross-sectional study compared the accuracy of responses generated by five widely used LLMs (Claude Sonnet 4.0, ChatGPT 5.0, Gemini 2.5, Grok-4, and Sonar) to questions commonly asked by ICU family members. A standardized set of 25 questions was generated by prompting each model to list frequently asked ICU family questions. All questions were subsequently posed to all five models in blinded, independent sessions. Two intensive care medicine specialists independently rated response accuracy using a 6-point Likert scale. Inter-rater reliability was assessed using Cohen’s kappa. Differences between models were analyzed using the Friedman test with post-hoc Wilcoxon signed-rank tests.
Results: A total of 125 responses were evaluated. Inter-rater agreement was moderate (Cohen’s κ = 0.56; overall agreement 73.6%). Accuracy scores differed significantly among models (p < 0.001). Claude Sonnet 4.0 achieved the highest mean accuracy score (5.66 ± 0.61), followed by ChatGPT 5.0, Gemini 2.5, and Sonar, with no statistically significant differences among these four models. Grok-4 was significantly less accurate than each of the other models (all p < 0.001).
Conclusions: Most contemporary LLMs demonstrated high accuracy in answering questions commonly posed by ICU family members, although performance varied across platforms. Selected LLMs may serve as supportive tools to reinforce clinician–family communication; however, careful model selection, clinical oversight, and ethical safeguards are required before implementation in high-stakes intensive care settings.
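
The agreement statistics reported in the Results can be illustrated with a minimal sketch. It assumes the unweighted form of Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e); the abstract does not state whether a weighted variant was used. The function names and the two short rating lists are hypothetical and are not the study data. Under the same assumption, the reported values (73.6% agreement, kappa = 0.56) imply a chance agreement p_e of about 0.40, and 73.6% of 125 responses corresponds to 92 identical ratings.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for p_e."""
    return (p_o - kappa) / (1 - kappa)


# Hypothetical 6-point Likert ratings for eight responses (not the study data).
rater_1 = [6, 6, 5, 5, 4, 6, 5, 6]
rater_2 = [6, 5, 5, 5, 4, 6, 6, 6]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)

# Values taken from the abstract: 73.6% overall agreement, kappa = 0.56.
p_e_reported = implied_chance_agreement(0.736, 0.56)
agreeing_responses = round(0.736 * 125)
```

Because kappa discounts the agreement expected by chance, it is lower than raw percent agreement whenever the raters' scores cluster in a few categories, as they tend to when most responses are rated near the top of the scale.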

References

Lautrette A, Darmon M, Megarbane B, Joly LM, Chevret S, Adrie C, Barnoud D, et al. A communication strategy and brochure for relatives of patients dying in the ICU. N Engl J Med. 2007;356(5):469-78. doi:10.1056/NEJMoa063446.
Curtis JR, Treece PD, Nielsen EL, Gold J, Ciechanowski PS, Shannon SE, et al. Randomized trial of communication facilitators to reduce family distress and intensity of end-of-life care. Am J Respir Crit Care Med. 2016;193(2):154-62. doi:10.1164/rccm.201505-0900OC.
Aribas YK, Tefon Aribas AB. Comparative analysis of large language models in providing patient information about keratoconus and contact lenses. Int Ophthalmol. 2025;45(1):340. doi:10.1007/s10792-025-03711-2.
Lambert R, Choo ZY, Gradwohl K, Schroedl L, Ruiz De Luzuriaga A. Assessing the application of large language models in generating dermatologic patient education materials according to reading level: qualitative study. JMIR Dermatol. 2024;7:e55898. doi:10.2196/55898.
Chen D, Parsa R, Swanson K, Nunez JJ, Critch A, Bitterman DS, et al. Large language models in oncology: a review. BMJ Oncol. 2025;4(1):e000759. doi:10.1136/bmjonc-2025-000759.
Cheungpasitporn W, Thongprayoon C, Ronco C, Kashani KB. Generative AI in critical care nephrology: applications and future prospects. Blood Purif. 2024;53(11-12):871-83. doi:10.1159/000541168.
Biesheuvel LA, Workum JD, Reuland M, van Genderen ME, Thoral P, Dongelmans D, et al. Large language models in critical care. J Intensive Med. 2025;5(2):113-8. doi:10.1016/j.jointm.2024.12.001.
Madden MG, McNicholas BA, Laffey JG. Assessing the usefulness of a large language model to query and summarize unstructured medical notes in intensive care. Intensive Care Med. 2023;49(8):1018-20. doi:10.1007/s00134-023-07128-2.

Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-96. doi:10.1001/jamainternmed.2023.1838.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-74.
Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44-56. doi:10.1038/s41591-018-0300-7.
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172-80. doi:10.1038/s41586-023-06291-2.
Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States Medical Licensing Examination (USMLE)? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. doi:10.2196/45312.
Yin Y, Riaz Z, Amoro Sanchez R, Mustafa A, Eighaei Sedeh A. Evaluating ChatGPT as a standalone tool for patient education: a review of frequently asked questions by patients with chronic obstructive pulmonary disease. Cureus. 2025;17(9):e92519. doi:10.7759/cureus.92519.
Huo B, Boyle A, Marfo N, Tangamornsuksan W, Steen JP, McKechnie T, et al. Large language models for chatbot health advice studies: a systematic review. JAMA Netw Open. 2025;8(2):e2457879. doi:10.1001/jamanetworkopen.2024.57879.
Ong JCL, Chang SYH, William W, Butte AJ, Shah NH, Chew LST, et al. Ethical and regulatory challenges of large language models in medicine. Lancet Digit Health. 2024;6(6):e428-32. doi:10.1016/S2589-7500(24)00061-X.
Sendak MP, Ratliff W, Sarro D, Alderton E, Futoma J, Gao M, et al. Real-world integration of a sepsis deep learning technology into routine clinical care: implementation study. JMIR Med Inform. 2020;8(7):e15182. doi:10.2196/15182.
Stephenson-Moe CA, Behers BJ, Gibons RM, Behers BM, Jesus Herrera L, Anneaud D, et al. Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: an observational cross-sectional study. Medicine (Baltimore). 2025;104(15):e42135. doi:10.1097/MD.0000000000042135.
Hadweh P, Niset A, Salvagno M, Al Barajraji M, El Hadwe S, Taccone FS, et al. Machine learning and artificial intelligence in intensive care medicine: critical recalibrations from rule-based systems to frontier models. J Clin Med. 2025;14(12):4026. doi:10.3390/jcm14124026.
Boudi AL, Boudi M, Chan C, Boudi FB. Ethical challenges of artificial intelligence in medicine. Cureus. 2024;16(11):e74495. doi:10.7759/cureus.74495.
Morley J, Machado CCV, Burr C, Cowls J, Joshi I, Taddeo M, et al. The ethics of AI in health care: a mapping review. Soc Sci Med. 2020;260:113172. doi:10.1016/j.socscimed.2020.113172.
Amann J, Blasimme A, Vayena E, Frey D, Madai VI, Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. doi:10.1186/s12911-020-01332-6.
Sadeghi Z, Alizadehsani R, Cifci MA, Kausar S, Rehman R, Mahanta P, et al. A review of explainable artificial intelligence in healthcare. Comput Electr Eng. 2024;118:109370. doi:10.1016/j.compeleceng.2024.109370.
Abgrall G, Holder AL, Chelly Dagdia Z, Zeitouni K, Monnet X. Should AI models be explainable to clinicians? Crit Care. 2024;28(1):301. doi:10.1186/s13054-024-05005-y.

Details

Primary Language

English

Subjects

Internal Diseases, Intensive Care

Journal Section

Research Article

Authors

Mehmet Yıldırım ^*
0000-0002-0526-5943
Türkiye

Tulay Dilara Demiray
0000-0002-8629-4040
Türkiye

Arda Ayten
0009-0007-0639-4484
Türkiye

Esat Kivanc Kaya
0000-0002-3449-0701
Türkiye

Early Pub Date

June 15, 2026

Publication Date

-

Submission Date

December 15, 2025

Acceptance Date

February 2, 2026

Published in Issue

Year 2026 Number: Advanced Online Publication

DOI

https://doi.org/10.31832/smj.1842543

IZ

https://izlik.org/JA36KW59XK

Cite

APA

Yıldırım, M., Demiray, T. D., Ayten, A., & Kaya, E. K. (2026). Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study. Sakarya Medical Journal, Advanced Online Publication. https://doi.org/10.31832/smj.1842543

AMA

1. Yıldırım M, Demiray TD, Ayten A, Kaya EK. Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study. Sakarya Medical Journal. 2026;(Advanced Online Publication). doi:10.31832/smj.1842543

Chicago

Yıldırım, Mehmet, Tulay Dilara Demiray, Arda Ayten, and Esat Kivanc Kaya. 2026. “Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study”. Sakarya Medical Journal, no. Advanced Online Publication. https://doi.org/10.31832/smj.1842543.

EndNote

Yıldırım M, Demiray TD, Ayten A, Kaya EK (June 1, 2026) Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study. Sakarya Medical Journal Advanced Online Publication

IEEE

[1] M. Yıldırım, T. D. Demiray, A. Ayten, and E. K. Kaya, “Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study”, Sakarya Medical Journal, no. Advanced Online Publication, June 2026, doi: 10.31832/smj.1842543.

ISNAD

Yıldırım, Mehmet - Demiray, Tulay Dilara - Ayten, Arda - Kaya, Esat Kivanc. “Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study”. Sakarya Medical Journal. Advanced Online Publication (June 1, 2026). https://doi.org/10.31832/smj.1842543.

JAMA

1. Yıldırım M, Demiray TD, Ayten A, Kaya EK. Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study. Sakarya Medical Journal. 2026. doi:10.31832/smj.1842543.

MLA

Yıldırım, Mehmet, et al. “Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study”. Sakarya Medical Journal, no. Advanced Online Publication, June 2026, doi:10.31832/smj.1842543.

Vancouver

1. Mehmet Yıldırım, Tulay Dilara Demiray, Arda Ayten, Esat Kivanc Kaya. Can Large Language Models Support Family Communication in the Intensive Care Unit? A Comparative Accuracy Study. Sakarya Medical Journal. 2026 Jun. 1;(Advanced Online Publication). doi:10.31832/smj.1842543