A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS

Eren Çamur; Turay Cesur; Yasin Celal Güneş

doi:10.26650/IUITFD.1494572

Research Article

Year 2024, Volume: 87 Issue: 4, 321 - 326, 25.10.2024

Abstract

Objective: This study evaluated the effectiveness of various large language models (LLMs) in simplifying Turkish reports of computed tomography (CT), a common imaging modality.
Material and Method: Fictional CT findings were used, and the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Declaration of Helsinki were followed. Fifty fictional Turkish CT findings were created. Four LLMs (ChatGPT 4, ChatGPT-3.5, Gemini 1.5 Pro, and Claude 3 Opus) simplified the reports using the prompt: "Please explain them in a way that someone without a medical background can understand in Turkish". Readability was assessed with the Ateşman Readability Index, and accuracy was rated on a Likert scale.
Results: Claude 3 Opus received the highest readability score (58.9), followed by ChatGPT-3.5 (54.5), Gemini 1.5 Pro (53.7), and ChatGPT 4 (45.1). The Likert scores of Claude 3 Opus (mean: 4.7) and ChatGPT 4 (mean: 4.5) did not differ significantly (p>0.05). ChatGPT 4 had the highest word count (96.98) compared with Claude 3 Opus (90.6), Gemini 1.5 Pro (74.4), and ChatGPT-3.5 (38.7) (p<0.001).
Conclusion: This study shows that LLMs can simplify Turkish CT reports to a level that individuals without medical knowledge can understand, with high readability and accuracy. ChatGPT 4 and Claude 3 Opus produce the most accurate simplifications. The simpler sentences of Claude 3 Opus may make it the preferred option for Turkish CT reports.

Keywords

Large language models, radiology reports, readability, computed tomography, Turkish, simplification

Ethical Statement

Since no real patient information or data were used in this study, ethics committee approval is not required.

Abstract

Objective: This study evaluated the effectiveness of various large language models (LLMs) in simplifying Turkish reports of computed tomography (CT), a common imaging modality.
Material and Method: The study used fictional CT findings and followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) and the Declaration of Helsinki. Fifty fictional Turkish CT findings were generated. Four LLMs (ChatGPT 4, ChatGPT-3.5, Gemini 1.5 Pro, and Claude 3 Opus) simplified the reports using the prompt: "Please explain them in a way that someone without a medical background can understand in Turkish." Evaluations were based on Ateşman's Readability Index and a Likert scale for accuracy and readability.
Results: Claude 3 Opus scored the highest in readability (58.9), followed by ChatGPT-3.5 (54.5), Gemini 1.5 Pro (53.7), and ChatGPT 4 (45.1). Likert scores for Claude 3 Opus (mean: 4.7) and ChatGPT 4 (mean: 4.5) showed no significant difference (p>0.05). ChatGPT 4 had the highest word count (96.98) compared with Claude 3 Opus (90.6), Gemini 1.5 Pro (74.4), and ChatGPT-3.5 (38.7) (p<0.001).
Conclusion: This study shows that LLMs can simplify Turkish CT reports to a level that individuals without medical knowledge can understand, with high readability and accuracy. ChatGPT 4 and Claude 3 Opus produced the most comprehensible simplifications. Claude 3 Opus' simpler sentences may make it the optimal choice for simplifying Turkish CT reports.
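
The abstract does not say how the Ateşman scores were computed. As an illustration only, the sketch below applies the commonly cited form of Ateşman's formula, score = 198.825 - 40.175 x (syllables per word) - 2.610 x (words per sentence), to a piece of Turkish text. The function names, the vowel-counting approximation of syllables, and the regex-based word and sentence splitting are assumptions of this sketch, not the authors' tooling.

```python
import re

# Assumption: every Turkish syllable contains exactly one vowel, so counting
# vowels approximates the syllable count. Circumflexed vowels occur in loanwords.
TURKISH_VOWELS = set("aeıioöuüâîû")

# A word is a run of letters, optionally continued after an apostrophe
# (e.g. "Ankara'dır" is treated as one word).
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def count_syllables(word: str) -> int:
    """Approximate the number of syllables as the number of vowels."""
    return sum(1 for ch in word.lower() if ch in TURKISH_VOWELS)


def atesman_score(text: str) -> float:
    """Return the Ateşman readability score; higher means easier to read.

    Limitation: sentences are split naively on '.', '!' and '?', so
    abbreviations and decimal points would be counted as sentence breaks.
    """
    words = WORD_PATTERN.findall(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    syllables_per_word = syllables / len(words)
    words_per_sentence = len(words) / len(sentences)
    return 198.825 - 40.175 * syllables_per_word - 2.610 * words_per_sentence


if __name__ == "__main__":
    sample = "Akciğerlerde sorun görülmedi. Kalp normal boyutta."
    print(round(atesman_score(sample), 1))
```

Because longer words and longer sentences both lower the score, a model that writes short sentences with short words ranks higher on this index, which is the sense in which the abstract speaks of "simpler sentences".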

Keywords

Large language model, radiology reports, readability, computed tomography, Turkish, simplifying

Ethical Statement

Since real patient information and data were not used in this study, ethics committee approval was not required.

References

Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A Survey of Large Language Models. 2023. http://arxiv.org/abs/2303.18223
Kung TH, Cheatham M, Medenilla A, Sillos C, Leon L De, Elepano C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digital Health 2023;2(2):e0000198.
Yilmaz EC, Belue MJ, Turkbey B, Reinhold C, Choyke PL. A Brief Review of Artificial Intelligence in Genitourinary Oncological Imaging. Can Assoc Radiol J 2023;74(3):534-47.
Akinci D’Antonoli T, Stanzione A, Bluethgen C, Vernuccio F, Ugga L, Klontzas ME, et al. Large language models in radiology: fundamentals, applications, ethical considerations, risks, and future directions. Diagnostic and Interventional Radiology 2024;30(2):80-90.
Doshi R, Amin K, Khosla P, Bajaj S, Chheang S, Forman HP. Utilizing Large Language Models to Simplify Radiology Reports: a comparative analysis of ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing. medRxiv 2023. https://www.medrxiv.org/content/10.1101/2023.06.04.23290786v2
Li H, Moon JT, Iyer D, Balthazar P, Krupinski EA, Bercu ZL, et al. Decoding radiology reports: Potential application of OpenAI ChatGPT to enhance patient understanding of diagnostic reports. Clin Imaging 2023;101:137-41.
Luo W, Liu F, Liu Z, Litman D. A novel ILP framework for summarizing content with high lexical variety. Nat Lang Eng 2018;24(6):887-920.
Guadalupe Ramos J, Navarro-Alatorre I, Flores Becerra G, Flores-Sanchez O. A Formal Technique for Text Summarization from Web Pages by using Latent Semantic Analysis. Research in Computing Science 2019;148(3):11-22.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, et al. STARD 2015: An updated list of essential items for reporting diagnostic accuracy studies. Radiology 2015;277(3):826-32.
Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol 2023;1:1-9.
Schmidt S, Zimmerer A, Cucos T, Feucht M, Navas L. Simplifying radiologic reports with natural language processing: a novel approach using ChatGPT in enhancing patient understanding of MRI results. Arch Orthop Trauma Surg 2024;144(2):611-8.
Ateşman E. Türkçede okunabilirliğin ölçülmesi. Dil Dergisi 1997;58:71-4.
Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data 2023;10(1):1.
Lyu Q, Tan J, Zapadka ME, Ponnatapura J, Niu C, Myers KJ, et al. Translating radiology reports into plain language using ChatGPT and GPT-4 with prompt learning: results, limitations, and potential. Vis Comput Ind Biomed Art 2023;6(1):1-10.

There are 14 citations in total.

Details

Primary Language	English
Subjects	Health Services and Systems (Other)
Journal Section	RESEARCH
Authors	Eren Çamur (ORCID 0000-0002-8774-5800); Turay Cesur (ORCID 0000-0002-2726-8045); Yasin Celal Güneş (ORCID 0000-0001-7631-854X)
Publication Date	October 25, 2024
Submission Date	June 3, 2024
Acceptance Date	September 2, 2024
Published in Issue	Year 2024 Volume: 87 Issue: 4

Cite

APA	Çamur, E., Cesur, T., & Güneş, Y. C. (2024). A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS. Journal of Istanbul Faculty of Medicine, 87(4), 321-326. https://doi.org/10.26650/IUITFD.1494572
AMA	Çamur E, Cesur T, Güneş YC. A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS. İst Tıp Fak Derg. October 2024;87(4):321-326. doi:10.26650/IUITFD.1494572
Chicago	Çamur, Eren, Turay Cesur, and Yasin Celal Güneş. “A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS”. Journal of Istanbul Faculty of Medicine 87, no. 4 (October 2024): 321-26. https://doi.org/10.26650/IUITFD.1494572.
EndNote	Çamur E, Cesur T, Güneş YC (October 1, 2024) A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS. Journal of Istanbul Faculty of Medicine 87 4 321–326.
IEEE	E. Çamur, T. Cesur, and Y. C. Güneş, “A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS”, İst Tıp Fak Derg, vol. 87, no. 4, pp. 321–326, 2024, doi: 10.26650/IUITFD.1494572.
ISNAD	Çamur, Eren et al. “A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS”. Journal of Istanbul Faculty of Medicine 87/4 (October 2024), 321-326. https://doi.org/10.26650/IUITFD.1494572.
JAMA	Çamur E, Cesur T, Güneş YC. A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS. İst Tıp Fak Derg. 2024;87:321–326.
MLA	Çamur, Eren et al. “A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS”. Journal of Istanbul Faculty of Medicine, vol. 87, no. 4, 2024, pp. 321-6, doi:10.26650/IUITFD.1494572.
Vancouver	Çamur E, Cesur T, Güneş YC. A COMPARATIVE STUDY: PERFORMANCE OF LARGE LANGUAGE MODELS IN SIMPLIFYING TURKISH COMPUTED TOMOGRAPHY REPORTS. İst Tıp Fak Derg. 2024;87(4):321-6.

Contact information and address

Address: İ.Ü. İstanbul Tıp Fakültesi Dekanlığı, Turgut Özal Cad. 34093 Çapa, Fatih, İstanbul, TÜRKİYE

Email: itfdergisi@istanbul.edu.tr

Phone: +90 212 414 21 61