Research Article

Comparative analysis of large language models' performance in breast imaging

Volume: 15 Issue: 4, December 31, 2024

Abstract

Aim: To evaluate the performance of two flagship models, OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, in breast imaging cases. Material and Methods: The dataset consisted of cases from the publicly available Case of the Month archive of the Society of Breast Imaging. Questions were classified as text-based or as containing images from mammography, ultrasound, magnetic resonance imaging, or hybrid imaging. The accuracy rates of GPT-4o and Claude 3.5 Sonnet were compared using the Mann-Whitney U test. Results: Of the 94 questions, 61.7% were image-based. The overall accuracy rate of GPT-4o was higher than that of Claude 3.5 Sonnet, but the difference was not statistically significant (75.4% vs. 67.7%, p=0.432). GPT-4o achieved higher scores on questions based on ultrasound and hybrid imaging, while Claude 3.5 Sonnet performed better on mammography-based questions. Both models reached higher accuracy rates in tumor group cases than in the non-tumor group (p>0.05 for both). The models' performance in breast imaging cases overall exceeded 75%, ranging from 64% to 83% for questions involving different imaging modalities. Conclusion: In breast imaging cases, although GPT-4o generally achieved higher accuracy rates than Claude 3.5 Sonnet in image-based and other types of questions, their performances were comparable.
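The abstract reports that the two models' accuracy rates were compared with the Mann-Whitney U test. The following is a minimal sketch of such a comparison, assuming each answer is coded 1 (correct) or 0 (incorrect). The article does not describe the coding scheme or the software used, the function names are illustrative, and the score lists are hypothetical toy values, not study data. The p-value uses the tie-corrected normal approximation without continuity correction, which is only one of several common variants.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value).

    Uses the normal approximation with tie correction and no
    continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        # All observations identical: the samples cannot be distinguished.
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u1, p_two_sided


# Hypothetical per-question scores (1 = correct, 0 = incorrect).
model_a_scores = [1, 1, 1, 0]
model_b_scores = [1, 0, 0, 0]
u_stat, p_value = mann_whitney_u(model_a_scores, model_b_scores)
print(f"U = {u_stat}, p = {p_value:.3f}")
```

With binary scores almost every observation is tied, so the tie correction in the variance carries most of the weight in this calculation.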

Details

Primary Language

English

Subjects

Radiology and Organ Imaging

Section

Research Article

Publication Date

December 31, 2024

Submission Date

October 4, 2024

Acceptance Date

October 18, 2024

Published in Issue

Year 2024 Volume: 15 Issue: 4

Cite

APA
Beşler, M. S. (2024). Comparative analysis of large language models’ performance in breast imaging. Turkish Journal of Clinics and Laboratory, 15(4), 542-546. https://doi.org/10.18663/tjcl.1561361

e-ISSN: 2149-8296

Publication Model: Continuous Publication

Peer Review Model: Double-Blind Peer Review

Publication Language: Turkish and English

Access Model: Open Access

Publisher: DNT Ortadoğu Publishing Inc.

Journal Abbreviation: Turk J Clin Lab

Indexed in J-Gate

The content of this site is intended for health care professionals. All the published articles are distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.