Artificial intelligence meets medical expertise: evaluating GPT-4's proficiency in generating medical article abstracts

Ergin Sağtaş; Furkan Ufuk; Hakkı Peker; Ahmet Baki Yağcı

doi:10.31362/patd.1487575

Research Article

Year 2024, Volume 17, Issue 4, pp. 756-762, 09.10.2024

Abstract

Purpose: The advent of large language models such as GPT-4 has opened new possibilities in natural language processing, with potential applications in the medical literature. This study assesses GPT-4's ability to generate medical abstracts by comparing their quality with that of the original abstracts written by human authors, with the aim of understanding how effectively artificial intelligence can replicate complex, professional writing tasks.
Materials and methods: A total of 250 original research articles published between 2021 and 2023 in five prominent radiology journals were selected. The main text of each article, excluding its abstract, was provided to GPT-4, which then generated a new abstract. Three experienced radiologists blindly and independently evaluated all 500 abstracts (250 original and 250 GPT-4-generated) for quality and understandability using a five-point Likert scale. Statistical analysis included a comparison of mean scores, inter-rater reliability using Fleiss' kappa, and Bland-Altman plots to assess the level of agreement between raters.
Results: The analysis revealed no significant difference in mean scores between the original and GPT-4-generated abstracts. Inter-rater reliability yielded kappa values indicating moderate to substantial agreement: 0.497 between Observers 1 and 2, 0.753 between Observers 1 and 3, and 0.645 between Observers 2 and 3. Bland-Altman analysis showed a slight systematic bias, but the differences remained within acceptable limits of agreement.
Conclusion: The study demonstrates that GPT-4 can generate medical abstracts of a quality comparable to those written by human experts. This suggests a promising role for artificial intelligence in facilitating the abstract writing process and improving the quality of abstracts.
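
The agreement statistics named in the Materials and methods can be illustrated with a short Python sketch. It is not the authors' analysis code: the function names, the 1 to 5 category coding, the Landis and Koch (1977) labelling bands and the demo scores are all assumptions made for illustration. The Fleiss' kappa function treats Likert scores as nominal categories; with only two raters it reduces to Scott's pi rather than Cohen's kappa, and the abstract does not state how the pairwise observer values were computed. The Bland-Altman helper returns the mean difference (the systematic bias) and the conventional 95% limits of agreement, bias ± 1.96 × SD of the paired differences.

```python
"""Illustrative agreement statistics for Likert-scored abstracts (a sketch, not the study's code)."""
from collections import Counter
from statistics import mean, stdev
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]],
                 categories: Sequence[int] = (1, 2, 3, 4, 5)) -> float:
    """Fleiss' kappa, where ratings[i] lists every rater's score for item i.

    Scores are treated as nominal categories, so the Likert ordering is ignored.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0]) if n_items else 0
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # counts[i][j]: number of raters who placed item i in category j
    counts = [[tally[c] for c in categories] for tally in map(Counter, ratings)]
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("a score falls outside the given categories")

    # Observed agreement: proportion of agreeing rater pairs per item, averaged
    p_bar = sum(
        (sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the pooled category proportions
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(len(categories)))
    if p_e == 1:
        raise ValueError("kappa is undefined when all scores fall in one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


def bland_altman(a: Sequence[float], b: Sequence[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)                # mean difference = systematic bias
    half_width = 1.96 * stdev(diffs)  # 95% limits, assuming roughly normal differences
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Hypothetical scores: three raters for each of four abstracts
    scores = [[4, 4, 5], [3, 3, 3], [5, 4, 5], [2, 3, 2]]
    k = fleiss_kappa(scores)
    print(f"Fleiss' kappa = {k:.3f} ({landis_koch(k)})")
    rater1 = [row[0] for row in scores]
    rater2 = [row[1] for row in scores]
    print("Bland-Altman (bias, lower, upper):", bland_altman(rater1, rater2))
```

Under the Landis and Koch bands, the reported values of 0.497, 0.753 and 0.645 fall in the moderate, substantial and substantial ranges, which matches the abstract's "moderate to substantial" wording.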

Keywords

Artificial intelligence, ChatGPT, radiology, diagnosis, abstracts

Artificial intelligence meets medical expertise: evaluating GPT-4's competence in generating medical article abstracts

Abstract

Purpose: The emergence of large language models such as GPT-4 has opened new possibilities in natural language processing, with potential applications in the medical literature. This study evaluates GPT-4's ability to generate medical article abstracts. It aims to understand how effectively artificial intelligence can replicate complex, professional writing tasks, and compares the quality of the generated abstracts with that of original abstracts written by human authors.
Materials and methods: A total of 250 original research articles were selected from five leading radiology journals published between 2021 and 2023. The full text of these articles, excluding the abstracts, was uploaded to GPT-4, which then generated new abstracts. Three experienced radiologists blindly and independently evaluated all 500 abstracts for quality and understandability using a five-point Likert scale. In the statistical analysis, Fleiss' kappa was used to measure inter-rater reliability, and Bland-Altman plots were used to assess the level of agreement between raters.
Results: The analysis revealed no significant difference in mean scores between the original and GPT-4-generated abstracts. For inter-rater reliability, kappa values indicating moderate to substantial agreement were obtained: 0.497 between Observers 1 and 2, 0.753 between Observers 1 and 3, and 0.645 between Observers 2 and 3. Bland-Altman analysis showed a slight systematic bias, but the differences remained within acceptable limits of agreement.
Conclusion: The study shows that GPT-4 can generate medical abstracts of a quality comparable to those written by human experts. The use of artificial intelligence may contribute substantially to facilitating the abstract writing process and improving its quality.

Keywords

Artificial intelligence, ChatGPT, radiology, diagnosis, abstract

There are 19 citations in total.

Details

Primary Language	English
Subjects	Radiology and Organ Imaging
Journal Section	Research Article
Authors	Ergin Sağtaş (ORCID 0000-0001-6723-6593), Furkan Ufuk (ORCID 0000-0002-8614-5387), Hakkı Peker (ORCID 0000-0002-9604-7529), Ahmet Baki Yağcı (ORCID 0000-0001-7544-5731)
Early Pub Date	June 4, 2024
Publication Date	October 9, 2024
Submission Date	May 21, 2024
Acceptance Date	June 3, 2024
Published in Issue	Year 2024

Cite

AMA	Sağtaş E, Ufuk F, Peker H, Yağcı AB. Artificial intelligence meets medical expertise: evaluating GPT-4’s proficiency in generating medical article abstracts. Pam Tıp Derg. October 2024;17(4):756-762. doi:10.31362/patd.1487575