Research Article

Performance of Generative AI Models on Cardiology Practice in Emergency Service: A Pilot Evaluation of GPT-4.o and Gemini-1.5-Flash

Volume: 51, Issue: 2, 28 August 2025

Abstract

Clinical decision-making in emergency care is complex, and large language models (LLMs) may improve both the quality and the efficiency of care by assisting physicians. Case scenario-based multiple-choice questions (CS-MCQs) are valuable for testing analytical skills and knowledge integration, and the readability of a response matters as much as its accuracy. This study compares the diagnostic and treatment capabilities of GPT-4.o and Gemini-1.5-Flash in cardiac emergencies and evaluates the readability of their responses. A total of 70 single-answer MCQs on cardiac emergencies were randomly selected from the Medscape Case Challenges and ECG Challenges series. The questions were categorized into four subgroups according to whether they included a case presentation and whether they included an image. The selected questions were submitted to the ChatGPT and Gemini platforms. The Flesch–Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) scores were used to evaluate the readability of the responses. GPT-4.o had a correct response rate of 65.7%, outperforming Gemini-1.5-Flash, which had a correct response rate of 58.6% (p=0.010). By question type, GPT-4.o was inferior to Gemini-1.5-Flash only on non-case questions (52.5% vs. 62.5%, p=0.011); for all other question types, there were no significant performance differences between the two models (p>0.05). Both models performed better on easy questions than on difficult ones, and on questions without images than on those with images. Additionally, GPT-4.o performed better on case questions than on non-case questions. Gemini-1.5-Flash's FRE score was higher than GPT-4.o's (median [min-max], 23.75 [0-64.60] vs. 17.0 [0-56.60], p<0.001). Although GPT-4.o outperformed Gemini-1.5-Flash overall, both models demonstrated an ability to comprehend the case scenarios and provided reasonable answers.
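The two readability measures named in the abstract have standard published formulas, sketched below for orientation. This is a minimal illustration, not the authors' pipeline: the abstract does not say which tool computed the scores, and the helper names (`flesch_reading_ease`, `flesch_kincaid_grade`, `count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions introduced here. The reported minimum of 0 suggests the tool used in the study may clamp scores to a fixed range; this sketch does not clamp.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE): a higher score means easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using naive regex tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), max(len(sentences), 1), syllables
```

For a model response stored in a string `response`, `flesch_reading_ease(*text_counts(response))` gives an approximate FRE. Dedicated readability tools use more careful syllable and sentence detection, so their scores will differ somewhat from this heuristic. Both score functions also assume non-empty text and would divide by zero otherwise.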

Details

Primary Language

English

Subjects

Emergency Medicine

Section

Research Article

Publication Date

28 August 2025

Submission Date

12 June 2025

Acceptance Date

2 July 2025

Published in Issue

Year 2025, Volume: 51, Issue: 2

Cite

APA
Günay Polatkan, Ş., Sığırlı, D., Durak, V. A., Alak, Ç., & Kan, I. I. (2025). Performance of Generative AI Models on Cardiology Practice in Emergency Service: A Pilot Evaluation of GPT-4.o and Gemini-1.5-Flash. Journal of Uludağ University Medical Faculty, 51(2), 239-246. https://doi.org/10.32708/uutfd.1718121

ISSN: 1300-414X, e-ISSN: 2645-9027

Journal of Uludag University Medical Faculty is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
