Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination

Murat Korkmaz; Abdullah Kahraman

Abstract

This cross-sectional study compares the performance of two chatbots, ChatGPT-3.5 and Google Bard, on the Turkish Orthopaedics and Traumatology National Board Examination. The examination questions were put to each chatbot one at a time, and the chatbot was asked to indicate the correct answer and to rate the difficulty of the question. The examination consists of 100 questions, of which 92 were included in the study. ChatGPT-3.5 answered 54.3% of the questions correctly and Google Bard answered 45.7% correctly. The two models' difficulty ratings and their answer accuracy were each weakly correlated (r=0.290, p=0.005 for difficulty; r=0.314, p=0.002 for accuracy). Both language models achieved a success rate of about 50% on the Turkish Orthopaedics and Traumatology National Board Examination, and both assigned similar levels of difficulty to the questions.

Details

Primary Language

English

Subjects

Allied Health and Rehabilitation Science (Other)

Journal Section

Research Article

Authors

Murat Korkmaz*
0000-0003-2809-6721
Türkiye

Abdullah Kahraman
0000-0002-6098-5097
Türkiye

Publication Date

March 28, 2025

Submission Date

November 21, 2024

Acceptance Date

November 29, 2024

Published in Issue

Year: 2025, Volume: 42, Number: 1

IZ

https://izlik.org/JA75HG79MJ

Cite

APA

Korkmaz, M., & Kahraman, A. (2025). Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. Deneysel Ve Klinik Tıp Dergisi, 42(1), 40-42. https://izlik.org/JA75HG79MJ

AMA

1. Korkmaz M, Kahraman A. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. 2025;42(1):40-42. https://izlik.org/JA75HG79MJ

Chicago

Korkmaz, Murat, and Abdullah Kahraman. 2025. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel Ve Klinik Tıp Dergisi 42 (1): 40-42. https://izlik.org/JA75HG79MJ.

EndNote

Korkmaz M, Kahraman A (March 1, 2025) Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. Deneysel ve Klinik Tıp Dergisi 42 1 40–42.

IEEE

[1] M. Korkmaz and A. Kahraman, “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination”, J. Exp. Clin. Med., vol. 42, no. 1, pp. 40–42, Mar. 2025, [Online]. Available: https://izlik.org/JA75HG79MJ

ISNAD

Korkmaz, Murat - Kahraman, Abdullah. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel ve Klinik Tıp Dergisi 42/1 (March 1, 2025): 40-42. https://izlik.org/JA75HG79MJ.

JAMA

1. Korkmaz M, Kahraman A. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. 2025;42:40–42.

MLA

Korkmaz, Murat, and Abdullah Kahraman. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel Ve Klinik Tıp Dergisi, vol. 42, no. 1, Mar. 2025, pp. 40-42, https://izlik.org/JA75HG79MJ.

Vancouver

1. Murat Korkmaz, Abdullah Kahraman. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. [Internet]. 2025 Mar. 1;42(1):40-2. Available from: https://izlik.org/JA75HG79MJ