Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination

Murat Korkmaz; Abdullah Kahraman

Research Article

Year 2025, Volume: 42 Issue: 1, 40 - 42, 28.03.2025

Abstract

This cross-sectional study compared the performance of two chatbots, ChatGPT-3.5 and Google Bard, on the Turkish Orthopaedics and Traumatology National Board Examination. The examination questions were put to each chatbot one at a time, and each chatbot was asked to indicate the correct answer and to rate the difficulty of the question. The examination consists of 100 questions, of which 92 were included in the study. ChatGPT-3.5 answered 54.3% of the questions correctly, and Google Bard answered 45.7% correctly. The two models' difficulty ratings and their answer accuracy were each weakly, though statistically significantly, correlated (r=0.290, p=0.005 for difficulty; r=0.314, p=0.002 for accuracy). Both language models achieved a success rate of about 50% on the examination, and both rated the questions at similar levels of difficulty.
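As a reading aid, the reported figures can be sanity-checked with a few lines of arithmetic. The sketch below is not the authors' analysis: it assumes that the percentages and correlations were all computed over the 92 included questions, and it uses the Fisher z approximation as a stand-in for whatever significance test the study actually applied. The function names `implied_correct` and `approx_p_value` are hypothetical and chosen only for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist

N_QUESTIONS = 92  # questions included in the study, per the abstract


def implied_correct(percent: float, n: int = N_QUESTIONS) -> int:
    """Whole number of correct answers closest to a reported percentage."""
    return round(percent / 100 * n)


def approx_p_value(r: float, n: int = N_QUESTIONS) -> float:
    """Two-sided p-value for 'no correlation' via the Fisher z approximation.

    Assumption: the study's own test is not stated in the abstract, so this
    is only an approximate cross-check, not a reproduction of it.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    for model, pct in (("ChatGPT-3.5", 54.3), ("Google Bard", 45.7)):
        k = implied_correct(pct)
        print(f"{model}: about {k} of {N_QUESTIONS} correct "
              f"({100 * k / N_QUESTIONS:.1f}%)")
    for label, r in (("difficulty", 0.290), ("accuracy", 0.314)):
        print(f"{label}: r={r:.3f}, approximate p={approx_p_value(r):.4f}")
```

Under these assumptions the percentages correspond to roughly 50 and 42 correct answers out of 92, and the approximate p-values should land close to the reported 0.005 and 0.002, which is consistent with the correlations being weak but unlikely to be due to chance.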

Keywords

Accuracy, Bard, ChatGPT-3.5, Difficulty, Orthopedics

Details

Primary Language	English
Subjects	Allied Health and Rehabilitation Science (Other)
Journal Section	Research Article
Authors	Murat Korkmaz (ORCID 0000-0003-2809-6721); Abdullah Kahraman (ORCID 0000-0002-6098-5097)
Publication Date	March 28, 2025
Submission Date	November 21, 2024
Acceptance Date	November 29, 2024
Published in Issue	Year 2025 Volume: 42 Issue: 1

Cite

APA	Korkmaz, M., & Kahraman, A. (2025). Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. Deneysel Ve Klinik Tıp Dergisi, 42(1), 40-42.
AMA	Korkmaz M, Kahraman A. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. March 2025;42(1):40-42.
Chicago	Korkmaz, Murat, and Abdullah Kahraman. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel Ve Klinik Tıp Dergisi 42, no. 1 (March 2025): 40-42.
EndNote	Korkmaz M, Kahraman A (March 1, 2025) Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. Deneysel ve Klinik Tıp Dergisi 42 1 40–42.
IEEE	M. Korkmaz and A. Kahraman, “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination”, J. Exp. Clin. Med., vol. 42, no. 1, pp. 40–42, 2025.
ISNAD	Korkmaz, Murat - Kahraman, Abdullah. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel ve Klinik Tıp Dergisi 42/1 (March 2025), 40-42.
JAMA	Korkmaz M, Kahraman A. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. 2025;42:40–42.
MLA	Korkmaz, Murat and Abdullah Kahraman. “Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish Orthopaedics and Traumatology National Board Examination”. Deneysel Ve Klinik Tıp Dergisi, vol. 42, no. 1, 2025, pp. 40-42.
Vancouver	Korkmaz M, Kahraman A. Comparison of ChatGPT-3.5 and Google Bard Performance on Turkish orthopaedics and traumatology national board examination. J. Exp. Clin. Med. 2025;42(1):40-2.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.