Evaluation of ChatGPT's Performance in Residency Training Progress Exams and Competency Exams in Orthopedics and Traumatology
Year 2026, Volume 2, Issue 1, 14–19, 30.03.2026
Yaşar Mahsut Dinçel, Gündüz Ercan Kutluay, Hadi Sasanı, Murat Erem
Abstract
Background: Artificial intelligence (AI) technologies have rapidly expanded into medical education, offering innovative tools for training and assessment. This study evaluated the performance of the ChatGPT-3.5 language model on the Residency Training Progress Examination (UEGS) and the Competency Examination administered by the Turkish Society of Orthopedics and Traumatology (TOTBID), aiming to determine whether ChatGPT performs comparably to orthopedic residents and whether it can achieve a passing score on the Competency Exam. Methods: A total of 2,000 UEGS and 1,000 Competency Exam questions (2012–2023, excluding 2020) were presented to ChatGPT-3.5 using standardized prompts designed within the Role–Goals–Context (RGC) framework. The model's responses were statistically compared with those of orthopedic residents and specialists using the Mann–Whitney U and Kruskal–Wallis tests (significance at p < 0.05). Results: ChatGPT achieved its highest accuracy in General Orthopedics (62%) and its lowest in Adult Reconstructive Surgery (40%). It outperformed residents only in Spine Surgery (p < 0.05), and it passed four of the ten Competency Exams. Conclusion: ChatGPT-3.5 demonstrated limited reliability and accuracy on orthopedic examinations and should be used cautiously as an educational support tool. Future studies of newer multimodal large language models may clarify their potential role in medical education and assessment.
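The Methods' group comparison can be illustrated with a minimal, stdlib-only sketch of the Mann–Whitney U statistic named above. The scores below are synthetic placeholders, not the study's data, and the sketch stops at the U statistic itself; the p-values reported in the paper would come from the U distribution or its normal approximation.

```python
# Illustrative sketch only: a stdlib-only Mann-Whitney U statistic, with
# synthetic per-exam accuracy scores standing in for the study's real data.

def average_ranks(values):
    """Rank the values (1-based), assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic (the smaller of U1 and U2)."""
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])                   # rank sum of group a
    u1 = r1 - len(a) * (len(a) + 1) / 2
    u2 = len(a) * len(b) - u1
    return min(u1, u2)

# Hypothetical percent-correct scores per exam (NOT the study's numbers).
chatgpt = [62, 55, 48, 40, 51, 57, 44, 60, 49, 53]
residents = [58, 61, 63, 59, 64, 60, 62, 65, 57, 66]
print("U =", mann_whitney_u(chatgpt, residents))
```

The Kruskal–Wallis test used for the three-group comparison (ChatGPT, residents, specialists) generalizes the same rank-sum idea to more than two groups.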
Ethics Statement
Dear Editor,
No clinical patient data were used in the article we submitted to your journal for review. The artificial intelligence algorithm used in our study is an open-access AI platform. Moreover, the questions posed to the model consist of progress exam questions for residents in Orthopedics and Traumatology training in Turkey and board exam questions for Orthopedics and Traumatology specialists in Turkey, all of which are publicly available. For these reasons, ethics committee approval was not required for this study.
Corresponding author: Gündüz Ercan Kutluay, MD
E-mail: gunduzercankutluay@gmail.com
Address: Namık Kemal Mahallesi, Kampüs Caddesi No:1, Süleymanpaşa, Tekirdağ