Research Article

Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review

Year 2025, Volume: 6 Issue: 6, 788 - 792, 27.12.2025

Abstract

Introduction/Aim:
Large language models (LLMs) such as ChatGPT, Gemini, Claude, and Perplexity are increasingly used in medical education. However, their level of orthopedic knowledge and their capacity to learn from structured reference materials and improve remain unclear. This study aimed to compare the orthopedic knowledge performance of four advanced LLMs before and after exposure to a standard textbook, and to determine whether domain-specific educational content improves model performance.

Materials and Methods:
A two-stage evaluation was conducted using 110 multiple-choice orthopedic questions obtained from the Orthobullets platform. Each model was tested before and after access to Miller’s Review of Orthopaedics. Accuracy rates were recorded; the Wilcoxon signed-rank test was used for within-model comparisons and the Kruskal–Wallis test with Bonferroni correction for between-model comparisons. The primary outcome measure was the change in accuracy percentage after educational exposure.

Results:
All models showed a significant increase in accuracy after textbook exposure (p < 0.001). The largest increase was observed in Gemini (+20.9%), followed by Claude (+10.9%), Perplexity (+10.0%), and ChatGPT (+9.1%). Perplexity reached the highest overall post-intervention accuracy (90.0%), while Claude remained the lowest-performing model. Although Gemini’s increase was the largest, it did not reach statistical significance when compared with the other models (p = 0.052).

Conclusion:
This study showed marked differences among large language models in orthopedic knowledge and learning capacity. Although accuracy increased in models supported with domain-specific reference material, the magnitude of the increase varied by model. The findings emphasize the need for model-based evaluation and a cautious approach when integrating LLMs into medical education and clinical decision support processes. Further studies involving larger datasets and real-world clinical tasks are needed.

Ethical Statement

This study did not involve human participants, patient data, or any biological material. Therefore, ethics committee approval was not required. The study was conducted using outputs generated by large language models (LLMs) and did not include any data subject to ethical review.

Supporting Institution

There is no supporting institution.


Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review


Abstract

Aims: Large language models (LLMs) such as ChatGPT, Gemini, Claude, and Perplexity are increasingly incorporated into medical education; however, their baseline orthopedic knowledge and their ability to utilize structured reference materials remain insufficiently characterized. This study aimed to compare the performance of four advanced LLMs before and after exposure to a standardized orthopedic textbook and to determine whether domain-specific educational content enhances inference-time accuracy.
Methods: A two-stage evaluation was conducted using 110 multiple-choice questions from the Orthobullets platform. Each model first completed the question set under identical prompting conditions. A new chat session was then initiated, and the full PDF of Miller’s Review of Orthopaedics (9th edition) was uploaded using native document-processing functions. Models were subsequently retested with the same questions. Pre–post accuracy differences were analyzed using the Wilcoxon signed-rank test (effect size r calculated as Z/√N). Between-model differences were assessed using the Kruskal–Wallis test with Bonferroni-adjusted pairwise comparisons. The primary outcome was the change in accuracy (%) after textbook exposure.
Results: All four models demonstrated significant improvement following access to the textbook (p<0.001). Gemini showed the greatest numerical gain (+20.9%), followed by Claude (+10.9%), Perplexity (+10.0%), and ChatGPT (+9.1%). Perplexity achieved the highest absolute post-exposure accuracy (90.0%), whereas Claude remained the lowest performer. Although Gemini exhibited the largest improvement, its advantage over the other models did not reach statistical significance (p=0.052).
Conclusion: Exposure to a standardized orthopedic textbook was associated with improved inference-time accuracy across all models, though the magnitude of benefit varied by platform. These findings underscore the heterogeneity of LLM performance in subspecialty medical topics and highlight the importance of model-specific benchmarking. Because LLMs do not undergo parameter-level learning during user interaction, observed improvements reflect temporary contextual integration rather than durable knowledge acquisition. Further research involving broader datasets, additional model architectures, and clinically oriented task evaluations is warranted.
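As a rough reading aid, the reported figures can be turned back into question counts and the stated effect-size formula r = Z/√N can be written out. The sketch below is not the authors' analysis code. It assumes the gains are percentage points of the 110-question set, and the function names and the Z value in the usage note are hypothetical and chosen only for illustration.

```python
import math

N_QUESTIONS = 110  # size of the question set reported in the Methods

# Accuracy gains reported in the Results, read here as percentage points
# of the 110-question set (an assumption, not stated in the abstract).
reported_gain_pct = {
    "Gemini": 20.9,
    "Claude": 10.9,
    "Perplexity": 10.0,
    "ChatGPT": 9.1,
}


def gain_in_questions(gain_pct, n=N_QUESTIONS):
    """Convert a percentage-point gain into a whole number of questions."""
    return round(gain_pct * n / 100)


def effect_size_r(z, n):
    """Effect size as given in the Methods: r = Z / sqrt(N)."""
    return z / math.sqrt(n)


# Additional questions answered correctly after textbook exposure, per model.
extra_correct = {m: gain_in_questions(g) for m, g in reported_gain_pct.items()}

# Perplexity: 90.0% post-exposure accuracy with a 10.0-point gain.
perplexity_post = round(90.0 * N_QUESTIONS / 100)
perplexity_pre = perplexity_post - extra_correct["Perplexity"]
```

Under these assumptions the gains correspond to 23, 12, 11, and 10 additional correct answers for Gemini, Claude, Perplexity, and ChatGPT, and Perplexity moves from 88 to 99 correct answers out of 110. For a purely hypothetical Z of 3.0 with N = 110, `effect_size_r(3.0, 110)` gives roughly 0.29. The abstract does not report the actual Z statistics.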

Ethical Statement

This study did not involve human participants, patient data, or any biological material. Therefore, ethics committee approval was not required. The study was conducted using outputs generated by large language models (LLMs) and did not include any data subject to ethical review.

Supporting Institution

There is no supporting institution for this study.


Details

Primary Language English
Subjects Orthopaedics
Journal Section Research Article
Authors

Mahircan Demir 0000-0002-7372-3280

İbrahim Faruk Adıgüzel 0000-0003-2493-5540

Mustafa Dinç 0000-0002-3002-5028

Recep Karasu 0000-0002-0628-5794

Submission Date November 15, 2025
Acceptance Date December 22, 2025
Publication Date December 27, 2025
Published in Issue Year 2025 Volume: 6 Issue: 6

Cite

AMA Demir M, Adıgüzel İF, Dinç M, Karasu R. Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. J Med Palliat Care / JOMPAC / jompac. December 2025;6(6):788-792.





