Research Article

Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review

Volume: 6 Number: 6 December 27, 2025

Abstract

Aims: Large language models (LLMs) such as ChatGPT, Gemini, Claude, and Perplexity are increasingly incorporated into medical education; however, their baseline orthopedic knowledge and their ability to use structured reference materials remain insufficiently characterized. This study aimed to compare the performance of four advanced LLMs before and after exposure to a standardized orthopedic textbook and to determine whether domain-specific educational content enhances inference-time accuracy.

Methods: A two-stage evaluation was conducted using 110 multiple-choice questions from the Orthobullets platform. Each model first completed the question set under identical prompting conditions. A new chat session was then initiated, and the full PDF of Miller’s Review of Orthopaedics (9th edition) was uploaded using each platform's native document-processing functions. Models were subsequently retested with the same questions. Pre–post accuracy differences were analyzed using the Wilcoxon signed-rank test (effect size r calculated as Z/√N). Between-model differences were assessed using the Kruskal–Wallis test with Bonferroni-adjusted pairwise comparisons. The primary outcome was the change in accuracy (%) after textbook exposure.

Results: All four models demonstrated significant improvement following access to the textbook (p<0.001). Gemini showed the greatest numerical gain (+20.9%), followed by Claude (+10.9%), Perplexity (+10.0%), and ChatGPT (+9.1%). Perplexity achieved the highest absolute post-exposure accuracy (90.0%), whereas Claude remained the lowest performer. Although Gemini exhibited the largest improvement, its advantage over the other models did not reach statistical significance (p=0.052).

Conclusion: Exposure to a standardized orthopedic textbook was associated with improved inference-time accuracy across all models, though the magnitude of benefit varied by platform. These findings underscore the heterogeneity of LLM performance in subspecialty medical topics and highlight the importance of model-specific benchmarking. Because LLMs do not undergo parameter-level learning during user interaction, the observed improvements reflect temporary contextual integration rather than durable knowledge acquisition. Further research involving broader datasets, additional model architectures, and clinically oriented task evaluations is warranted.
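The pre–post comparison described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the ten-item answer vectors are invented toy data (not study results), the Z statistic uses a tie-corrected normal approximation without continuity correction, and N in r = Z/√N is assumed to be the number of paired questions, which the abstract does not specify.

```python
import math

N_QUESTIONS = 110  # size of the Orthobullets question set reported in the abstract


def accuracy_pct(correct, total=N_QUESTIONS):
    """Accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_z(pre, post):
    """Z for the Wilcoxon signed-rank test (normal approximation, tie-corrected).

    Assumption: zero differences are dropped and no continuity correction
    is applied; statistical packages may differ on both choices.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0
    ranks, ties = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in ties) / 48
    return (w_plus - mean) / math.sqrt(var)


def effect_size_r(z, n):
    """Effect size r = Z / sqrt(N), as stated in the Methods."""
    return z / math.sqrt(n)


# Hypothetical toy data: 1 = correct, 0 = incorrect, one entry per question.
pre_toy = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]
post_toy = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]

z_toy = wilcoxon_z(pre_toy, post_toy)
r_toy = effect_size_r(z_toy, len(pre_toy))
gain_toy = accuracy_pct(sum(post_toy), len(post_toy)) - accuracy_pct(sum(pre_toy), len(pre_toy))
```

With binary correct/incorrect scores, every non-zero difference has the same magnitude, so the signed-rank statistic depends only on how many items changed in each direction. In the toy data, three items improve and one worsens. For a sense of scale, the reported 90.0% post-exposure accuracy corresponds to 99 of 110 questions, which `accuracy_pct(99)` reproduces.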

Keywords

Supporting Institution

There is no supporting institution for this study.

Ethical Statement

This study did not involve human participants, patient data, or any biological material. Therefore, ethics committee approval was not required. The study was conducted using outputs generated by large language models (LLMs) and did not include any data subject to ethical review.


Details

Primary Language

English

Subjects

Orthopaedics

Journal Section

Research Article

Publication Date

December 27, 2025

Submission Date

November 15, 2025

Acceptance Date

December 22, 2025

Published in Issue

Year 2025 Volume: 6 Number: 6

APA
Demir, M., Adıgüzel, İ. F., Dinç, M., & Karasu, R. (2025). Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. Journal of Medicine and Palliative Care, 6(6), 788-792. https://doi.org/10.47582/jompac.1824569
AMA
1. Demir M, Adıgüzel İF, Dinç M, Karasu R. Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. J Med Palliat Care / JOMPAC / jompac. 2025;6(6):788-792. doi:10.47582/jompac.1824569
Chicago
Demir, Mahircan, İbrahim Faruk Adıgüzel, Mustafa Dinç, and Recep Karasu. 2025. “Do They Learn When They Read? A Two-Stage Evaluation of AI Models’ Orthopedic Knowledge Using Orthobullets and Miller’s Review”. Journal of Medicine and Palliative Care 6 (6): 788-92. https://doi.org/10.47582/jompac.1824569.
EndNote
Demir M, Adıgüzel İF, Dinç M, Karasu R (December 1, 2025) Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. Journal of Medicine and Palliative Care 6 6 788–792.
IEEE
[1] M. Demir, İ. F. Adıgüzel, M. Dinç, and R. Karasu, “Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review”, J Med Palliat Care / JOMPAC / jompac, vol. 6, no. 6, pp. 788–792, Dec. 2025, doi: 10.47582/jompac.1824569.
ISNAD
Demir, Mahircan - Adıgüzel, İbrahim Faruk - Dinç, Mustafa - Karasu, Recep. “Do They Learn When They Read? A Two-Stage Evaluation of AI Models’ Orthopedic Knowledge Using Orthobullets and Miller’s Review”. Journal of Medicine and Palliative Care 6/6 (December 1, 2025): 788-792. https://doi.org/10.47582/jompac.1824569.
JAMA
1. Demir M, Adıgüzel İF, Dinç M, Karasu R. Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. J Med Palliat Care / JOMPAC / jompac. 2025;6:788–792.
MLA
Demir, Mahircan, et al. “Do They Learn When They Read? A Two-Stage Evaluation of AI Models’ Orthopedic Knowledge Using Orthobullets and Miller’s Review”. Journal of Medicine and Palliative Care, vol. 6, no. 6, Dec. 2025, pp. 788-92, doi:10.47582/jompac.1824569.
Vancouver
1. Mahircan Demir, İbrahim Faruk Adıgüzel, Mustafa Dinç, Recep Karasu. Do they learn when they read? A two-stage evaluation of AI models’ orthopedic knowledge using Orthobullets and Miller’s review. J Med Palliat Care / JOMPAC / jompac. 2025 Dec. 1;6(6):788-92. doi:10.47582/jompac.1824569

TR DİZİN ULAKBİM and International Indexes (1d)

Interuniversity Board (UAK) Equivalency: Article published in Ulakbim TR Index journal [10 POINTS], and Article published in other (excluding 1a, b, c) international indexed journal (1d) [5 POINTS]
 


 

 


 

Our journal is indexed in TR-Dizin, DRJI (Directory of Research Journals Indexing), General Impact Factor, Google Scholar, ResearchGate, CrossRef (DOI), ROAD, ASOS Index, Turk Medline Index, Eurasian Scientific Journal Index (ESJI), and Turkiye Citation Index.

Applications to EBSCO, DOAJ, OAJI, and ProQuest Index are in the process of evaluation.

 

Journal articles are evaluated through double-blind peer review.