Research Article

Assessing the accuracy, safety, and clinical utility of AI-generated rehabilitation guidance for Bankart lesions: a blinded comparative evaluation with expert protocols

Year 2026, Volume: 7, Issue: 2, 274-281, 27.03.2026
https://izlik.org/JA43AJ56XR

Abstract

Aims: Artificial Intelligence (AI)–based language models are increasingly used to generate medical information and patient education materials. However, the reliability and safety of AI-generated rehabilitation guidance remain uncertain. This study aimed to evaluate the accuracy, safety, clinical utility, and readability of rehabilitation recommendations generated by ChatGPT-5 for Bankart lesions and to compare these outputs with expert-developed rehabilitation protocols.
Methods: A blinded, cross-sectional comparative quality assessment was conducted. Standardized prompts regarding nonoperative and postoperative Bankart rehabilitation were used to generate responses from ChatGPT-5. AI-generated texts were compared with protocols prepared by a panel of orthopedic shoulder surgeons and an experienced physiotherapist. All texts were anonymized and independently evaluated by three blinded expert raters using a structured 5-point Likert scale assessing clinical accuracy, safety, actionability, comprehensiveness, and overall quality. Major clinical errors were recorded separately. Readability was assessed using Flesch Reading Ease and Flesch–Kincaid Grade Level scores. Inter-rater reliability was analyzed using intraclass correlation coefficients (ICC).
Results: A total of 20 rehabilitation texts (10 AI-generated and 10 expert-developed) were evaluated. Expert protocols demonstrated significantly higher scores in clinical accuracy (4.6±0.4 vs 3.4±0.7, p<0.001), safety (4.8±0.3 vs 3.2±0.8, p<0.001), comprehensiveness (4.7±0.4 vs 3.1±0.9, p<0.001), and overall quality (4.6±0.4 vs 3.5±0.6, p<0.001). AI outputs were more readable (Flesch Reading Ease: 72.6±5.8 vs 58.4±6.2, p<0.01) but frequently lacked critical safety information. Major clinical errors were identified in 20% of AI-generated texts (2/10), whereas no major errors were detected in expert-developed protocols (0/10) (p<0.05). Inter-rater reliability showed good to excellent agreement across domains (ICC=0.80–0.89).
Conclusion: Although ChatGPT-5 can produce well-structured and easily readable rehabilitation information for Bankart lesions, its outputs show significant deficiencies in safety, accuracy, and comprehensiveness. Unsupervised use of AI-generated rehabilitation guidance may pose clinically relevant risks. A hybrid model in which AI-generated content is reviewed and validated by clinicians represents a safer and more appropriate approach for integrating AI into postoperative rehabilitation education.
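The readability metrics used in this study are simple formulas over word, sentence, and syllable counts. As a minimal sketch, the standard Flesch Reading Ease and Flesch-Kincaid Grade Level definitions are shown below in Python; the `count_syllables` heuristic is a rough vowel-group approximation (dedicated readability tools use dictionaries or hyphenation rules), so scores from it are indicative only and will not exactly match the software used by the authors.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    # Flesch Reading Ease: higher scores mean easier text
    # (e.g. 70-80 is roughly 7th-grade level, 50-60 roughly high-school level).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate U.S. school grade
    # required to understand the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    # Tokenize sentences and words naively, then apply both formulas.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return (flesch_reading_ease(n_words, sentences, n_syllables),
            flesch_kincaid_grade(n_words, sentences, n_syllables))
```

For example, a text of 100 words in 5 sentences with 130 syllables scores a Flesch Reading Ease of about 76.6 and a Flesch-Kincaid Grade Level of about 7.6, which by these scales would count as fairly readable patient-education material.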

References

  • Tupe RN, Tiwari V. Anteroinferior Glenoid Labrum Lesion (Bankart Lesion). In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2025.
  • Utami SW, Pratiwi SR, Mitchel, Gani KS, Kholinne E. Return to sports following arthroscopic Bankart repair: a narrative review. Ewha Med J. 2024;47(2):e21. doi:10.12771/emj.2024.e21
  • Popchak A, Patterson-Lynch B, Christain H, Irrgang JJ. Rehabilitation and return to sports after anterior shoulder stabilization. Ann Jt. 2017;2:62-62. doi:10.21037/aoj.2017.10.06
  • Villarreal-Espinosa JB, Reinold MM, Khak M, et al. Rehabilitation protocol variability following arthroscopic Bankart repair and remplissage for management of anterior shoulder instability: a systematic review. Int J Sports Phys Ther. 2024;19(10):1172-1187. doi:10.26603/001c.123481
  • McIsaac W, Lalani A, Silveira A, Chepeha J, Luciak-Corea C, Beaupre L. Rehabilitation after arthroscopic Bankart repair: a systematic scoping review identifying important evidence gaps. Physiotherapy. 2022;114:68-76. doi:10.1016/j.physio.2021.03.014
  • Van Gastel ML, Van Iersel TP, Tutuhatunewa ED, et al. Developing a rehabilitation guideline (REGUIDE) for patients undergoing an arthroscopic Bankart repair after traumatic anterior shoulder dislocation, focusing on managing apprehension: an international Delphi-based consensus study. J Orthop Sports Phys Ther. 2024;54(5):289-301. doi:10.2519/jospt.2024.12106
  • Drummond Junior M, Popchak A, Wilson K, Kane G, Lin A. Criteria-based return-to-sport testing is associated with lower recurrence rates following arthroscopic Bankart repair. J Shoulder Elbow Surg. 2021;30(7S):S14-S20. doi:10.1016/j.jse.2021.03.141
  • Patel K, Radcliffe R. Evaluating the readability and quality of bladder cancer information from AI Chatbots: a comparative study between ChatGPT, Google Gemini, Grok, Claude and DeepSeek. J Clin Med. 2025;14(21):7804. doi:10.3390/jcm14217804
  • Koluman AC, Çiftçi MU, Çiftçi EA, Çakmur BB, Ziroğlu N. Balancing accuracy and readability: comparative evaluation of AI Chatbots for patient education on rotator cuff tears. Healthcare. 2025;13(21):2670. doi:10.3390/healthcare13212670
  • Çakmur BB, Koluman AC, Çiftçi MU, Aloğlu Çiftçi E, Ziroğlu N. Gemini 1.5 Flash provides the most reliable content while ChatGPT-4o offers the highest readability for patient education on meniscal tears. Knee Surg Sports Traumatol Arthrosc. 2026;34(3):1141-1149. doi:10.1002/ksa.70247
  • Oeding JF, Lu AZ, Mazzucco M, et al. Effectiveness of a large language model for clinical information retrieval regarding shoulder arthroplasty. J Exp Orthop. 2024;11(4):e70114. doi:10.1002/jeo2.70114
  • Johnson CK, Mandalia K, Corban J, Beall KE, Shah SS. Adequacy of ChatGPT responses to frequently asked questions about shoulder arthroplasty: is it an appropriate adjunct for patient education? JSES Int. 2025;9(3):830-836. doi:10.1016/j.jseint.2025.01.008
  • Lower K, Lin JY, Jenkin D, et al. Comparing the quality and readability of ChatGPT-4-generated vs. human-generated patient education materials for total knee arthroplasty. Cureus. 2025;17(6):e86491. doi:10.7759/cureus.86491
  • Luo M, Duan Z, Gao J, Sun Y, Chen L, Feng X. Evaluating the role of ChatGPT in rehabilitation medicine: a narrative review. Front Digit Health. 2025;7:1618510. doi:10.3389/fdgth.2025.1618510
  • Shoemaker SJ, Wolf MS, Brach C. Development of the patient education materials assessment tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Couns. 2014;96(3):395-403. doi:10.1016/j.pec.2014.05.027
  • Eltorai AEM, Sharma P, Wang J, Daniels AH. Most American Academy of Orthopaedic Surgeons’ online patient education material exceeds average patient reading level. Clin Orthop. 2015;473(4):1181-1186. doi:10.1007/s11999-014-4071-2
  • Fernández Dorado F, Álvarez Villar S, Osuna Mavare CA, Ruíz Díaz R, Díaz Heredia J, Ruiz Ibán MÁ. Evaluation of ChatGPT’s responses to frequently asked questions about shoulder arthroplasty. JSES Int. 2025;9(5):1771-1777. doi:10.1016/j.jseint.2025.05.008
  • Kodra JD, Saroyan A, Darby F, et al. ChatGPT-generated responses across orthopaedic sports medicine surgery vary in accuracy, quality, and readability: a systematic review. Arthrosc Sports Med Rehabil. 2025;7(4):101210. doi:10.1016/j.asmr.2025.101210
  • DeFroda SF, Mehta N, Owens BD. Physical therapy protocols for arthroscopic Bankart repair. Sports Health Multidiscip Approach. 2018;10(3):250-258. doi:10.1177/1941738117750553
  • Rossettini G, Bargeri S, Cook C, et al. Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study. Front Digit Health. 2025;7:1574287. doi:10.3389/fdgth.2025.1574287
  • Duan L, Yao Z, Li X, Wu Y, Sheng D. Comparing large language models and human doctors in symptom-driven online medical consultations: a case study on trigeminal neuralgia. Digit Health. 2025;11:20552076251388140. doi:10.1177/20552076251388140
  • Oviedo-Trespalacios O, Peden AE, Cole-Hunter T, et al. The risks of using ChatGPT to obtain common safety-related information and advice. Saf Sci. 2023;167:106244. doi:10.1016/j.ssci.2023.106244
  • Cornelison B, Axon DR, Abbott B, et al. Accuracy and safety of ChatGPT-3.5 in assessing over-the-counter medication use during pregnancy: a descriptive comparative study. Pharmacy. 2025;13(4):104. doi:10.3390/pharmacy13040104
  • Badarudeen S, Sabharwal S. Assessing readability of patient education materials: current role in orthopaedics. Clin Orthop. 2010;468(10):2572-2580. doi:10.1007/s11999-010-1380-y
  • Akinleye SD, Krochak R, Richardson N, Garofolo G, Culbertson MD, Erez O. Readability of the most commonly accessed arthroscopy-related online patient education materials. Arthrosc J Arthrosc Relat Surg. 2018;34(4):1272-1279. doi:10.1016/j.arthro.2017.09.043
  • Austin RR, Jantraporn R, Schulz C, Zhang R. Navigating online health information: assessing the quality and readability of dietary and herbal supplements for chronic musculoskeletal pain. CIN Comput Inform Nurs. 2024;42(8):547-554. doi:10.1097/CIN.0000000000001138
  • Gianola S, Bargeri S, Castellini G, et al. Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for lumbosacral radicular pain: a cross-sectional study. J Orthop Sports Phys Ther. 2024;54(3):222-228. doi:10.2519/jospt.2024.12151
There are 27 citations in total.

Details

Primary Language English
Subjects Orthopaedics
Journal Section Research Article
Authors

Ali Can Koluman 0000-0002-0191-3229

Ahmet Yiğitbay 0000-0002-7845-1974

Ebru Aloğlu Çiftçi

Mehmet Utku Çiftçi 0000-0001-5594-4138

Nezih Ziroğlu 0000-0002-2595-9459

Cemal Kural 0000-0001-7493-391X

Submission Date February 8, 2026
Acceptance Date February 27, 2026
Publication Date March 27, 2026
IZ https://izlik.org/JA43AJ56XR
Published in Issue Year 2026 Volume: 7 Issue: 2

Cite

AMA 1. Koluman AC, Yiğitbay A, Aloğlu Çiftçi E, Çiftçi MU, Ziroğlu N, Kural C. Assessing the accuracy, safety, and clinical utility of AI-generated rehabilitation guidance for Bankart lesions: a blinded comparative evaluation with expert protocols. J Med Palliat Care. 2026;7(2):274-281. https://izlik.org/JA43AJ56XR


