Research Article

Assessing the accuracy, safety, and clinical utility of AI-generated rehabilitation guidance for Bankart lesions: a blinded comparative evaluation with expert protocols

Year 2026, Volume: 7, Issue: 2, 274-281, 27.03.2026
https://izlik.org/JA43AJ56XR

Abstract

Aims: Artificial Intelligence (AI)–based language models are increasingly used to generate medical information and patient education materials. However, the reliability and safety of AI-generated rehabilitation guidance remain uncertain. This study aimed to evaluate the accuracy, safety, clinical utility, and readability of rehabilitation recommendations generated by ChatGPT-5 for Bankart lesions and to compare these outputs with expert-developed rehabilitation protocols.
Methods: A blinded, cross-sectional comparative quality assessment was conducted. Standardized prompts regarding nonoperative and postoperative Bankart rehabilitation were used to generate responses from ChatGPT-5. AI-generated texts were compared with protocols prepared by a panel of orthopedic shoulder surgeons and an experienced physiotherapist. All texts were anonymized and independently evaluated by three blinded expert raters using a structured 5-point Likert scale assessing clinical accuracy, safety, actionability, comprehensiveness, and overall quality. Major clinical errors were recorded separately. Readability was assessed using Flesch Reading Ease and Flesch–Kincaid Grade Level scores. Inter-rater reliability was analyzed using intraclass correlation coefficients (ICC).
Results: A total of 20 rehabilitation texts (10 AI-generated and 10 expert-developed) were evaluated. Expert protocols demonstrated significantly higher scores in clinical accuracy (4.6±0.4 vs 3.4±0.7, p<0.001), safety (4.8±0.3 vs 3.2±0.8, p<0.001), comprehensiveness (4.7±0.4 vs 3.1±0.9, p<0.001), and overall quality (4.6±0.4 vs 3.5±0.6, p<0.001). AI outputs were more readable (Flesch Reading Ease: 72.6±5.8 vs 58.4±6.2, p<0.01) but frequently lacked critical safety information. Major clinical errors were identified in 20% of AI-generated texts (2/10), whereas no major errors were detected in expert-developed protocols (0/10) (p<0.05). Inter-rater reliability showed good to excellent agreement across domains (ICC=0.80–0.89).
Conclusion: Although ChatGPT-5 can produce well-structured and easily readable rehabilitation information for Bankart lesions, its outputs show significant deficiencies in safety, accuracy, and comprehensiveness. Unsupervised use of AI-generated rehabilitation guidance may pose clinically relevant risks. A hybrid model in which AI-generated content is reviewed and validated by clinicians represents a safer and more appropriate approach for integrating AI into postoperative rehabilitation education.
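The readability metrics used in this study are simple formulas over word, sentence, and syllable counts. As a minimal sketch, the standard Flesch Reading Ease and Flesch-Kincaid Grade Level definitions are shown below in Python; the `count_syllables` heuristic is a rough vowel-group approximation (dedicated readability tools use dictionaries or hyphenation rules), so scores from it are indicative only and will not exactly match the software used by the authors.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    # Flesch Reading Ease: higher scores mean easier text
    # (e.g. 70-80 is roughly 7th-grade level, 50-60 roughly high-school level).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate U.S. school grade
    # required to understand the text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    # Tokenize sentences and words naively, then apply both formulas.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return (flesch_reading_ease(n_words, sentences, n_syllables),
            flesch_kincaid_grade(n_words, sentences, n_syllables))
```

For example, a text of 100 words in 5 sentences with 130 syllables scores a Flesch Reading Ease of about 76.6 and a Flesch-Kincaid Grade Level of about 7.6, which by these scales would count as fairly readable patient-education material.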

References

  • Tupe RN, Tiwari V. Anteroinferior Glenoid Labrum Lesion (Bankart Lesion). In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2025.
  • Utami SW, Pratiwi SR, Mitchel, Gani KS, Kholinne E. Return to sports following arthroscopic Bankart repair: a narrative review. Ewha Med J. 2024;47(2):e21. doi:10.12771/emj.2024.e21
  • Popchak A, Patterson-Lynch B, Christain H, Irrgang JJ. Rehabilitation and return to sports after anterior shoulder stabilization. Ann Jt. 2017;2:62-62. doi:10.21037/aoj.2017.10.06
  • Villarreal-Espinosa JB, Reinold MM, Khak M, et al. Rehabilitation protocol variability following arthroscopic Bankart repair and remplissage for management of anterior shoulder instability: a systematic review. Int J Sports Phys Ther. 2024;19(10):1172-1187. doi:10.26603/001c.123481
  • McIsaac W, Lalani A, Silveira A, Chepeha J, Luciak-Corea C, Beaupre L. Rehabilitation after arthroscopic Bankart repair: a systematic scoping review identifying important evidence gaps. Physiotherapy. 2022;114:68-76. doi:10.1016/j.physio.2021.03.014
  • Van Gastel ML, Van Iersel TP, Tutuhatunewa ED, et al. Developing a rehabilitation guideline (REGUIDE) for patients undergoing an arthroscopic Bankart repair after traumatic anterior shoulder dislocation, focusing on managing apprehension: an international Delphi-based consensus study. J Orthop Sports Phys Ther. 2024;54(5):289-301. doi:10.2519/jospt.2024.12106
  • Drummond Junior M, Popchak A, Wilson K, Kane G, Lin A. Criteria-based return-to-sport testing is associated with lower recurrence rates following arthroscopic Bankart repair. J Shoulder Elbow Surg. 2021;30(7S):S14-S20. doi:10.1016/j.jse.2021.03.141
  • Patel K, Radcliffe R. Evaluating the readability and quality of bladder cancer information from AI Chatbots: a comparative study between ChatGPT, Google Gemini, Grok, Claude and DeepSeek. J Clin Med. 2025;14(21):7804. doi:10.3390/jcm14217804
  • Koluman AC, Çiftçi MU, Çiftçi EA, Çakmur BB, Ziroğlu N. Balancing accuracy and readability: comparative evaluation of AI Chatbots for patient education on rotator cuff tears. Healthcare. 2025;13(21):2670. doi:10.3390/healthcare13212670
  • Çakmur BB, Koluman AC, Çiftçi MU, Aloğlu Çiftçi E, Ziroğlu N. Gemini 1.5 Flash provides the most reliable content while ChatGPT-4o offers the highest readability for patient education on meniscal tears. Knee Surg Sports Traumatol Arthrosc. 2026;34(3):1141-1149. doi:10.1002/ksa.70247
  • Oeding JF, Lu AZ, Mazzucco M, et al. Effectiveness of a large language model for clinical information retrieval regarding shoulder arthroplasty. J Exp Orthop. 2024;11(4):e70114. doi:10.1002/jeo2.70114
  • Johnson CK, Mandalia K, Corban J, Beall KE, Shah SS. Adequacy of ChatGPT responses to frequently asked questions about shoulder arthroplasty: is it an appropriate adjunct for patient education? JSES Int. 2025;9(3):830-836. doi:10.1016/j.jseint.2025.01.008
  • Lower K, Lin JY, Jenkin D, et al. Comparing the quality and readability of ChatGPT-4-generated vs. human-generated patient education materials for total knee arthroplasty. Cureus. 2025;17(6):e86491. doi:10.7759/cureus.86491
  • Luo M, Duan Z, Gao J, Sun Y, Chen L, Feng X. Evaluating the role of ChatGPT in rehabilitation medicine: a narrative review. Front Digit Health. 2025;7:1618510. doi:10.3389/fdgth.2025.1618510
  • Shoemaker SJ, Wolf MS, Brach C. Development of the patient education materials assessment tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Couns. 2014;96(3):395-403. doi:10.1016/j.pec.2014.05.027
  • Eltorai AEM, Sharma P, Wang J, Daniels AH. Most American Academy of Orthopaedic Surgeons’ online patient education material exceeds average patient reading level. Clin Orthop. 2015;473(4):1181-1186. doi:10.1007/s11999-014-4071-2
  • Fernández Dorado F, Álvarez Villar S, Osuna Mavare CA, Ruíz Díaz R, Díaz Heredia J, Ruiz Ibán MÁ. Evaluation of ChatGPT’s responses to frequently asked questions about shoulder arthroplasty. JSES Int. 2025;9(5):1771-1777. doi:10.1016/j.jseint.2025.05.008
  • Kodra JD, Saroyan A, Darby F, et al. ChatGPT-generated responses across orthopaedic sports medicine surgery vary in accuracy, quality, and readability: a systematic review. Arthrosc Sports Med Rehabil. 2025;7(4):101210. doi:10.1016/j.asmr.2025.101210
  • DeFroda SF, Mehta N, Owens BD. Physical therapy protocols for arthroscopic Bankart repair. Sports Health Multidiscip Approach. 2018;10(3):250-258. doi:10.1177/1941738117750553
  • Rossettini G, Bargeri S, Cook C, et al. Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study. Front Digit Health. 2025;7:1574287. doi:10.3389/fdgth.2025.1574287
  • Duan L, Yao Z, Li X, Wu Y, Sheng D. Comparing large language models and human doctors in symptom-driven online medical consultations: a case study on trigeminal neuralgia. Digit Health. 2025;11:20552076251388140. doi:10.1177/20552076251388140
  • Oviedo-Trespalacios O, Peden AE, Cole-Hunter T, et al. The risks of using ChatGPT to obtain common safety-related information and advice. Saf Sci. 2023;167:106244. doi:10.1016/j.ssci.2023.106244
  • Cornelison B, Axon DR, Abbott B, et al. Accuracy and safety of ChatGPT-3.5 in assessing over-the-counter medication use during pregnancy: a descriptive comparative study. Pharmacy. 2025;13(4):104. doi:10.3390/pharmacy13040104
  • Badarudeen S, Sabharwal S. Assessing readability of patient education materials: current role in orthopaedics. Clin Orthop. 2010;468(10):2572-2580. doi:10.1007/s11999-010-1380-y
  • Akinleye SD, Krochak R, Richardson N, Garofolo G, Culbertson MD, Erez O. Readability of the most commonly accessed arthroscopy-related online patient education materials. Arthrosc J Arthrosc Relat Surg. 2018;34(4):1272-1279. doi:10.1016/j.arthro.2017.09.043
  • Austin RR, Jantraporn R, Schulz C, Zhang R. Navigating online health information: assessing the quality and readability of dietary and herbal supplements for chronic musculoskeletal pain. CIN Comput Inform Nurs. 2024;42(8):547-554. doi:10.1097/CIN.0000000000001138
  • Gianola S, Bargeri S, Castellini G, et al. Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for lumbosacral radicular pain: a cross-sectional study. J Orthop Sports Phys Ther. 2024;54(3):222-228. doi:10.2519/jospt.2024.12151
There are 27 citations in total.

Details

Primary Language English
Subjects Orthopaedics
Journal Section Research Article
Authors

Ali Can Koluman 0000-0002-0191-3229

Ahmet Yiğitbay 0000-0002-7845-1974

Ebru Aloğlu Çiftçi

Mehmet Utku Çiftçi 0000-0001-5594-4138

Nezih Ziroğlu 0000-0002-2595-9459

Cemal Kural 0000-0001-7493-391X

Submission Date February 8, 2026
Acceptance Date February 27, 2026
Publication Date March 27, 2026
IZ https://izlik.org/JA43AJ56XR
Published in Issue Year 2026 Volume: 7 Issue: 2

Cite

AMA 1. Koluman AC, Yiğitbay A, Aloğlu Çiftçi E, Çiftçi MU, Ziroğlu N, Kural C. Assessing the accuracy, safety, and clinical utility of AI-generated rehabilitation guidance for Bankart lesions: a blinded comparative evaluation with expert protocols. J Med Palliat Care. 2026;7(2):274-281. https://izlik.org/JA43AJ56XR


