Research Article

Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation

Volume: 7, Issue: 2, 2 June 2026

Abstract

Background: Lateral epicondylitis (tennis elbow) is a common cause of elbow pain. With the increasing use of the internet and artificial intelligence (AI) for health information, patients frequently consult large language models (LLMs). This study aimed to evaluate the accuracy, reliability, content quality, and readability of responses provided by four LLMs (ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot) to frequently asked patient questions about lateral epicondylitis.

Methods: The author committee reviewed patient-oriented questions on lateral epicondylitis retrieved through Google searches and selected the 12 most frequently asked questions for inclusion. These questions were presented to the four LLMs: ChatGPT-3.5, ChatGPT-4, Gemini, and Copilot. Responses were evaluated for accuracy using a five-point Likert scale, reliability using the modified DISCERN scale, quality using the Global Quality Scale (GQS), and readability using the Flesch Reading Ease Score (FRES).

Results: Perceived medical accuracy did not differ significantly among the LLMs (p = 0.579). Reliability differed significantly (modified DISCERN: p < 0.001), with Copilot and Gemini achieving higher scores than ChatGPT-4 (both p < 0.001) and Copilot also outperforming ChatGPT-3.5 (p = 0.002). Quality differed significantly (GQS: p < 0.001), with ChatGPT-3.5 and Gemini scoring higher than ChatGPT-4 (p = 0.001 and p = 0.006, respectively). Readability differed across models (FRES: p = 0.049); Gemini demonstrated higher readability than ChatGPT-3.5 (p = 0.040), although responses from all models were generally difficult to read. Response generation time differed significantly (p < 0.001), with ChatGPT-4 producing the slowest responses.

Conclusions: All evaluated LLMs provided generally accurate and moderately reliable responses to questions about tennis elbow, with differences observed across specific domains such as source transparency, readability, and response time. Models with citation capabilities demonstrated higher reliability in terms of source transparency, while readability remained a common limitation. LLMs show potential as supplementary patient information tools in orthopaedics; however, further refinement and improved readability are needed before widespread clinical use.
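The abstract reports readability as the Flesch Reading Ease Score (FRES), defined as 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words); higher scores mean easier text, and scores of 30 to 50 are conventionally read as "difficult". The study does not state which tool computed its scores, so the sketch below is only an illustration under stated assumptions: the sentence splitter and the vowel-group syllable counter are simple heuristics chosen here, and the function names `count_syllables` and `flesch_reading_ease` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate English syllables by counting vowel groups (heuristic assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent (e.g. "cause"), except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Compute FRES = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Sentences: non-empty chunks between terminal punctuation marks.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, allowing internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))
    print(round(flesch_reading_ease(
        "Lateral epicondylitis is a common cause of elbow pain."), 3))
```

Applied to each model response, such a function yields a per-response score that could then be compared across models. Dedicated readability tools use dictionary-based syllable counts and may return somewhat different values for the same text.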

Keywords

tennis, elbow, pain, sports injuries, large language models, artificial intelligence, tendinitis

Supporting Institution

The authors did not receive any financial support for the submitted work.

Ethical Statement

The authors declare that they have no conflicts of interest related to this study.

Acknowledgments

Not applicable.

Cite

APA
Geçer, A., Kaya, E., Kendirci, A. Ş., Paksoy, A., & Akgün, D. (2026). Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation. Archives of Current Medical Research, 7(2), 321-330. https://doi.org/10.47482/acmr.1778992
AMA
1. Geçer A, Kaya E, Kendirci AŞ, Paksoy A, Akgün D. Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation. Arch Curr Med Res. 2026;7(2):321-330. doi:10.47482/acmr.1778992
Chicago
Geçer, Ali, Emre Kaya, Alper Şükrü Kendirci, Alp Paksoy, and Doruk Akgün. 2026. “Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation”. Archives of Current Medical Research 7 (2): 321-30. https://doi.org/10.47482/acmr.1778992.
EndNote
Geçer A, Kaya E, Kendirci AŞ, Paksoy A, Akgün D (01 June 2026) Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation. Archives of Current Medical Research 7 2 321–330.
IEEE
[1] A. Geçer, E. Kaya, A. Ş. Kendirci, A. Paksoy, and D. Akgün, “Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation”, Arch Curr Med Res, vol. 7, no. 2, pp. 321–330, Jun. 2026, doi: 10.47482/acmr.1778992.
ISNAD
Geçer, Ali - Kaya, Emre - Kendirci, Alper Şükrü - Paksoy, Alp - Akgün, Doruk. “Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation”. Archives of Current Medical Research 7/2 (01 June 2026): 321-330. https://doi.org/10.47482/acmr.1778992.
JAMA
1. Geçer A, Kaya E, Kendirci AŞ, Paksoy A, Akgün D. Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation. Arch Curr Med Res. 2026;7:321–330.
MLA
Geçer, Ali, et al. “Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation”. Archives of Current Medical Research, vol. 7, no. 2, June 2026, pp. 321-30, doi:10.47482/acmr.1778992.
Vancouver
1. Ali Geçer, Emre Kaya, Alper Şükrü Kendirci, Alp Paksoy, Doruk Akgün. Large Language Models’ Responses to Patient Questions on Lateral Epicondylitis: Multi-Institutional Orthopaedic Surgeon Evaluation. Arch Curr Med Res. 01 June 2026;7(2):321-30. doi:10.47482/acmr.1778992