Research Article

Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery

Volume: 7 Issue: 2, 2 June 2026
Yavuz Kemal Arıbaş *, Atike Burcin Tefon Aribas

Abstract

Background: This study evaluated and compared the performance of four large language models (LLMs), namely ChatGPT, DeepSeek, Gemini, and Copilot, in answering frequently asked patient questions about laser refractive surgery. Methods: In this cross-sectional, non-clinical study, 25 patient-centered refractive surgery questions were posed to the four LLMs. Two ophthalmologists independently rated the accuracy and completeness of each response on Likert scales. Information quality was assessed with the DISCERN instrument, and readability with the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) scores. Statistical analysis used the Friedman test with Bonferroni-corrected Wilcoxon signed-rank post-hoc comparisons. Inter-rater reliability was assessed with Cohen’s kappa. Results: Inter-rater agreement was substantial for accuracy (κ = 0.650, p < 0.001) and moderate for completeness (κ = 0.533, p < 0.001). ChatGPT and DeepSeek achieved the highest accuracy and completeness scores, with no significant difference between them. Copilot performed significantly worse than both (p = 0.003 and p = 0.031, respectively), while Gemini showed intermediate performance. DISCERN scores placed all models in the good range (54–58/75). When prompted to provide references, DeepSeek showed the greatest improvement (+7 points), reaching the outstanding category. All models produced responses in the “difficult” readability range; DeepSeek generated the most accessible text (FRE = 45.5; FKGL = 9.1), whereas Gemini required the highest reading level (FRE = 35.2; FKGL = 12.7). Conclusion: Large language models can provide reasonably accurate responses to refractive surgery-related patient questions. However, variability in information quality and readability highlights the importance of clinician oversight when these tools are used for patient education.
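
For illustration only, the readability scores and the agreement statistic named in the abstract can be sketched in Python from their standard published formulas. This is a minimal sketch, not the authors' analysis code: the function names and the sample counts and ratings are hypothetical, and the abstract does not state which software was used to compute these values.

```python
from collections import Counter


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (FRE): higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (FKGL): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given identical ratings.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over the rating categories.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical inputs, not data from the study.
sample_counts = {"words": 100, "sentences": 5, "syllables": 150}
fre = flesch_reading_ease(**sample_counts)
fkgl = flesch_kincaid_grade(**sample_counts)
kappa = cohens_kappa([1, 1, 2, 2], [1, 1, 2, 1])
```

With these hypothetical inputs the text scores an FRE of about 59.6 and an FKGL of about 9.9, and the two raters reach a kappa of 0.5. Counting syllables reliably is the hard part in practice, so the sketch takes the counts as inputs rather than guessing at a tokenizer.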

Keywords

Artificial Intelligence, Large Language Models, Patient Education, Refractive Surgery

Supporting Institution

The authors received no financial support or funding for the research, authorship, or publication of this article.

Ethical Statement

This study is non-clinical in nature and involves no human participants, patient data, or biological material. Ethics committee approval and informed consent were therefore not required.

Acknowledgments

None

Cite This Article

APA
Arıbaş, Y. K., & Tefon Aribas, A. B. (2026). Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Archives of Current Medical Research, 7(2), 339-347. https://doi.org/10.47482/acmr.1893217
AMA
1. Arıbaş YK, Tefon Aribas AB. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. 2026;7(2):339-347. doi:10.47482/acmr.1893217
Chicago
Arıbaş, Yavuz Kemal, and Atike Burcin Tefon Aribas. 2026. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research 7 (2): 339-47. https://doi.org/10.47482/acmr.1893217.
EndNote
Arıbaş YK, Tefon Aribas AB (01 June 2026) Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Archives of Current Medical Research 7 2 339–347.
IEEE
[1] Y. K. Arıbaş and A. B. Tefon Aribas, “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”, Arch Curr Med Res, vol. 7, no. 2, pp. 339–347, Jun. 2026, doi: 10.47482/acmr.1893217.
ISNAD
Arıbaş, Yavuz Kemal - Tefon Aribas, Atike Burcin. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research 7/2 (01 June 2026): 339-347. https://doi.org/10.47482/acmr.1893217.
JAMA
1. Arıbaş YK, Tefon Aribas AB. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. 2026;7:339–347.
MLA
Arıbaş, Yavuz Kemal, and Atike Burcin Tefon Aribas. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research, vol. 7, no. 2, June 2026, pp. 339-47, doi:10.47482/acmr.1893217.
Vancouver
1. Yavuz Kemal Arıbaş, Atike Burcin Tefon Aribas. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. 01 June 2026;7(2):339-47. doi:10.47482/acmr.1893217