Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery

Yavuz Kemal Arıbaş; Atike Burcin Tefon Aribas

doi:10.47482/acmr.1893217

Abstract

Purpose: To evaluate and compare the performance of four large language models (LLMs), ChatGPT, DeepSeek, Gemini, and Copilot, in answering frequently asked patient questions on laser refractive surgery. Methods: In this cross-sectional, non-clinical study, 25 patient-centered refractive surgery questions were posed to the four LLMs. Two ophthalmologists independently rated response accuracy and completeness using Likert scales. Information quality was assessed using the DISCERN instrument, and readability using the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKGL) indices. Statistical analysis used the Friedman test with Bonferroni-corrected Wilcoxon signed-rank post hoc comparisons. Inter-rater reliability was assessed with Cohen’s kappa. Results: Inter-rater agreement was substantial for accuracy (κ = 0.650, p < 0.001) and moderate for completeness (κ = 0.533, p < 0.001). ChatGPT and DeepSeek achieved the highest accuracy and completeness scores, with no significant difference between them. Copilot performed significantly worse than both (p = 0.003 and p = 0.031, respectively), while Gemini showed intermediate performance. DISCERN scores placed all models in the good range (54–58/75). When prompted to provide references, DeepSeek showed the greatest improvement (+7 points), reaching the outstanding category. All models produced responses in the “difficult” readability range; DeepSeek generated the most accessible text (FRE = 45.5; FKGL = 9.1), whereas Gemini required the highest reading level (FRE = 35.2; FKGL = 12.7). Conclusion: Large language models can provide reasonably accurate responses to refractive surgery–related patient questions. However, variability in information quality and readability highlights the importance of clinician oversight when using these tools for patient education.
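The readability and agreement metrics named in the abstract follow standard published formulas, which can be sketched as below. This is an illustrative sketch, not the authors' analysis code: the abstract does not say which software produced the scores, how syllables were counted, or whether kappa was weighted. The function names, the count-based inputs, and the unweighted form of Cohen's kappa are all assumptions made for this example.

```python
from collections import Counter


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def cohen_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: the study may have used a weighted variant instead.
    """
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts for one chatbot response (not study data).
    print(flesch_reading_ease(words=100, sentences=5, syllables=160))
    print(flesch_kincaid_grade(words=100, sentences=5, syllables=160))
    # Hypothetical Likert ratings from two raters (not study data).
    print(cohen_kappa([5, 5, 4, 4, 3, 5], [5, 4, 4, 4, 3, 5]))
```

On Flesch's scale, FRE scores between 30 and 50 are labeled "difficult", which is why both the best (45.5) and worst (35.2) reported scores fall in the same band despite a difference of more than three grade levels in FKGL.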

Keywords

Artificial Intelligence, Large Language Models, Patient Education, Refractive Surgery

Supporting Institution

The authors received no financial support or funding for the research, authorship, or publication of this article.

Ethical Statement

This study is non-clinical in nature and did not involve human participants, patient data, or biological materials. Therefore, ethics committee approval and informed consent were not required.

Acknowledgments

None

Details

Primary Language

English

Subjects

Surgery (Other)

Section

Research Article

Authors

Yavuz Kemal Arıbaş ^*
0000-0001-5516-0747
Türkiye

Atike Burcin Tefon Aribas
0000-0002-5248-9747
Türkiye

Publication Date

June 2, 2026

Submission Date

February 19, 2026

Acceptance Date

April 20, 2026

Published in Issue

Year 2026 Volume: 7 Issue: 2

DOI

https://doi.org/10.47482/acmr.1893217

IZ

https://izlik.org/JA74XR49ZU

APA

Arıbaş, Y. K., & Tefon Aribas, A. B. (2026). Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Archives of Current Medical Research, 7(2), 339-347. https://doi.org/10.47482/acmr.1893217

AMA

1. Arıbaş YK, Tefon Aribas AB. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. 2026;7(2):339-347. doi:10.47482/acmr.1893217

Chicago

Arıbaş, Yavuz Kemal, and Atike Burcin Tefon Aribas. 2026. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research 7 (2): 339-47. https://doi.org/10.47482/acmr.1893217.

EndNote

Arıbaş YK, Tefon Aribas AB (June 1, 2026) Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Archives of Current Medical Research 7 2 339–347.

IEEE

[1] Y. K. Arıbaş and A. B. Tefon Aribas, “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”, Arch Curr Med Res, vol. 7, no. 2, pp. 339–347, Jun. 2026, doi: 10.47482/acmr.1893217.

ISNAD

Arıbaş, Yavuz Kemal - Tefon Aribas, Atike Burcin. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research 7/2 (June 1, 2026): 339-347. https://doi.org/10.47482/acmr.1893217.

JAMA

1. Arıbaş YK, Tefon Aribas AB. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. 2026;7:339–347.

MLA

Arıbaş, Yavuz Kemal, and Atike Burcin Tefon Aribas. “Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery”. Archives of Current Medical Research, vol. 7, no. 2, June 2026, pp. 339-47, doi:10.47482/acmr.1893217.

Vancouver

1. Yavuz Kemal Arıbaş, Atike Burcin Tefon Aribas. Performance of Deepseek vs. Established Large Language Models in Answering Frequently Asked Questions About Refractive Surgery. Arch Curr Med Res. June 1, 2026;7(2):339-47. doi:10.47482/acmr.1893217

Archives of Current Medical Research (ACMR) provides immediate open access to all of its content, on the principle that making research freely available to the public supports a greater global exchange of knowledge.

http://www.acmronline.org/
