Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability

Mehmet Ömer Kırıştıoğlu; Meral Yıldız; Sevde İşleker; Esin Söğütlü Sarı; Ahmet Özmen; Mehmet Baykara

doi:10.32708/uutfd.1780297


Evaluation of Large Language Model Chatbots for Pediatric Contact Lenses: Accuracy, Readability, and Reliability

Abstract

This study examined the accuracy, readability, and comprehensiveness of AI-based chatbot responses to questions about pediatric contact lenses, using expert evaluations and readability metrics. Five large language models (ChatGPT-4o, Gemini 1.5, Perplexity, Copilot, and Claude 3.5 Sonnet) were tested with 28 selected questions. Two pediatric ophthalmology specialists evaluated the responses using validated instruments such as DISCERN and PEMAT-P, 5-point Likert scales for accuracy and comprehensiveness, and various readability indices. Expert responses were used only for readability comparisons. ChatGPT's responses were the longest (p<0.0001) and the most detailed. Accuracy and comprehensiveness scores differed significantly among the models (p=0.0216 and p=0.0067, respectively), and ChatGPT outperformed Perplexity (p=0.0173 and p=0.0087, respectively). Expert responses were shorter but showed greater complexity on the readability indices. Reproducibility was high for general pediatric contact lens questions but significantly lower for questions about aphakic lenses (p=0.041). Some factual errors were identified, particularly on aphakic contact lens topics. Although large language models can make patient education materials more accessible, variability in accuracy and completeness underscores the importance of expert oversight. This study reflects experts' evaluations of chatbot responses and does not offer a direct comparison with expert responses. AI chatbots may serve as a tool that complements, rather than replaces, clinical expertise in pediatric ophthalmology.
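
The abstract mentions readability indices without naming them. As a hedged illustration, the Python sketch below computes two widely used formula-based indices, Flesch Reading Ease and Flesch-Kincaid Grade Level, from counts of sentences, words, and syllables. Whether the study used these particular indices, and which software computed them, is an assumption not stated here. The vowel-group syllable counter is a crude heuristic, so its scores will differ somewhat from those of dedicated readability tools.

```python
import re

# Runs of vowels (treating "y" as a vowel) approximate syllable nuclei.
VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Crude heuristic syllable count: vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    # Drop a silent final "e" ("make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for English text."""
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with an internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "syllables": syllables,
        # Higher means easier to read (roughly 0-100 for typical prose).
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade level needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }
```

For example, `readability("The cat sat on the mat. It was happy.")` counts 9 words, 2 sentences, and 10 syllables with this heuristic, giving a Flesch Reading Ease of about 108.3 and a Flesch-Kincaid grade slightly below zero; very short, simple sentences can push both formulas outside their usual ranges.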

Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability

Abstract

This study evaluated the accuracy, readability, and comprehensiveness of patient-facing responses generated by large language model (LLM)-based chatbot platforms to questions about pediatric contact lenses (CLs), using expert grading and readability benchmarking. Five platforms (ChatGPT-4o, Gemini 1.5, Perplexity, Copilot, and Claude 3.5 Sonnet) were assessed using 28 curated questions. Two pediatric ophthalmologists graded anonymized outputs using DISCERN and PEMAT-P, 5-point Likert scales for accuracy and comprehensiveness, and multiple automated readability indices. Expert-written responses were included only for readability benchmarking. ChatGPT-4o produced the longest responses (p<0.0001). Accuracy and comprehensiveness differed significantly across platforms (p=0.0216 and p=0.0067, respectively), with ChatGPT-4o scoring higher than Perplexity on both in post-hoc comparisons (p=0.0173 and p=0.0087, respectively). Expert responses were shorter but scored as more complex on the readability indices. Accuracy-based reproducibility was high for general pediatric CL questions but significantly lower for questions about aphakic CLs (p=0.041), and factual inaccuracies were more frequent in aphakic topics. While LLMs may support patient education, variability in correctness and completeness underscores the need for expert oversight; these tools should complement, not replace, clinical expertise in pediatric CL practice.
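
PEMAT-P is scored separately for its understandability and actionability domains: each applicable item is rated Agree (1 point) or Disagree (0 points), items marked not applicable are dropped from the denominator, and the domain score is the percentage of possible points earned. The sketch below shows only that arithmetic; it is not the authors' scoring procedure, the function name and the True/False/None encoding are hypothetical, and the abstract does not say how the two raters' scores were combined.

```python
from typing import Optional, Sequence


def pemat_domain_score(ratings: Sequence[Optional[bool]]) -> float:
    """Percentage score for one PEMAT domain (understandability or actionability).

    Hypothetical encoding:
      True  -> Agree (1 point)
      False -> Disagree (0 points)
      None  -> Not applicable (excluded from the denominator)
    """
    # Only items rated Agree or Disagree count toward the possible points.
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one item must be rated Agree or Disagree")
    points = sum(1 for r in applicable if r)
    return 100.0 * points / len(applicable)
```

With hypothetical ratings `[True, True, False, None, True]`, three of the four applicable items are rated Agree, so `pemat_domain_score` returns 75.0.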

Supporting Institution

None

Ethical Statement

Ethics Committee Approval: Because this study did not involve any patient participation, human data, or animal experiments, ethics committee approval was not required.

Acknowledgments

None

Details

Primary Language

English

Subjects

Ophthalmology

Section

Research Article

Authors

Mehmet Ömer Kırıştıoğlu*
ORCID: 0009-0002-1977-1772
Türkiye

Meral Yıldız
ORCID: 0000-0002-8503-5637
Türkiye

Sevde İşleker
ORCID: 0000-0002-7352-7044
Türkiye

Esin Söğütlü Sarı
ORCID: 0000-0003-3729-6178
Türkiye

Ahmet Özmen
ORCID: 0000-0002-1261-5120
Türkiye

Mehmet Baykara
ORCID: 0000-0002-5555-1649
Türkiye

Publication Date

16 March 2026

Submission Date

23 September 2025

Acceptance Date

26 February 2026

Published in Issue

Year 2026, Volume: 52

DOI

https://doi.org/10.32708/uutfd.1780297

IZ

https://izlik.org/JA33HK88DP

Cite

APA

Kırıştıoğlu, M. Ö., Yıldız, M., İşleker, S., Söğütlü Sarı, E., Özmen, A., & Baykara, M. (2026). Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability. Journal of Uludağ University Medical Faculty, 52, 1780297. https://doi.org/10.32708/uutfd.1780297

AMA

1. Kırıştıoğlu MÖ, Yıldız M, İşleker S, Söğütlü Sarı E, Özmen A, Baykara M. Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability. Uludağ Tıp Derg. 2026;52:1780297. doi:10.32708/uutfd.1780297

Chicago

Kırıştıoğlu, Mehmet Ömer, Meral Yıldız, Sevde İşleker, Esin Söğütlü Sarı, Ahmet Özmen, and Mehmet Baykara. 2026. “Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability”. Journal of Uludağ University Medical Faculty 52 (March): 1780297. https://doi.org/10.32708/uutfd.1780297.

EndNote

Kırıştıoğlu MÖ, Yıldız M, İşleker S, Söğütlü Sarı E, Özmen A, Baykara M (01 March 2026) Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability. Journal of Uludağ University Medical Faculty 52 1780297.

IEEE

[1] M. Ö. Kırıştıoğlu, M. Yıldız, S. İşleker, E. Söğütlü Sarı, A. Özmen, and M. Baykara, “Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability”, Uludağ Tıp Derg, vol. 52, p. 1780297, Mar. 2026, doi: 10.32708/uutfd.1780297.

ISNAD

Kırıştıoğlu, Mehmet Ömer - Yıldız, Meral - İşleker, Sevde - Söğütlü Sarı, Esin - Özmen, Ahmet - Baykara, Mehmet. “Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability”. Journal of Uludağ University Medical Faculty 52 (01 March 2026): 1780297. https://doi.org/10.32708/uutfd.1780297.

JAMA

1. Kırıştıoğlu MÖ, Yıldız M, İşleker S, Söğütlü Sarı E, Özmen A, Baykara M. Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability. Uludağ Tıp Derg. 2026;52:1780297.

MLA

Kırıştıoğlu, Mehmet Ömer, et al. “Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability”. Journal of Uludağ University Medical Faculty, vol. 52, March 2026, p. 1780297, doi:10.32708/uutfd.1780297.

Vancouver

1. Mehmet Ömer Kırıştıoğlu, Meral Yıldız, Sevde İşleker, Esin Söğütlü Sarı, Ahmet Özmen, Mehmet Baykara. Evaluating AI Chatbots for Pediatric Contact Lenses: A Study on Accuracy, Readability, and Reliability. Uludağ Tıp Derg. 01 March 2026;52:1780297. doi:10.32708/uutfd.1780297