Research Article

Cross-Linguistic Evaluation of Artificial Intelligence Chatbots: Performance of ChatGPT-3.5, Copilot, and Gemini in English and Turkish in Neuro-ophthalmologic Evaluation

Year 2025, Volume: 35 Issue: 4, 597 - 604, 29.08.2025
https://doi.org/10.54005/geneltip.1627508

Abstract
Objective: To evaluate the performance of the ChatGPT-3.5, Copilot, and Gemini artificial intelligence chatbots on identical neuro-ophthalmology questions posed in English and in Turkish.
Materials and Methods: Forty neuro-ophthalmology questions were included in the study. All English questions were translated into Turkish by a certified translator (a native speaker), and both versions of each question were then posed to ChatGPT-3.5, Copilot, and Gemini. The answers were compared with the answer key and classified as correct or incorrect. The chatbots' performance relative to one another was compared statistically.
Results: On the English questions, ChatGPT-3.5 answered 47.5% correctly, Copilot 57.5%, and Gemini 32.5%. On the Turkish questions, ChatGPT-3.5 answered 57.5% correctly, Copilot 52.5%, and Gemini 32.5%. Although the chatbots showed different success rates on the same questions in English and Turkish, no statistically significant difference in success was detected (p>0.05).
Conclusion: Although no statistically significant difference was observed, chatbots can give different answers to the same questions. In addition to improving the chatbots' knowledge, their language skills also need to be improved.

References

  • 1. Madadi Y, Delsoz M, Lao PA, Fong JW, Hollingsworth T, Kahook MY, et al. ChatGPT Assisting Diagnosis of Neuro-ophthalmology Diseases Based on Case Reports. medRxiv 2023.
  • 2. Stunkel L, Sharma RA, Mackay DD, Wilson B, Van Stavern GP, Newman NJ, et al. Patient Harm Due to Diagnostic Error of Neuro-Ophthalmologic Conditions. Ophthalmology 2021; 128:1356–1362.
  • 3. Frohman LP. The human resource crisis in neuro-ophthalmology. J Neuroophthalmol 2008; 28:231–234.
  • 4. Debusk A, Subramanian PS, Scannell Bryan M, Moster ML, Calvert PC, and Frohman LP. Mismatch in Supply and Demand for Neuro-Ophthalmic Care. J Neuroophthalmol 2022; 42:62–67.
  • 5. Ting DSW, Pasquale LR, Peng L, Campbell JP, Lee AY, Raman R, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 2019; 103:167.
  • 6. Sensoy E and Citirik M. A comparative study on the knowledge levels of artificial intelligence programs in diagnosing ophthalmic pathologies and intraocular tumors evaluated their superiority and potential utility. Int Ophthalmol 2023; 43:4905–4909.
  • 7. Bhatti TM, Chen JJ, Danesh-Meyer HV, Levin LA, Moss HE, Philips PH, et al., editors. Neuro-Ophthalmology. San Francisco: American Academy of Ophthalmology; 2023.
  • 8. Şensoy E and Çıtırık M. ChatGPT-3.5, Copilot ve Gemini'nin oküler inflamasyon ve üveit konusundaki çoktan seçmeli sorularda performans analizi: Dil farklılıklarının etkisi: Kesitsel araştırma. Turkiye Klinikleri J Ophthalmol 2025; 34:12–16.
  • 9. Şensoy E and Çıtırık M. Performance of ChatGPT-3.5, Copilot, and Gemini in answering English and Turkish questions related to ocular surface diseases and cornea: a comparison study. Turkish Journal of Clinical and Experimental Ophthalmology 2025; 20:37–41.
  • 10. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS digital health 2023; 2:e0000198.
  • 11. Khan RA, Jawaid M, Khan AR, and Sajjad M. ChatGPT - Reshaping medical education and clinical management. Pak J Med Sci 2023; 39:605.
  • 12. Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, et al. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. 2022.
  • 13. Shukla R, Mishra AK, Banerjee N, and Verma A. The Comparison of ChatGPT 3.5, Microsoft Bing, and Google Gemini for Diagnosing Cases of Neuro-Ophthalmology. Cureus 2024; 16.
  • 14. Haddad F and Saade JS. Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study. JMIR Med Educ 2024; 10:e50842.
  • 15. Tao BKL, Hua N, Milkovich J, and Micieli JA. ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources. Eye 2024; 1–6.
  • 16. Tailor PD, Dalvin LA, Starr MR, Tajfirouz DA, Chodnicki KD, Brodsky MC, et al. A Comparative Study of Large Language Models, Human Experts, and Expert-Edited Large Language Models to Neuro-Ophthalmology Questions. J Neuroophthalmol 2024.
  • 17. Mihalache A, Grad J, Patil NS, Huang RS, Popovic MM, Mallipatna A, et al. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye 2024; 1–6.
  • 18. Canleblebici M, Dal A, and Erdağ M. Evaluation of the Performance of Large Language Models (ChatGPT-3.5, ChatGPT-4, Bing, and Bard) in Turkish Ophthalmology Chief-Assistant Exams: A Comparative Study. Turkiye Klinikleri J Ophthalmol 2024; 33:163–170.

Cross-Linguistic Evaluation of Artificial Intelligence Chatbots: Performance of ChatGPT-3.5, Copilot and Gemini in Neuro-ophthalmologic Evaluation in English and Turkish


Abstract
Background/Aims: To evaluate the performance of the ChatGPT-3.5, Copilot, and Gemini artificial intelligence chatbots on identical neuro-ophthalmology questions posed in English and in Turkish.
Methods: Forty questions related to neuro-ophthalmology were included in the study. All English questions were translated into Turkish by a certified translator (a native speaker), and both versions of each question were then posed to ChatGPT-3.5, Copilot, and Gemini. The answers were compared with the answer key and classified as correct or incorrect. The chatbots' performance relative to one another was compared statistically.
Results: On the English questions, ChatGPT-3.5 answered 47.5% correctly, Copilot 57.5%, and Gemini 32.5%. On the Turkish questions, ChatGPT-3.5 answered 57.5% correctly, Copilot 52.5%, and Gemini 32.5%. Although success rates differed, no statistically significant difference was detected between the chatbots in answering the same questions in English and Turkish (p>0.05).
Conclusions: Although no statistically significant difference was found, chatbots can answer the same questions differently. In addition to improving the chatbots' knowledge, their language skills also need to be improved.
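The reported percentages can be turned back into counts out of the 40 questions and compared across the three chatbots. The sketch below is illustrative only: the abstract does not name the statistical test, so the Pearson chi-square test on a 3 x 2 table (chatbot by correct/incorrect) is an assumption, and the names `correct_counts` and `chi_square_three_groups` are hypothetical helpers introduced for this example.

```python
import math

N_QUESTIONS = 40  # stated in the Methods: forty questions per language

# Percent-correct values as reported in the Results.
REPORTED_PERCENT = {
    "English": {"ChatGPT-3.5": 47.5, "Copilot": 57.5, "Gemini": 32.5},
    "Turkish": {"ChatGPT-3.5": 57.5, "Copilot": 52.5, "Gemini": 32.5},
}


def correct_counts(percent_by_bot, n=N_QUESTIONS):
    """Convert reported percentages back to whole numbers of correct answers."""
    return {bot: round(pct * n / 100) for bot, pct in percent_by_bot.items()}


def chi_square_three_groups(counts, n=N_QUESTIONS):
    """Pearson chi-square on a 3 x 2 table (chatbot x correct/incorrect).

    ASSUMPTION: the abstract does not say which test was used; this is
    one plausible choice, not necessarily the authors' method.
    """
    if len(counts) != 3:
        raise ValueError("the closed-form p-value below holds only for df = 2")
    # Under the null hypothesis every chatbot shares the pooled success rate,
    # and each chatbot answered the same number of questions (n).
    expected_correct = sum(counts.values()) / len(counts)
    expected_incorrect = n - expected_correct
    statistic = sum(
        (c - expected_correct) ** 2 / expected_correct
        + ((n - c) - expected_incorrect) ** 2 / expected_incorrect
        for c in counts.values()
    )
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


for language, percents in REPORTED_PERCENT.items():
    counts = correct_counts(percents)
    statistic, p_value = chi_square_three_groups(counts)
    print(language, counts, f"chi2={statistic:.3f}", f"p={p_value:.3f}")
```

Under this assumed test, the p-values come out at roughly 0.078 for English and 0.060 for Turkish, both above 0.05, which agrees with the reported result but does not confirm which test the authors ran. Comparing English against Turkish for a single chatbot would need the paired per-question outcomes, which the abstract does not provide.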

Ethical Statement

Because the data in this study were not obtained from animal or human sources, ethics committee approval was not required.


Details

Primary Language: English
Subjects: Clinical Sciences (Other)
Section: Research Article
Authors

Eyüpcan Şensoy 0000-0002-4401-8435

Mehmet Çıtırık 0000-0002-0558-5576

Submission Date: 27 January 2025
Acceptance Date: 22 July 2025
Early View Date: 29 August 2025
Publication Date: 29 August 2025
Published in Issue: Year 2025, Volume: 35 Issue: 4

Cite

Vancouver: Şensoy E, Çıtırık M. Cross-Linguistic Evaluation of Artificial Intelligence Chatbots: Performance of ChatGPT-3.5, Copilot and Gemini in Neuro-ophthalmologic Evaluation in English and Turkish. Genel Tıp Derg. 2025;35(4):597-604.