Research Article

Impact of English and Turkish Language Variation on Artificial Intelligence Chatbot Performance in Oculofacial Plastic and Orbital Surgery: A Study of ChatGPT-3.5, Copilot, and Gemini

Year 2024, Volume: 46 Issue: 5, 781 - 786, 12.09.2024
https://doi.org/10.20515/otd.1520495

Abstract

This study investigates how posing the same oculofacial plastic and orbital surgery questions in different languages affects the performance of three freely accessible artificial intelligence chatbots: ChatGPT-3.5, Copilot, and Gemini. English and Turkish versions of 30 questions on oculofacial plastic and orbital surgery were submitted to each chatbot. The chatbots' answers were compared with the answer key at the back of the book and classified as correct or incorrect, and the chatbots' performance was compared statistically. ChatGPT-3.5 answered 43.3% of the English questions and 23.3% of the Turkish questions correctly (p=0.07). Copilot answered 73.3% of the English questions and 63.3% of the Turkish questions correctly (p=0.375). Gemini answered 46.7% of the English questions and 33.3% of the Turkish questions correctly (p=0.344). Copilot outperformed the other chatbots on the Turkish questions (p<0.05). Beyond improving the chatbots' level of knowledge, their performance in different languages also needs to be examined and improved. Correcting these shortcomings will pave the way for more widespread and reliable use of these programs.
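The reported percentages can be turned back into whole-question counts, and a paired comparison between the two language versions can be sketched as follows. This is a minimal illustration, not the authors' analysis: the abstract does not name the statistical test or report per-question outcomes, so the choice of an exact McNemar test is an assumption, and the discordant-pair counts in the usage line are hypothetical values chosen only so that their difference matches the net difference in correct answers.

```python
from math import comb

N_QUESTIONS = 30  # number of questions, from the abstract

# Percent of questions answered correctly, as reported in the abstract.
reported = {
    "ChatGPT-3.5": {"english": 43.3, "turkish": 23.3},
    "Copilot": {"english": 73.3, "turkish": 63.3},
    "Gemini": {"english": 46.7, "turkish": 33.3},
}


def percent_to_count(percent, n=N_QUESTIONS):
    """Recover the whole number of correct answers from a rounded percentage."""
    return round(percent * n / 100)


def exact_mcnemar_p(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: questions answered correctly in English only (assumed input)
    c: questions answered correctly in Turkish only (assumed input)
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Correct-answer counts implied by the reported percentages.
counts = {
    name: {lang: percent_to_count(p) for lang, p in langs.items()}
    for name, langs in reported.items()
}

for name, c in counts.items():
    print(f"{name}: {c['english']}/30 English, {c['turkish']}/30 Turkish")

# HYPOTHETICAL discordant split for ChatGPT-3.5 (7 English-only, 1 Turkish-only):
# it matches the net difference of 13 - 7 = 6 questions, but the true
# per-question data are not given in the abstract.
print(round(exact_mcnemar_p(7, 1), 4))
```

With 30 questions, each question is worth about 3.3 percentage points, so the gaps of 20.0, 10.0, and 13.4 points correspond to net differences of 6, 3, and 4 questions. Differences of this size are hard to distinguish from chance in a sample this small, which is in line with the non-significant p-values reported for the English versus Turkish comparisons.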

References

  • 1. Rahimy E. Deep learning applications in ophthalmology. Curr Opin Ophthalmol. 2018;29(3):254-60.
  • 2. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103(2):167-75.
  • 3. Antaki F, Coussa RG, Kahwati G, Hammamji K, Sebag M, Duval R. Accuracy of automated machine learning in classifying retinal pathologies from ultra-widefield pseudocolour fundus images. Br J Ophthalmol. 2023;107(1):90-5.
  • 4. Schmidt-Erfurth U, Sadeghipour A, Gerendas BS, Waldstein SM, Bogunović H. Artificial intelligence in retina. Prog Retin Eye Res. 2018;67:1-29.
  • 5. Mikolov T, Deoras A, Povey D, Burget L, Černocký J. Strategies for training large scale neural network language models. 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. Published online 2011:196-201.
  • 6. Google AI updates: Bard and new AI features in Search. Accessed July 4, 2024. https://blog.google/technology/ai/bard-google-ai-search-updates/
  • 7. Bing Chat | Microsoft Edge. Accessed July 4, 2024. https://www.microsoft.com/en-us/edge/features/bing-chat?form=MT00D8
  • 8. Korn BS, Burkat CN, Couch SM, et al., eds. Oculofacial Plastic and Orbital Surgery. American Academy of Ophthalmology; 2023.
  • 9. Khan RA, Jawaid M, Khan AR, Sajjad M. ChatGPT - Reshaping medical education and clinical management. Pak J Med Sci. 2023;39(2):605.
  • 10. Jeblick K, Schachtner B, Dexl J, et al. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. Published online December 30, 2022. Accessed June 10, 2023. https://arxiv.org/abs/2212.14882v1
  • 11. Al-Sharif EM, Penteado RC, Dib El Jalbout N, et al. Evaluating the Accuracy of ChatGPT and Google BARD in Fielding Oculoplastic Patient Queries: A Comparative Study on Artificial versus Human Intelligence. Ophthalmic Plast Reconstr Surg. 2024;40(3):303-11.
  • 12. Haddad F, Saade JS. Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study. JMIR Med Educ. 2024;10:e50842.
  • 13. Tao BKL, Hua N, Milkovich J, Micieli JA. ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources. Eye 2024. Published online March 20, 2024:1-6.
  • 14. Canleblebici M, Dal A, Erdağ M. Evaluation of the Performance of Large Language Models (ChatGPT-3.5, ChatGPT-4, Bing and Bard) in Turkish Ophthalmology Chief-Assistant Exams: A Comparative Study. Turkiye Klinikleri J Ophthalmol. Published online June 11, 2024.
  • 15. Mihalache A, Grad J, Patil NS, et al. Google Gemini and Bard artificial intelligence chatbot performance in ophthalmology knowledge assessment. Eye 2024. Published online April 13, 2024:1-6.


There are 15 citations in total.

Details

Primary Language: Turkish
Subjects: Ophthalmology
Journal Section: Original Article
Authors:

Eyüpcan Şensoy (ORCID: 0000-0002-4401-8435)

Mehmet Çıtırık (ORCID: 0000-0002-0558-5576)

Publication Date: September 12, 2024
Submission Date: July 22, 2024
Acceptance Date: September 3, 2024
Published in Issue: Year 2024, Volume 46, Issue 5

Cite

Vancouver Şensoy E, Çıtırık M. Okülofasiyal Plastik ve Orbital Cerrahide İngilizce ve Türkçe Dil Çeşitliliğinin Yapay Zeka Chatbot Performansına Etkisi: ChatGPT-3.5, Copilot ve Gemini Üzerine Bir Çalışma. Osmangazi Tıp Dergisi. 2024;46(5):781-6.

