Research Article

Comparison of the Success of ChatGPT 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams

Year 2025, Volume: 24, Issue: 74, 127-134, 22.12.2025
https://doi.org/10.25282/ted.1716591

Abstract

Objective: The scientific validity of using artificial intelligence (AI)-based tools to study anatomy and prepare for medical specialization exams has become a growing subject of academic interest. This study aimed to evaluate the performance of ChatGPT 4.0 and Google Gemini in answering anatomy questions from the Türkiye National Medical Specialization Examination.

Materials and Methods: Anatomy-related questions were extracted from exams administered twice a year between 2006 and 2021 and publicly available on the institutional website. Of the 400 questions, 384 were deemed suitable and were posed simultaneously to both AI models.

Results: Overall accuracy was 80.7% for ChatGPT 4.0 and 69.3% for Gemini (p < 0.001). On questions requiring clinical reasoning and inference, ChatGPT 4.0 achieved a significantly higher success rate (91.1%) than Gemini (71.4%) (p = 0.007).

Conclusion: ChatGPT 4.0 outperformed Gemini in accuracy and reliability, particularly on clinically oriented anatomy questions. While AI models such as ChatGPT show promise for anatomy education and exam preparation, they should be used alongside validated academic resources.
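For readers who want to sanity-check the reported overall gap, the sketch below shows one way the comparison could be tested. It is a hypothetical reconstruction, not the authors' code: the counts 310/384 and 266/384 are back-calculated from the reported 80.7% and 69.3%, and the two-proportion chi-square test is an assumption, since the abstract does not name the statistical method used.

```python
# Minimal reproducibility sketch (assumptions, not the authors' analysis):
# 80.7% and 69.3% of 384 questions are taken to mean 310 and 266 correct
# answers. A chi-square test on the 2x2 table treats the samples as
# independent; since both models answered the same questions, a paired
# McNemar's test would also be defensible but needs per-question
# agreement counts that the abstract does not report.
from scipy.stats import chi2_contingency

N = 384                                        # questions posed to both models
correct = {"ChatGPT 4.0": 310, "Gemini": 266}  # assumed counts from 80.7% / 69.3%

# Rows: models; columns: correct vs. incorrect answers.
table = [[c, N - c] for c in correct.values()]
chi2, p, dof, expected = chi2_contingency(table)

for model, c in correct.items():
    print(f"{model}: {c}/{N} = {c / N:.1%}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2g}")  # p < 0.001, consistent with the abstract
```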

Details

Primary Language: English
Subjects: Medical Education
Section: Research Article
Authors

Arif Keskin 0000-0002-1634-1091

Tayfun Aygün 0000-0001-5058-3513

Submission Date: June 10, 2025
Acceptance Date: November 3, 2025
Publication Date: December 22, 2025
Published Issue: Year 2025, Volume: 24, Issue: 74

Cite

Vancouver: Keskin A, Aygün T. Comparison of the Success of ChatGPT 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. TED. 2025;24(74):127-34.