Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams

Arif Keskin; Tayfun Aygün

doi:10.25282/ted.1716591

Abstract

Objective: The scientific validity of using artificial intelligence (AI)-based tools to study anatomy and prepare for medical specialization exams has become a subject of growing academic interest. This study aimed to evaluate the performance of ChatGPT 4.0 and Google Gemini in answering anatomy questions from the Türkiye National Medical Specialization Examination.

Materials and Methods: Anatomy-related questions were extracted from the exams administered biannually between 2006 and 2021, which are publicly available on the institutional website. Of 400 questions, 384 were deemed suitable and were posed to both AI models at the same time.

Results: The overall accuracy was 80.7% for ChatGPT 4.0 and 69.3% for Gemini (p < 0.001). On questions requiring clinical reasoning and inference, ChatGPT 4.0 achieved a significantly higher success rate than Gemini (91.1% vs. 71.4%; p = 0.007).

Conclusion: ChatGPT 4.0 outperformed Gemini in accuracy and reliability, particularly on clinically oriented anatomy questions. Although AI models such as ChatGPT show promise for anatomy education and exam preparation, they should be used alongside validated academic resources.
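The reported overall accuracies can be checked with a short calculation. The sketch below is illustrative only: the counts of correct answers are back-calculated from the rounded percentages and the 384 included questions, and the unpaired Pearson chi-square test is an assumption, since the abstract does not name the statistical test used. Because both models answered the same questions, a paired test such as McNemar's would also be a natural choice, but it needs per-question results that the abstract does not provide.

```python
import math

TOTAL_QUESTIONS = 384  # questions deemed suitable, per the abstract


def correct_count(accuracy_percent, total=TOTAL_QUESTIONS):
    """Back-calculate the number of correct answers from a rounded percentage."""
    return round(accuracy_percent / 100 * total)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumed counts, derived from the reported 80.7% and 69.3%.
chatgpt_correct = correct_count(80.7)
gemini_correct = correct_count(69.3)

stat, p_value = chi_square_2x2(
    chatgpt_correct, TOTAL_QUESTIONS - chatgpt_correct,
    gemini_correct, TOTAL_QUESTIONS - gemini_correct,
)

print(f"ChatGPT 4.0: {chatgpt_correct}/{TOTAL_QUESTIONS} correct")
print(f"Gemini:      {gemini_correct}/{TOTAL_QUESTIONS} correct")
print(f"chi-square = {stat:.2f}, p = {p_value:.5f}")
```

Under these assumptions the p-value falls below 0.001, which is consistent with the reported result but does not confirm which test the authors applied.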

Details

Primary Language

English

Subjects

Medical Education

Journal Section

Research Article

Authors

Arif Keskin *
0000-0002-1634-1091
Türkiye

Tayfun Aygün
0000-0001-5058-3513
Türkiye

Publication Date

December 22, 2025

Submission Date

June 10, 2025

Acceptance Date

November 3, 2025

Published in Issue

Year: 2025, Volume: 24, Number: 74

DOI

https://doi.org/10.25282/ted.1716591

IZ

https://izlik.org/JA33LU59XR

Cite

APA

Keskin, A., & Aygün, T. (2025). Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. Tıp Eğitimi Dünyası, 24(74), 127-134. https://doi.org/10.25282/ted.1716591

AMA

1. Keskin A, Aygün T. Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. Tıp Eğitimi Dünyası. 2025;24(74):127-134. doi:10.25282/ted.1716591

Chicago

Keskin, Arif, and Tayfun Aygün. 2025. “Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams”. Tıp Eğitimi Dünyası 24 (74): 127-34. https://doi.org/10.25282/ted.1716591.

EndNote

Keskin A, Aygün T (December 1, 2025) Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. Tıp Eğitimi Dünyası 24 74 127–134.

IEEE

[1] A. Keskin and T. Aygün, “Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams”, Tıp Eğitimi Dünyası, vol. 24, no. 74, pp. 127–134, Dec. 2025, doi: 10.25282/ted.1716591.

ISNAD

Keskin, Arif - Aygün, Tayfun. “Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams”. Tıp Eğitimi Dünyası 24/74 (December 1, 2025): 127-134. https://doi.org/10.25282/ted.1716591.

JAMA

1. Keskin A, Aygün T. Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. Tıp Eğitimi Dünyası. 2025;24:127–134.

MLA

Keskin, Arif, and Tayfun Aygün. “Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams”. Tıp Eğitimi Dünyası, vol. 24, no. 74, Dec. 2025, pp. 127-34, doi:10.25282/ted.1716591.

Vancouver

1. Arif Keskin, Tayfun Aygün. Comparison of the Success of Chatgpt 4.0 and Google Gemini in Anatomy Questions Asked in Türkiye National Medical Specialization Exams. Tıp Eğitimi Dünyası. 2025 Dec. 1;24(74):127-34. doi:10.25282/ted.1716591