Research Article


Guideline Concordance and Safety of AI Chatbots for Circumcision Anesthesia: A Comparative Study

Year 2026, Volume: 16 Issue: 2, 109 - 115, 27.03.2026
https://doi.org/10.16899/jcm.1878107
https://izlik.org/JA25UA36DC

Abstract

Background: Public interest in the use of anesthesia during circumcision has increased, yet the reliability of freely available artificial intelligence (AI) chatbots in addressing such medical questions remains unclear. This study aimed to comparatively assess the accuracy, safety, and citation reliability of three widely used AI chatbots (ChatGPT, Gemini, and DeepSeek) when responding to common public queries related to circumcision anesthesia.

Methods: Five high-interest questions were derived from global Google Trends data and submitted to each chatbot in two input formats: unstructured lay-language queries and structured prompts explicitly based on current clinical guidelines. All generated responses were independently reviewed by a urologist and an anesthesiologist and scored for guideline concordance, citation accuracy, and the presence of potentially harmful information.

Results: Across both query formats, DeepSeek produced responses that were more closely aligned with established guidelines than those of ChatGPT and Gemini (P<0.05). Under structured prompting, DeepSeek also demonstrated higher citation accuracy than ChatGPT (P=0.049). Importantly, none of the evaluated responses contained advice deemed unsafe or clinically harmful. Structured, guideline-oriented prompts were associated with a consistent improvement in response quality across all evaluated AI platforms.

Conclusion: Freely accessible AI chatbots show heterogeneous performance in providing information on circumcision anesthesia. Although these systems may offer supplementary educational value, their outputs vary in reliability and should be interpreted with caution. Expert clinical oversight remains essential to ensure patient safety and adherence to evidence-based guidelines.

Ethical Statement

Ethics approval was not required for this study in accordance with institutional and journal policies, as no human participants, patient-level data, biological materials, or interventions were involved. The study exclusively analyzed publicly available Google Trends search data and AI-generated textual outputs, all of which are anonymous, non-identifiable, and freely accessible. No personal or sensitive information was collected or processed at any stage of the study.

References

  • 1. Iacob SI, Feinn RS, Sardi L. Systematic review of complications arising from male circumcision. BJUI Compass 2022;3(2):99–123.
  • 2. Omole F, Smith W, Carter-Wicker K. Newborn Circumcision Techniques. Am Fam Physician 2020;101(11):680–685.
  • 3. Taddio A. Pain Management for Neonatal Circumcision. Paediatr Drugs 2001;3(2):101–111.
  • 4. Morris BJ, Moreton S, Bailis SA, Cox G, Krieger JN. Critical evaluation of contrasting evidence on whether male circumcision has adverse psychological effects: A systematic review. J Evid Based Med 2022;15(2):123–135.
  • 5. Walsh HA. Newborn Male Circumcision. Narrat Inq Bioeth 2023;13(2):65–69.
  • 6. Massey PM, Kearney MD, Rideau A et al. Measuring impact of storyline engagement on health knowledge, attitudes, and norms: A digital evaluation of an online health-focused serial drama in West Africa. J Glob Health 2022;12:04039.
  • 7. Morrison C, Vercnocke J, Moser AM et al. Are ChatGPT’s Responses to Urologic Inquiries Readable and Supported by AUA Guidelines? International Journal of Urological Nursing 2025;19(3):e70023.
  • 8. Shryock T. AI Special Report: What patients and doctors really think about AI in health care. 2023;100. Available at: https://www.medicaleconomics.com/view/ai-special-report-what-patients-and-doctors-really-think-about-ai-in-health-care. Accessed September 12, 2025.
  • 9. Al Ramlawi A, Over DJ, Weltsch D et al. Evaluating the Accuracy, Clarity, and Safety of Artificial Intelligence-Generated Information on Clubfoot. J Am Acad Orthop Surg 2025;33(12):663–672.
  • 10. Zhang L, Wang T, Zheng Y et al. Assessment of ChatGPT’s adherence to evidence-based clinical practice guidelines for plantar fasciitis management. J Orthop Surg Res 2025;20(1):434.
  • 11. Sandmann S, Hegselmann S, Fujarski M et al. Benchmark evaluation of DeepSeek large language models in clinical decision-making. Nat Med 2025;31(8):2546–2549.
  • 12. Aljamaan F, Temsah M-H, Altamimi I et al. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study. JMIR Med Inform 2024;12:e54345.
  • 13. Fast D, Adams LC, Busch F et al. Autonomous medical evaluation for guideline adherence of large language models. npj Digit Med 2024;7(1):358.
  • 14. Walters WH, Wilder EI. Fabrication and errors in the bibliographic citations generated by ChatGPT. Sci Rep 2023;13(1):14045.
  • 15. Sonoda Y, Kurokawa R, Hagiwara A et al. Structured clinical reasoning prompt enhances LLM’s diagnostic capabilities in diagnosis please quiz cases. Jpn J Radiol 2025;43(4):586–592.
  • 16. Vaira LA, Lechien JR, Abbate V et al. Enhancing AI Chatbot Responses in Health Care: The SMART Prompt Structure in Head and Neck Surgery. OTO Open 2025;9(1):e70075.
  • 17. Chelli M, Descamps J, Lavoué V et al. Hallucination Rates and Reference Accuracy of ChatGPT and Bard for Systematic Reviews: Comparative Analysis. J Med Internet Res 2024;26:e53164.
  • 18. Blacker SN, Kang M, Chakraborty I et al. Utilizing Artificial Intelligence and Chat Generative Pretrained Transformer to Answer Questions About Clinical Scenarios in Neuroanesthesiology. J Neurosurg Anesthesiol 2024;36(4):346–351.
There are 18 citations in total.

Details

Primary Language English
Subjects Pediatric Urology, Anaesthesiology, Urology
Journal Section Research Article
Authors

Ahmet Tuğrul Şahin 0000-0002-4855-696X

Enis Mert Yorulmaz 0000-0003-2109-2015

Submission Date January 30, 2026
Acceptance Date March 2, 2026
Publication Date March 27, 2026
DOI https://doi.org/10.16899/jcm.1878107
IZ https://izlik.org/JA25UA36DC
Published in Issue Year 2026 Volume: 16 Issue: 2

Cite

AMA Şahin AT, Yorulmaz EM. Guideline Concordance and Safety of AI Chatbots for Circumcision Anesthesia: A Comparative Study. J Contemp Med. 2026;16(2):109-115. doi:10.16899/jcm.1878107