Research Article


Evidence Bound Clinical Decision Support with RAG

Year 2025, EARLY VIEW, 1 - 1
https://doi.org/10.2339/politeknik.1810629

Abstract

Large language models are increasingly consulted for scientific and clinical questions, yet ungrounded answers still appear too often to trust them on their own. We built a retrieval-augmented assistant that keeps generation tied to a curated, versioned corpus, and records every step from ingestion to answer. Documents are segmented with a practical, token-aware policy and encoded locally; vectors are stored with provenance so the system can cite or abstain. Queries are embedded, top-k passages are retrieved from a vector store, and a prompt asks the generator to respond only with supported statements or to decline. The components are intentionally swappable: the embedder runs on-premises for privacy, the store supports snapshots for repeatable experiments, and the generator (Gemma/Gemma2) is selected for efficient inference. Beyond the pipeline, we preregister an evaluation plan that measures retrieval quality, answer faithfulness and coverage, with ablations on chunk size, overlap, and k. All code, defaults, and scripts are released so others can reproduce the setup, compare their own choices, and extend the system to new domains. The goal is clear: reduce hallucination by grounding answers in literature, keep costs and latency predictable on a single-GPU server, and make empirical evaluation routine rather than optional. Experimental evaluation confirmed these design claims: the proposed modular RAG achieved Recall@k = 0.86, F1 = 0.79, and Attribution Accuracy = 0.91, significantly outperforming both Classic RAG and LLM-only baselines (p < 0.05). These results validate the framework’s reliability, grounding fidelity, and reproducibility for evidence-based clinical decision support.
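To make the pipeline above concrete, the following minimal Python sketch illustrates the flow the abstract describes: token-aware chunking with overlap, local embedding, top-k retrieval that keeps provenance, and a prompt that instructs the generator to cite or abstain. Every name in it, and the hashed bag-of-words stand-in used in place of the on-premises embedder and the Gemma generator, is an illustrative assumption rather than the authors' released implementation.

import hashlib
import numpy as np

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    # Token-aware segmentation approximated with whitespace tokens:
    # fixed-size windows that overlap by `overlap` tokens.
    tokens = text.split()
    step = chunk_size - overlap
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, max(len(tokens) - overlap, 1), step)]

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Stand-in for a local, on-premises embedder: a hashed bag-of-words,
    # so the sketch runs without downloading any model.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        vec[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[dict], k: int = 3) -> list[dict]:
    # Top-k passages by cosine similarity; provenance travels with each chunk.
    q = embed(query)
    return sorted(chunks, key=lambda c: float(q @ c["vec"]), reverse=True)[:k]

def build_prompt(query: str, passages: list[dict]) -> str:
    # Grounded prompt: answer only from the cited passages, or abstain.
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return ("Answer using ONLY the passages below and cite their sources in brackets. "
            "If they do not support an answer, reply 'insufficient evidence'.\n\n"
            f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:")

if __name__ == "__main__":
    corpus = {"guideline.pdf": "Metformin is recommended as first-line therapy for type 2 diabetes."}
    chunks = [{"source": src, "text": c, "vec": embed(c)}
              for src, doc in corpus.items() for c in chunk_text(doc)]
    question = "What is the first-line therapy for type 2 diabetes?"
    print(build_prompt(question, retrieve(question, chunks, k=3)))
    # The resulting prompt would be sent to the local generator (e.g. Gemma 2).

The three parameters surfaced here (chunk_size, overlap, and k) match the ablation axes named in the evaluation plan, so swapping the stub embedder and toy corpus for a real local model and the curated, versioned literature corpus leaves the rest of the sketch unchanged.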


Details

Primary Language English
Subjects Artificial Intelligence (Other)
Journal Section Research Article
Authors

Leyla Turker Sener 0000-0002-7317-9086

Umit Yılmaz 0009-0000-0039-0801

İsmail Can Dikmen 0000-0002-7747-7777

Ali Arı 0000-0002-5071-6790

Teoman Karadag 0000-0002-7682-7771

Early Pub Date November 18, 2025
Publication Date November 28, 2025
Submission Date October 25, 2025
Acceptance Date November 13, 2025
Published in Issue Year 2025 EARLY VIEW

Cite

APA Turker Sener, L., Yılmaz, U., Dikmen, İ. C., … Arı, A. (2025). Evidence Bound Clinical Decision Support with RAG. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1810629
AMA Turker Sener L, Yılmaz U, Dikmen İC, Arı A, Karadag T. Evidence Bound Clinical Decision Support with RAG. Politeknik Dergisi. Published online November 1, 2025:1-1. doi:10.2339/politeknik.1810629
Chicago Turker Sener, Leyla, Umit Yılmaz, İsmail Can Dikmen, Ali Arı, and Teoman Karadag. “Evidence Bound Clinical Decision Support With RAG”. Politeknik Dergisi (November 2025), 1-1. https://doi.org/10.2339/politeknik.1810629.
EndNote Turker Sener L, Yılmaz U, Dikmen İC, Arı A, Karadag T (November 1, 2025) Evidence Bound Clinical Decision Support with RAG. Politeknik Dergisi 1–1.
IEEE L. Turker Sener, U. Yılmaz, İ. C. Dikmen, A. Arı, and T. Karadag, “Evidence Bound Clinical Decision Support with RAG”, Politeknik Dergisi, pp. 1–1, November 2025, doi: 10.2339/politeknik.1810629.
ISNAD Turker Sener, Leyla et al. “Evidence Bound Clinical Decision Support With RAG”. Politeknik Dergisi. November 2025. 1-1. https://doi.org/10.2339/politeknik.1810629.
JAMA Turker Sener L, Yılmaz U, Dikmen İC, Arı A, Karadag T. Evidence Bound Clinical Decision Support with RAG. Politeknik Dergisi. 2025:1–1.
MLA Turker Sener, Leyla et al. “Evidence Bound Clinical Decision Support With RAG”. Politeknik Dergisi, 2025, pp. 1-1, doi:10.2339/politeknik.1810629.
Vancouver Turker Sener L, Yılmaz U, Dikmen İC, Arı A, Karadag T. Evidence Bound Clinical Decision Support with RAG. Politeknik Dergisi. 2025:1-1.