Research Article

Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering

Volume: 9 Issue: 2, 15 March 2026

Abstract

Retrieval-Augmented Generation (RAG) systems integrate large language models with information retrieval to ground responses in factual data. This study evaluates the contribution of each RAG component in a medical question answering system through ablation analysis. We designed a hierarchical RAG architecture with six key components, including hierarchical intent classification, query rewriting, two-stage retrieval (dense retrieval with FAISS followed by cross-encoder reranking with Clinical-Longformer), and specialist routing. We ran ablation studies across seven configurations on 476 medical questions from MedQA benchmarks. Each configuration was evaluated independently using GPT-4o mini as an LLM judge on four metrics: context relevance, completeness, faithfulness, and correctness (1-5 Likert scale). Each metric was assessed in a separate evaluation call to minimize inter-metric bias. Statistical significance was tested with paired t-tests, and effect sizes were reported as Cohen's d. The full system achieved an overall score of 3.64/5.0. The ablations identified two critical components: reranking (removal: -0.24 overall, p < 0.001, d = -0.44) and specialists (removal: -0.17 overall, p < 0.001, d = -0.29). Both effects are small but statistically significant. Surprisingly, hierarchical intent classification degraded performance when included (+0.09 when removed, p = 0.010 for completeness), suggesting that simpler query processing may be preferable. Query rewriting had minimal impact (-0.04 overall), while removing the raw query significantly reduced completeness (-0.15, p < 0.001). Reranking and specialist components are therefore essential for medical RAG systems, and the finding that hierarchical intent classification degrades performance (p < 0.05) suggests that architectural complexity does not always improve RAG system quality. These results provide evidence-based guidance for designing medical question answering systems: reranking infrastructure and domain expertise are more critical than sophisticated query understanding techniques.
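The paired comparison described in the abstract can be illustrated with a minimal sketch. It assumes that Cohen's d is computed on the per-question score differences (often written d_z); the abstract does not state which variant the authors used, so this is an assumption. The function names and the toy scores are hypothetical and are not the study's data. Under this assumption the paired t statistic equals d_z times the square root of the number of questions, which shows how a small effect size can still be highly significant with 476 paired questions.

```python
import math
from statistics import mean, stdev


def paired_effect(full_scores, ablated_scores):
    """Paired comparison of an ablated configuration against the full system.

    Returns (mean_diff, d_z, t_stat, dof), where d_z is Cohen's d computed on
    the paired differences (an assumption; other variants of d exist).
    """
    if len(full_scores) != len(ablated_scores):
        raise ValueError("paired samples must have equal length")
    # Difference per question: ablated minus full, so a negative value
    # means that removing the component lowered the score.
    diffs = [a - f for f, a in zip(full_scores, ablated_scores)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; d and t are undefined")
    d_z = mean_diff / sd_diff
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, d_z, t_stat, n - 1


def implied_t(d_z, n):
    """t statistic implied by a paired effect size d_z over n pairs."""
    return d_z * math.sqrt(n)


# Hypothetical 1-5 Likert scores for eight questions (illustration only).
full = [4, 3, 5, 4, 4, 3, 5, 4]
ablated = [3, 3, 4, 4, 3, 3, 4, 4]
mean_diff, d_z, t_stat, dof = paired_effect(full, ablated)

# If the reported d values are d_z, these are the t statistics they imply
# for 476 paired questions.
t_rerank = implied_t(-0.44, 476)
t_specialists = implied_t(-0.29, 476)
```

If the reported effect sizes are of the d_z kind, `implied_t(-0.44, 476)` is roughly -9.6 and `implied_t(-0.29, 476)` is roughly -6.3, both consistent with the reported p < 0.001. A p-value could be obtained from the t statistic and the degrees of freedom with a statistics library; that step is left out here to keep the sketch dependency-free.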

Keywords

Ethics Statement

Ethics committee approval was not required for this study because it did not involve animal or human subjects.

Details

Primary Language

English

Subjects

Information Modelling, Management and Ontologies

Section

Research Article

Publication Date

15 March 2026

Submission Date

28 December 2025

Acceptance Date

28 January 2026

Published in Issue

Year 2026 Volume: 9 Issue: 2

Cite This Article

APA
Emekci, H., & Roxas, D. Q. (2026). Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. Black Sea Journal of Engineering and Science, 9(2), 549-561. https://doi.org/10.34248/bsengineering.1849342
AMA
1. Emekci H, Roxas DQ. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 2026;9(2):549-561. doi:10.34248/bsengineering.1849342
Chicago
Emekci, Hakan, and Daniel Quillan Roxas. 2026. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science 9 (2): 549-61. https://doi.org/10.34248/bsengineering.1849342.
EndNote
Emekci H, Roxas DQ (01 March 2026) Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. Black Sea Journal of Engineering and Science 9 2 549–561.
IEEE
[1] H. Emekci and D. Q. Roxas, “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”, BSJ Eng. Sci., vol. 9, no. 2, pp. 549–561, Mar. 2026, doi: 10.34248/bsengineering.1849342.
ISNAD
Emekci, Hakan - Roxas, Daniel Quillan. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science 9/2 (01 March 2026): 549-561. https://doi.org/10.34248/bsengineering.1849342.
JAMA
1. Emekci H, Roxas DQ. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 2026;9:549–561.
MLA
Emekci, Hakan, and Daniel Quillan Roxas. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science, vol. 9, no. 2, March 2026, pp. 549-61, doi:10.34248/bsengineering.1849342.
Vancouver
1. Hakan Emekci, Daniel Quillan Roxas. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 01 March 2026;9(2):549-61. doi:10.34248/bsengineering.1849342
