Research Article

Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering

Year 2026, Volume: 9 Issue: 2, 549 - 561, 15.03.2026
https://doi.org/10.34248/bsengineering.1849342
https://izlik.org/JA59DL84TU

Abstract

Retrieval-Augmented Generation (RAG) systems integrate large language models with information retrieval to ground responses in factual data. This study systematically evaluates the contribution of each RAG component in a medical question answering system through comprehensive ablation analysis. We designed a hierarchical RAG architecture with six key components: hierarchical intent classification, query rewriting, two-stage retrieval (dense retrieval with FAISS + cross-encoder reranking using Clinical-Longformer), and specialist routing. We conducted systematic ablation studies across seven configurations on 476 medical questions from the MedQA benchmark. Each configuration was evaluated independently using GPT-4o mini as an LLM judge across four metrics: context relevance, completeness, faithfulness, and correctness (1–5 Likert scale), with each metric assessed through separate evaluation calls to minimize inter-metric bias. Statistical significance was validated through paired t-tests with effect size calculations (Cohen's d). The full system achieved an overall score of 3.64/5.0. Systematic ablation revealed two critical components: reranking (removal: -0.24 overall, p < 0.001, d = -0.44) and specialists (removal: -0.17 overall, p < 0.001, d = -0.29), both showing small but statistically significant effect sizes. Surprisingly, hierarchical intent classification degraded performance when included (+0.09 when removed, p = 0.010 for completeness), suggesting that simpler query processing may be preferable. Query rewriting showed minimal impact (-0.04 overall), while raw query inclusion significantly affected completeness (-0.15, p < 0.001). Reranking and specialist components are essential for medical RAG systems, with statistical significance confirmed across 476 queries. The counterintuitive finding that hierarchical intent classification degrades performance (p < 0.05) suggests that architectural complexity does not always improve RAG system quality.
These results provide evidence-based guidance for designing medical question answering systems, showing that reranking infrastructure and domain expertise are more critical than sophisticated query understanding techniques.
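The two-stage retrieval pipeline described in the abstract (dense retrieval followed by cross-encoder reranking) can be sketched in plain Python. This is an illustrative toy, not the authors' implementation: a numpy inner-product search stands in for a FAISS index, and a simple token-overlap scorer stands in for the Clinical-Longformer cross-encoder; the documents and embeddings are invented for the example.

```python
import numpy as np

def dense_retrieve(query_vec, doc_vecs, k):
    """Stage 1: inner-product search over document embeddings
    (the operation a flat FAISS index performs, here in plain numpy)."""
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

def rerank(query, docs, candidates, k):
    """Stage 2: rescore the stage-1 candidates with a finer-grained model.
    A toy token-overlap scorer stands in for a cross-encoder here."""
    def overlap(q, d):
        q_tokens, d_tokens = set(q.lower().split()), set(d.lower().split())
        return len(q_tokens & d_tokens) / max(len(q_tokens), 1)
    rescored = sorted(candidates, key=lambda i: overlap(query, docs[i]), reverse=True)
    return rescored[:k]

docs = [
    "aspirin inhibits platelet aggregation",
    "metformin lowers hepatic glucose production",
    "statins reduce LDL cholesterol synthesis",
]
# toy 4-d embeddings, one row per document (invented values)
doc_vecs = np.array([[1.0, 0.1, 0.0, 0.2],
                     [0.0, 1.0, 0.3, 0.1],
                     [0.1, 0.2, 1.0, 0.0]])
query = "how does metformin affect glucose"
query_vec = np.array([0.05, 0.9, 0.2, 0.1])

candidates, _ = dense_retrieve(query_vec, doc_vecs, k=3)
best = rerank(query, docs, candidates, k=1)
print(docs[best[0]])  # the metformin document wins both stages
```

In a real system the rerank stage is the expensive, high-precision step applied only to the small candidate set returned by the cheap dense search, which is why ablating it can hurt quality without making retrieval fail outright.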

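The statistical procedure named in the abstract, a paired t-test with a Cohen's d effect size over per-query scores for the full system versus an ablated configuration, can be sketched as follows. The numbers below are synthetic illustrations generated for the example, not the study's data.

```python
import numpy as np

def paired_ablation_stats(full, ablated):
    """Paired t-test and Cohen's d for an ablation: per-query scores of the
    full system vs. a configuration with one component removed."""
    diffs = np.asarray(ablated, dtype=float) - np.asarray(full, dtype=float)
    n = diffs.size
    mean, sd = diffs.mean(), diffs.std(ddof=1)
    t = mean / (sd / np.sqrt(n))  # paired t statistic, df = n - 1
    d = mean / sd                 # Cohen's d for paired samples
    return mean, t, d

rng = np.random.default_rng(0)
full = rng.normal(3.64, 0.5, size=476)           # hypothetical per-query scores
no_rerank = full + rng.normal(-0.24, 0.4, 476)   # reranking removed
delta, t, d = paired_ablation_stats(full, no_rerank)
print(f"mean delta = {delta:+.2f}, t = {t:.1f}, d = {d:.2f}")
```

With 476 paired observations, even a small per-query shift yields a large t statistic, which is why a drop of 0.24 on a 5-point scale can be highly significant while its effect size (d) remains in the small-to-moderate range.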
Ethical Statement

Ethics committee approval was not required for this study because it did not involve human participants or animals.

Supporting Institution

-

Project Number

-

Thanks

-

References

  • Belkin, N. J., Oddy, R. N., & Brooks, H. M. (1982). ASK for information retrieval: Part I. Background and theory. Journal of Documentation, 38(2), 61–71. https://doi.org/10.1108/eb026722
  • Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The long-document transformer. arXiv. https://arxiv.org/abs/2004.05150
  • Ben Abacha, A., & Demner-Fushman, D. (2019). On the summarization of consumer health questions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 2228–2234). https://doi.org/10.18653/v1/P19-1215
  • Ben Abacha, A., Yim, W., Michalopoulos, G., & Lin, T. (2023). An investigation of evaluation methods in automatic medical note generation. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 2575–2588). https://doi.org/10.18653/v1/2023.findings-acl.161
  • Carbonell, J., & Goldstein, J. (1998). The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 335–336). https://doi.org/10.1145/290941.291025
  • Casanueva, I., Temčinas, T., Gerz, D., Henderson, M., & Vulić, I. (2020). Efficient intent detection with dual sentence encoders. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (pp. 38–45). https://doi.org/10.18653/v1/2020.nlp4convai-1.5
  • Chen, N., Su, X., Liu, T., Hao, Q., & Wei, M. (2020). A benchmark dataset and case study for Chinese medical question intent classification. BMC Medical Informatics and Decision Making, 20, 125. https://doi.org/10.1186/s12911-020-1122-3
  • Chu, Y. W., Zhang, K., Malon, C., & Min, M. R. (2025). Reducing hallucinations of medical multimodal large language models with visual retrieval-augmented generation. arXiv. https://arxiv.org/abs/2502.15040
  • Deka, P., Jurek-Loughrey, A., & Padmanabhan, D. (2022). Improved methods to aid unsupervised evidence-based fact checking for online health news. Journal of Data Intelligence, 3(4), 474–505. https://doi.org/10.26421/JDI3.4-5
  • Dorfner, F. J., Dada, A., Busch, F., Makowski, M. R., Han, T., Truhn, D., Kleesiek, J., Sushil, M., Adams, L. C., & Bressem, K. K. (2025). Evaluating the effectiveness of biomedical fine-tuning for large language models on clinical tasks. Journal of the American Medical Informatics Association, 32(6), 1015–1024. https://doi.org/10.1093/jamia/ocaf045
  • Fu, T., Huang, K., Xiao, C., Glass, L. M., & Sun, J. (2022). HINT: Hierarchical interaction network for clinical-trial-outcome predictions. Patterns, 3(4), 100445. https://doi.org/10.1016/j.patter.2022.100445
  • Gu, Y., Tinn, R., Cheng, H., Lucas, M., Usuyama, N., Liu, X., Naumann, T., Gao, J., & Poon, H. (2021). Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare, 3(1), 1–23. https://doi.org/10.1145/3458754
  • Jeong, M., Sohn, J., Sung, M., & Kang, J. (2024). Improving medical reasoning through retrieval and self-reflection with retrieval-augmented large language models. Bioinformatics, 40(Suppl. 1), i119–i129. https://doi.org/10.1093/bioinformatics/btae238
  • Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H., & Szolovits, P. (2021). What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14), 6421. https://doi.org/10.3390/app11146421
  • Jin, Q., Yuan, Z., Xiong, G., Yu, Q., Ying, H., Tan, C., Chen, M., Huang, S., Liu, X., & Yu, S. (2022). Biomedical question answering: A survey of approaches and challenges. ACM Computing Surveys, 55(2), 1–36. https://doi.org/10.1145/3490238
  • Johnson, J., Douze, M., & Jégou, H. (2021). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535–547. https://doi.org/10.1109/TBDATA.2019.2921572
  • Kim, J., Podlasek, A., Shidara, K., Liu, F., Alaa, A., & Bernardo, D. (2025). Limitations of large language models in clinical problem-solving arising from inflexible reasoning. Scientific Reports, 15(1), 39426. https://doi.org/10.1038/s41598-025-22940-0
  • Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (pp. 9459–9474). Curran Associates, Inc.
  • Li, Y., Wehbe, R. M., Ahmad, F. S., Wang, H., & Luo, Y. (2023). A comparative study of pretrained language models for long clinical text. Journal of the American Medical Informatics Association, 30(2), 340–347. https://doi.org/10.1093/jamia/ocac225
  • Lu, W., Jiang, J., Shi, Y., Zhong, X., Gu, J., Huangfu, L., & Gong, M. (2023). Application of Entity-BERT model based on neuroscience and brain-like cognition in electronic medical record entity recognition. Frontiers in Neuroscience, 17, 1259652. https://doi.org/10.3389/fnins.2023.1259652
  • Maharjan, J., Garikipati, A., Singh, N. P., Cyrus, L., Sharma, M., Ciobanu, M., Barnes, G., Thapa, R., Mao, Q., & Das, R. (2024). OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Scientific Reports, 14, 14156. https://doi.org/10.1038/s41598-024-64827-6
  • Manas, G., Aribandi, V., Kursuncu, U., Alambo, A., Shalin, V. L., Thirunarayan, K., Beich, J., Narasimhan, M., & Sheth, A. (2021). Knowledge-infused abstractive summarization of clinical diagnostic interviews: Framework development study. JMIR Mental Health, 8(5), e20865. https://doi.org/10.2196/20865
  • Mishra, R., Bian, J., Fiszman, M., Weir, C. R., Jonnalagadda, S., Mostafa, J., & Del Fiol, G. (2014). Text summarization in the biomedical domain: A systematic review of recent research. Journal of Biomedical Informatics, 52, 457–467. https://doi.org/10.1016/j.jbi.2014.06.009
  • OpenAI. (2024). GPT-4o mini: Advancing cost-efficient intelligence. OpenAI. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ (accessed on 29 January 2026).
  • Robertson, S. E. (1997). The probability ranking principle in IR. In K. Sparck Jones & P. Willett (Eds.), Readings in information retrieval (pp. 281–286). Morgan Kaufmann.
  • Selmi, W., Kammoun, H., & Amous, I. (2022). Semantic-based hybrid query reformulation for biomedical information retrieval. The Computer Journal, 66(9), 2296–2316. https://doi.org/10.1093/comjnl/bxac078
  • Simonds, T., Kurniawan, K., & Lau, J. H. (2024). MoDEM: Mixture of domain expert models. In Proceedings of the 22nd Annual Workshop of the Australasian Language Technology Association (pp. 75–88). Association for Computational Linguistics. https://aclanthology.org/2024.alta-1.6/
  • Sun, K., Yu, D., Chen, J., Yu, D., Choi, Y., & Cardie, C. (2019). DREAM: A challenge data set and models for dialogue-based reading comprehension. Transactions of the Association for Computational Linguistics, 7, 217–231. https://doi.org/10.1162/tacl_a_00264
  • Thirunavukarasu, A. J., Ting, D. S. J., Elangovan, K., Gutierrez, L., Tan, T. F., Ber, D. S. W., Lim, J. Y., Eckhoff, H. B., Lim, G. S. W., Tso, C. F., Wong, D. S. L., Li, S., Xu, L., Hussain, R. Z., Xiang, Y., Lu, Y., Liu, N., & Ting, D. S. W. (2023). Large language models in medicine. Nature Medicine, 29, 1930–1940. https://doi.org/10.1038/s41591-023-02448-8
  • Tsatsaronis, G., Balikas, G., Malakasiotis, P., Partalas, I., Zschunke, M., Alvers, M. R., Weissenborn, D., Krithara, A., Petridis, S., Polychronopoulos, D., Almirantis, Y., Pavlopoulos, J., Baskiotis, N., Gallinari, P., Artières, T., Ngomo, A. C., Heino, N., Gaussier, E., Barrio-Alvers, L., Schroeder, M., Androutsopoulos, I., & Paliouras, G. (2015). An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16, 138. https://doi.org/10.1186/s12859-015-0564-6
  • Wei, Z., Guo, D., Huang, D., Zhang, Q., Zhang, S., Jiang, K., & Li, R. (2024). Detecting and mitigating the ungrounded hallucinations in text generation by LLMs. In Proceedings of the 2023 International Conference on Artificial Intelligence, Systems and Network Security (pp. 1–6). https://doi.org/10.1145/3661638.3661653
  • Yang, D., Wei, J., Li, M., Liu, J., Liu, L., Hu, M., He, J., Ju, Y., Zhou, W., Liu, Y., & Zhang, L. (2025). MedAide: Information fusion and anatomy of medical intents via LLM-based agent collaboration. Information Fusion, 127, 103743. https://doi.org/10.1016/j.inffus.2025.103743
  • Zhang, Y., Yang, R., Xu, X., Li, R., Xiao, J., Shen, J., & Han, J. (2025). TELEClass: Taxonomy enrichment and LLM-enhanced hierarchical text classification with minimal supervision. In WWW '25: Proceedings of the ACM on Web Conference 2025 (pp. 2032–2042). https://doi.org/10.1145/3696410.3714940
  • Zhao, W., Deng, Z., Yadav, S., & Yu, P. S. (2024). Heterogeneous knowledge grounding for medical question answering with retrieval augmented large language model. In Companion Proceedings of the ACM Web Conference 2024 (pp. 1535–1538). https://doi.org/10.1145/3589335.3651941

There are 34 citations in total.

Details

Primary Language English
Subjects Information Modelling, Management and Ontologies
Journal Section Research Article
Authors

Hakan Emekci 0000-0002-4074-5600

Daniel Quillan Roxas 0009-0000-4484-6751

Project Number -
Submission Date December 28, 2025
Acceptance Date January 28, 2026
Publication Date March 15, 2026
DOI https://doi.org/10.34248/bsengineering.1849342
IZ https://izlik.org/JA59DL84TU
Published in Issue Year 2026 Volume: 9 Issue: 2

Cite

APA Emekci, H., & Roxas, D. Q. (2026). Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. Black Sea Journal of Engineering and Science, 9(2), 549-561. https://doi.org/10.34248/bsengineering.1849342
AMA 1. Emekci H, Roxas DQ. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 2026;9(2):549-561. doi:10.34248/bsengineering.1849342
Chicago Emekci, Hakan, and Daniel Quillan Roxas. 2026. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science 9 (2): 549-61. https://doi.org/10.34248/bsengineering.1849342.
EndNote Emekci H, Roxas DQ (March 1, 2026) Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. Black Sea Journal of Engineering and Science 9 2 549–561.
IEEE [1]H. Emekci and D. Q. Roxas, “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”, BSJ Eng. Sci., vol. 9, no. 2, pp. 549–561, Mar. 2026, doi: 10.34248/bsengineering.1849342.
ISNAD Emekci, Hakan - Roxas, Daniel Quillan. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science 9/2 (March 1, 2026): 549-561. https://doi.org/10.34248/bsengineering.1849342.
JAMA 1. Emekci H, Roxas DQ. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 2026;9:549–561.
MLA Emekci, Hakan, and Daniel Quillan Roxas. “Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering”. Black Sea Journal of Engineering and Science, vol. 9, no. 2, Mar. 2026, pp. 549-61, doi:10.34248/bsengineering.1849342.
Vancouver 1. Hakan Emekci, Daniel Quillan Roxas. Dissecting Medical RAG: Why Reranking Matters More Than Complexity in Question Answering. BSJ Eng. Sci. 2026 Mar. 1;9(2):549-61. doi:10.34248/bsengineering.1849342
