RAG Fusion
Definition
RAG Fusion is an approach that combines multi-query generation with Reciprocal Rank Fusion (RRF) to improve retrieval comprehensiveness. An LLM generates multiple reformulations of the original query (typically 3-5 variations approaching the topic from different angles), each variation is used for a separate vector search, and the result lists are merged using RRF, a rank fusion algorithm that scores each document by its rank positions across the lists. Documents that appear in several result lists are boosted, while documents that appear in only one list are treated as weaker signals. The result is typically a more comprehensive candidate set than any single query provides.
Why It Matters
RAG Fusion addresses a core limitation of single-query retrieval: any single query formulation has blind spots. A user asking 'how do I integrate with Salesforce?' may trigger retrieval of integration docs, but miss CRM-specific articles, API authentication guides, and field mapping documentation that would all contribute to a complete answer. Multiple query variations collectively cover more of the relevant document space. RRF is well suited to merging them because a document that ranks well for several differently phrased queries is more likely to be relevant than one that surfaces for only a single phrasing.
How It Works
RAG Fusion is implemented in four steps:
1) Use an LLM to generate N query variations from the original query (prompt: 'Generate 4 related search queries for: {original_query}').
2) Run a separate vector search for each variation (optionally adding BM25 for sparse retrieval).
3) Apply RRF to merge the N result lists: score each document as the sum of 1/(k + rank_i) over all lists i in which it appears, where rank_i is the document's position in list i, starting at 1.
4) Sort by merged RRF score and select the top documents.
The RRF constant k (typically 60) controls how strongly the algorithm rewards documents at the top of individual result lists: a smaller k widens the score gap between top-ranked and lower-ranked documents, while a larger k flattens it, so that appearing in many lists matters more than any single high rank. After fusion, optional reranking can further refine the merged list.
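The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `generate_variations` and `search` are hypothetical callables standing in for your LLM call and your vector store, and the document IDs and the `fake_index` lookup are invented purely so the example runs without external services.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(
    result_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Merge ranked lists of document IDs into (doc_id, score) pairs.

    Assumes each list is ordered best-first and contains no duplicate IDs.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        # Ranks are 1-based: the top hit in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by doc_id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def rag_fusion(
    query: str,
    generate_variations: Callable[[str], list[str]],  # hypothetical LLM wrapper
    search: Callable[[str], list[str]],  # hypothetical retriever: query -> doc IDs
    k: int = 60,
    top_n: int = 5,
) -> list[str]:
    """Steps 1-4: generate variations, search each, fuse with RRF, keep top_n."""
    variations = generate_variations(query)
    result_lists = [search(variation) for variation in variations]
    fused = reciprocal_rank_fusion(result_lists, k=k)
    return [doc_id for doc_id, _ in fused[:top_n]]


# Stand-in data: three query variations and made-up ranked results for each.
fake_index = {
    "chatbot API setup guide": ["setup-guide", "api-reference", "quickstart"],
    "integrate chatbot REST API": ["api-reference", "setup-guide", "webhooks"],
    "API configuration chatbot docs": ["api-reference", "auth-keys", "setup-guide"],
}

top = rag_fusion(
    "How do I set up the chatbot API?",
    generate_variations=lambda q: list(fake_index),  # pretend the LLM returned these
    search=lambda q: fake_index[q],
    top_n=3,
)
print(top)  # ['api-reference', 'setup-guide', 'auth-keys']
```

In this toy data, `api-reference` and `setup-guide` appear in all three lists and so outrank every document that appears only once. This sketch searches only the generated variations, as described in the steps; whether to also search the original query is a design choice left open here.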
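[Figure: RAG Fusion, merging multiple query retrievals. The original query "How do I set up the chatbot API?" is expanded into three variations ("chatbot API setup guide", "integrate chatbot REST API", "API configuration chatbot docs"), and their result lists are merged into a single fused ranked list.]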
Real-World Example
A 99helpers customer implements RAG Fusion for their knowledge base covering a complex product with many interconnected features. For questions that span multiple feature areas ('How does the AI chatbot handle pricing questions for Enterprise customers?'), single-query retrieval returns an average of 2.3 relevant documents. RAG Fusion with 4 query variations returns 5.7 relevant documents on average, providing the LLM with broader context. Complete answer rate (answers that address all aspects of multi-part questions) improves from 54% to 79%.
Common Mistakes
- Generating too many query variations: beyond 5-6 variations, additional queries add noise and latency without improving results meaningfully
- Not tuning the RRF constant k for your use case: the default k=60 is a reasonable starting point but may not be optimal for your query distribution
- Using RAG Fusion for simple, direct questions: single-hop factual questions ('What is the refund policy?') do not benefit from multi-query; apply fusion selectively to complex queries
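To make the point about tuning k concrete, the short sketch below compares two hypothetical documents under different values of k. The rank positions are invented for illustration: document A is ranked 1st in a single result list, while document B is ranked 5th in two lists.

```python
def rrf_score(ranks: list[int], k: int) -> float:
    """RRF score for one document, given its 1-based ranks in the lists where it appears."""
    return sum(1.0 / (k + rank) for rank in ranks)


# Hypothetical rank positions, chosen only to show the effect of k.
ranks_a = [1]     # top hit in one list, absent from the others
ranks_b = [5, 5]  # mid-list hit in two lists

for k in (1, 10, 60):
    a = rrf_score(ranks_a, k)
    b = rrf_score(ranks_b, k)
    winner = "A" if a > b else "B"
    print(f"k={k:>2}: A={a:.4f}  B={b:.4f}  -> {winner} ranks higher")
```

With a very small k the single top-ranked hit wins, while at k=10 and k=60 the document that appears in two lists wins. A reasonable way to tune k is to evaluate a few values against a labeled set of your own queries rather than rely on the default.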
Related Terms
Multi-Query Retrieval
Multi-query retrieval generates multiple alternative phrasings of the user's question and retrieves documents for each phrasing separately, then merges results to achieve higher recall than any single query formulation would provide.
Hybrid Retrieval
Hybrid retrieval combines dense (semantic) and sparse (keyword) search methods to leverage the strengths of both, using a fusion step to merge their results into a single ranked list for better overall retrieval quality.
Reranking
Reranking is a second-stage retrieval step that takes an initial set of candidate documents returned by a fast retrieval method and reorders them using a more accurate but computationally expensive model to improve final result quality.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.