RAG Fusion

Definition

RAG Fusion combines multi-query generation with Reciprocal Rank Fusion (RRF) to improve retrieval coverage. An LLM generates several reformulations of the original query (typically 3-5 variations that approach the topic from different angles), each variation is run as a separate vector search, and the result lists are merged with RRF, a rank fusion algorithm that scores each document by its rank in every list where it appears. Documents that rank well in several lists are boosted, while documents that appear in only one list are treated as weaker signals. The merged candidate set is usually more comprehensive than what any single query retrieves.

Why It Matters

RAG Fusion addresses a core limitation of single-query retrieval: any single query formulation has blind spots. A user asking 'how do I integrate with Salesforce?' may retrieve the integration docs but miss CRM-specific articles, API authentication guides, and field mapping documentation that would all contribute to a complete answer. Multiple query variations collectively cover more of the relevant document space. Rank fusion works well here because agreement is informative: a document that several differently phrased queries all rank highly is much more likely to be relevant than one that a single query surfaces.

How It Works

RAG Fusion is implemented in four steps:

  1) Use an LLM to generate N query variations from the original query (prompt: 'Generate 4 related search queries for: {original_query}').
  2) Run a separate vector search for each variation (optionally alongside BM25 for sparse retrieval).
  3) Apply RRF to merge the N result lists: score each document as the sum of 1/(k + rank_i) over every list i in which it appears, where rank_i is the document's 1-based position in list i.
  4) Sort by merged RRF score and select the top documents.

The RRF constant k (typically 60) controls how strongly the algorithm rewards documents that appear at the top of individual result lists: a smaller k gives top-ranked documents a larger advantage, while a larger k flattens the difference between ranks. After fusion, optional reranking can further refine the merged list.
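The four steps can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: `complete` (an LLM call that takes a prompt and returns text) and `search` (a retriever that takes a query and a limit and returns ranked document IDs) are hypothetical callables you would supply, and the sketch assumes the LLM returns one query per line and that each result list contains no duplicate IDs.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Prompt wording taken from step 1; the placeholder names are illustrative.
PROMPT_TEMPLATE = "Generate {n} related search queries for: {original_query}"


def generate_variants(original_query: str, n: int,
                      complete: Callable[[str], str]) -> List[str]:
    """Step 1: ask the LLM for n variations (assumes one query per line)."""
    prompt = PROMPT_TEMPLATE.format(n=n, original_query=original_query)
    lines = [line.strip() for line in complete(prompt).splitlines()]
    return [line for line in lines if line][:n]


def reciprocal_rank_fusion(result_lists: List[List[str]],
                           k: int = 60) -> Dict[str, float]:
    """Step 3: sum 1 / (k + rank) over every list a document appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def rag_fusion(original_query: str,
               complete: Callable[[str], str],
               search: Callable[[str, int], List[str]],
               n_variants: int = 4,
               per_query_limit: int = 10,
               top_n: int = 5,
               rrf_k: int = 60) -> List[str]:
    """Steps 1-4: generate variants, search each, fuse, return the top IDs."""
    variants = generate_variants(original_query, n_variants, complete)
    result_lists = [search(q, per_query_limit) for q in variants]  # step 2
    scores = reciprocal_rank_fusion(result_lists, k=rrf_k)
    ranked = sorted(scores, key=scores.get, reverse=True)          # step 4
    return ranked[:top_n]
```

Passing the LLM and the retriever in as callables keeps the fusion logic independent of any particular vendor or vector store, so the same function can be exercised in tests with canned responses before it is wired to real services.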

Figure: RAG Fusion: Merging Multiple Query Retrievals

Original query: "How do I set up the chatbot API?"

Generated query variants and the results retrieved for each:

  Variant 1: "chatbot API setup guide"
    1. Doc A
    2. Doc C
    3. Doc F
    4. Doc B

  Variant 2: "integrate chatbot REST API"
    1. Doc C
    2. Doc A
    3. Doc E
    4. Doc D

  Variant 3: "API configuration chatbot docs"
    1. Doc B
    2. Doc C
    3. Doc A
    4. Doc G

Fused ranked list after Reciprocal Rank Fusion (RRF):

  • Doc C (RRF: 0.164): top-3 in all lists
  • Doc A (RRF: 0.158): top-2 in 2 lists
  • Doc B (RRF: 0.121): appeared in 2 lists
  • Doc F (RRF: 0.097): single-list contributor

RRF reduces positional bias: documents consistently ranked across queries rise to the top.
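To make the arithmetic concrete, the short script below applies the RRF formula with k = 60 to the three result lists from the figure. Treat it as a worked example under that assumption: the absolute scores it produces differ from the ones displayed in the figure, which appear to be illustrative rather than computed with k = 60, but the top of the ordering (Doc C, Doc A, Doc B) comes out the same.

```python
from collections import defaultdict

# Ranked results for the three query variants shown in the figure.
result_lists = [
    ["A", "C", "F", "B"],  # "chatbot API setup guide"
    ["C", "A", "E", "D"],  # "integrate chatbot REST API"
    ["B", "C", "A", "G"],  # "API configuration chatbot docs"
]

K = 60  # assumed RRF constant; the figure does not state which value it used

scores = defaultdict(float)
for results in result_lists:
    for rank, doc in enumerate(results, start=1):  # ranks are 1-based
        scores[doc] += 1.0 / (K + rank)

# Highest fused score first; Python's sort keeps tied documents in first-seen order.
fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for doc, score in fused:
    print(f"Doc {doc}: {score:.4f}")
```

Doc C scores 1/62 + 1/61 + 1/62 ≈ 0.0487 and Doc A scores 1/61 + 1/62 + 1/63 ≈ 0.0484, so C edges ahead even though A holds a first place, because C never drops below second. Doc B, present in only two lists, scores 1/64 + 1/61 ≈ 0.0320, roughly double any document that appears in a single list.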

Real-World Example

A 99helpers customer implements RAG Fusion for their knowledge base covering a complex product with many interconnected features. For questions that span multiple feature areas ('How does the AI chatbot handle pricing questions for Enterprise customers?'), single-query retrieval returns an average of 2.3 relevant documents. RAG Fusion with 4 query variations returns 5.7 relevant documents on average, providing the LLM with broader context. Complete answer rate (answers that address all aspects of multi-part questions) improves from 54% to 79%.

Common Mistakes

  • Generating too many query variations — beyond 5-6 variations, additional queries add noise and latency without improving results meaningfully
  • Not tuning the RRF constant k for your use case — the default k=60 is a reasonable starting point but may not be optimal for your query distribution
  • Using RAG Fusion for simple, direct questions — single-hop factual questions ('What is the refund policy?') do not benefit from multi-query; apply fusion selectively to complex queries
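When tuning k, it helps to see how much it changes the weight of a top-ranked hit relative to a lower-ranked one. The sketch below computes that ratio directly from the 1/(k + rank) term; the helper names are made up for this illustration, and comparing rank 1 against rank 10 is an arbitrary choice.

```python
def rank_weight(rank: int, k: int) -> float:
    """Contribution of a single appearance at a 1-based rank under RRF."""
    return 1.0 / (k + rank)


def top_rank_advantage(k: int, low_rank: int = 10) -> float:
    """How many times more a rank-1 hit contributes than a hit at low_rank."""
    return rank_weight(1, k) / rank_weight(low_rank, k)


for k in (1, 10, 60, 200):
    print(f"k={k:>3}: rank 1 counts {top_rank_advantage(k):.2f}x as much as rank 10")
```

With k = 1 a first-place hit is worth 5.5 times a tenth-place hit, while at k = 60 the ratio is only about 1.15. In other words, the default makes fusion depend mostly on how many lists a document appears in, and lowering k shifts the emphasis toward where it ranks within each list.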
