Hypothetical Document Embedding
Definition
Hypothetical Document Embedding (HyDE) is an approach that inverts the standard RAG retrieval step. Instead of embedding the user's query and searching for similar document chunks, HyDE first uses an LLM to generate a hypothetical document that would answer the query (without accessing the actual knowledge base), then embeds this hypothetical document and uses it for similarity search. The intuition is that a document-like text (a hypothetical answer) is closer in embedding space to real document chunks than a short query is — improving retrieval by closing the query-document embedding gap that makes short queries poor search vectors.
Why It Matters
HyDE addresses a fundamental challenge in asymmetric retrieval: query embeddings and document embeddings occupy different regions of the vector space, even when the query is asking exactly what the document answers. Short, direct queries produce compact embeddings while long document passages produce more spread-out embeddings, creating a systematic gap that harms retrieval. By generating a hypothetical document, HyDE produces an embedding that looks more like the real documents in the index, landing closer to relevant documents in the vector space. Research shows HyDE improves retrieval particularly for factoid questions and technical domains.
How It Works
HyDE requires two LLM calls per query (versus one for standard RAG): a call that generates the hypothetical document, and then the final generation call for the actual answer. The hypothetical-document prompt asks the LLM to write a passage that would answer the query in the style of the knowledge base. The hypothetical document — not the original query — is embedded, and this embedding is used for the ANN search. The retrieved real documents are then passed as context to the LLM for final answer generation. The extra LLM call typically adds 100-300ms of latency but can significantly improve retrieval for certain query types.
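The two-step flow can be sketched end-to-end. Everything below is illustrative: `generate_hypothetical_doc` stands in for a real LLM call, `embed` is a toy bag-of-words encoder rather than a real embedding model, and the corpus is made up.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-words "embedding" over a tiny fixed vocabulary.
    # A real system would call an embedding model here.
    vocab = ["return", "refund", "days", "policy", "shipping", "rate", "limit"]
    words = [w.strip(".,;?") for w in text.lower().split()]
    return [float(words.count(v)) for v in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical_doc(query: str) -> str:
    # In a real system this is an LLM call, e.g. "Write a passage that
    # answers: {query}". Hard-coded here for illustration.
    return ("Our return policy allows a full refund within 30 days. "
            "Items must be unused; the refund goes to the original payment method.")

corpus = [
    "Returns are accepted within 30 days of purchase for a full refund.",
    "Standard shipping takes 3-5 business days.",
    "The search endpoint has a rate limit of 100 requests per minute.",
]

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    # Brute-force nearest-neighbor search; a real index would use ANN.
    scored = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return scored[:k]

query = "What are the return policy rules?"
hyde_vec = embed(generate_hypothetical_doc(query))  # HyDE: embed the fake answer
print(retrieve(hyde_vec))  # the returns-policy document ranks first
```

Note that the retrieval step itself is unchanged — HyDE only swaps which text gets embedded as the search vector.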
HyDE — Hypothetical Document Embedding Flow

1. Original query: “What are the return policy rules?” (short, low information density)
2. The LLM generates a hypothetical answer, a fake but plausible document: “Our return policy allows customers to return products within 30 days of purchase for a full refund. Items must be in original condition. Digital downloads are non-refundable. To initiate a return, contact support with your order number...”
3. The hypothetical document (not a real doc, just LLM-generated text) is embedded.
4. Vector search runs on the hypothetical embedding.

Comparison: embedding the query directly yields a short vector with sparse signal and lower recall, while the HyDE embedding is a rich vector closer to document space, with higher recall.
Real-World Example
A 99helpers customer with a technical API documentation knowledge base compares standard query embedding against HyDE. For precise factual questions like “What is the rate limit for the search endpoint?”, standard query embedding achieves recall@5 of 72% while HyDE achieves 89%. The improvement comes from HyDE generating hypothetical answers that contain the specific technical vocabulary (rate limits, API endpoints, request headers) matching the documentation's language, while the raw user query may use different phrasing.
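A recall@5 comparison like the one above can be computed with a simple hit-based metric. The function and data below are an illustrative sketch, not the customer's actual queries or results.

```python
def recall_at_k(retrieved: dict[str, list[str]],
                relevant: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of queries whose top-k results contain at least one
    relevant document (hit-based recall@k, common for single-answer QA)."""
    hits = sum(1 for q, docs in retrieved.items() if relevant[q] & set(docs[:k]))
    return hits / len(retrieved)

# Illustrative evaluation data, not real measurements.
retrieved = {
    "q1": ["doc_rate_limits", "doc_auth", "doc_errors"],
    "q2": ["doc_auth", "doc_pagination", "doc_webhooks"],
}
relevant = {"q1": {"doc_rate_limits"}, "q2": {"doc_quotas"}}
print(recall_at_k(retrieved, relevant, k=5))  # 0.5 -- one of two queries hit
```

Running the same metric twice — once over results from query embeddings, once over results from HyDE embeddings — gives a direct A/B comparison on your own query distribution.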
Common Mistakes
- ✕ Applying HyDE universally without testing whether it improves your specific query distribution — HyDE helps for some query types and may hurt for others (particularly short, precise keyword queries)
- ✕ Ignoring the latency cost — HyDE requires an additional LLM call; measure whether the quality improvement justifies the added latency for your use case
- ✕ Not conditioning the hypothetical document generation on the knowledge base domain — generic hypothetical documents may miss domain-specific vocabulary that would improve retrieval
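One way to avoid the last mistake is to condition the hypothetical-document prompt on the domain. The template below is a hypothetical sketch, not a prompt from any particular product:

```python
def hyde_prompt(query: str, domain: str) -> str:
    # Hypothetical domain-conditioned HyDE prompt; wording is an assumption.
    return (
        f"You are writing internal {domain} documentation. "
        f"Write a short passage, in the style and vocabulary of that "
        f"documentation, that directly answers this question:\n\n{query}\n\n"
        "Do not say you lack information; write a plausible answer."
    )

print(hyde_prompt("What is the rate limit for the search endpoint?",
                  "REST API reference"))
```

Steering the LLM toward the knowledge base's register makes the hypothetical document more likely to share vocabulary with the real documents it is meant to land near.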
Related Terms
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
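As a minimal illustration, dense retrieval reduces to nearest-neighbor search under a similarity function such as cosine; the vectors and document names below are toy values, not real model outputs.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy dense vectors -- in practice these come from an embedding model.
query_vec = [0.9, 0.1, 0.3]
doc_vecs = {
    "returns_policy": [0.8, 0.2, 0.4],
    "shipping_rates": [0.1, 0.9, 0.2],
}
best = max(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]))
print(best)  # returns_policy
```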
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.