Hypothetical Document Embedding
Definition
Hypothetical Document Embedding (HyDE) is an approach that inverts the standard RAG retrieval step. Instead of embedding the user's query and searching for similar document chunks, HyDE first uses an LLM to generate a hypothetical document that would answer the query (without accessing the actual knowledge base), then embeds this hypothetical document and uses it for similarity search. The intuition is that a document-like text (a hypothetical answer) is closer in embedding space to real document chunks than a short query is — improving retrieval by closing the query-document embedding gap that makes short queries poor search vectors.
Why It Matters
HyDE addresses a fundamental challenge in asymmetric retrieval: query embeddings and document embeddings occupy different regions of the vector space, even when the query is asking exactly what the document answers. Short, direct queries produce compact embeddings while long document passages produce more spread-out embeddings, creating a systematic gap that harms retrieval. By generating a hypothetical document, HyDE produces an embedding that looks more like the real documents in the index, landing closer to relevant documents in the vector space. Research shows HyDE improves retrieval particularly for factoid questions and technical domains.
How It Works
HyDE requires two LLM calls per query (versus one for standard RAG): a call that generates the hypothetical document, and then the final generation call for the actual answer. The hypothetical-document prompt asks the LLM to write a passage that would answer the query in the style of the knowledge base. The hypothetical document — not the original query — is embedded, and this embedding is used for the ANN search. The retrieved real documents are then passed as context to the LLM for final answer generation. The extra LLM call typically adds 100-300ms of latency but can significantly improve retrieval for certain query types.
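The two-step flow can be sketched end-to-end. Everything below is illustrative: `generate_hypothetical_doc` stands in for a real LLM call, `embed` is a toy bag-of-words encoder rather than a real embedding model, and the corpus is made up.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-words "embedding" over a tiny fixed vocabulary.
    # A real system would call an embedding model here.
    vocab = ["return", "refund", "days", "policy", "shipping", "rate", "limit"]
    words = [w.strip(".,;?") for w in text.lower().split()]
    return [float(words.count(v)) for v in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def generate_hypothetical_doc(query: str) -> str:
    # In a real system this is an LLM call, e.g. "Write a passage that
    # answers: {query}". Hard-coded here for illustration.
    return ("Our return policy allows a full refund within 30 days. "
            "Items must be unused; the refund goes to the original payment method.")

corpus = [
    "Returns are accepted within 30 days of purchase for a full refund.",
    "Standard shipping takes 3-5 business days.",
    "The search endpoint has a rate limit of 100 requests per minute.",
]

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    # Brute-force nearest-neighbor search; a real index would use ANN.
    scored = sorted(corpus, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return scored[:k]

query = "What are the return policy rules?"
hyde_vec = embed(generate_hypothetical_doc(query))  # HyDE: embed the fake answer
print(retrieve(hyde_vec))  # the returns-policy document ranks first
```

Note that the retrieval step itself is unchanged — HyDE only swaps which text gets embedded as the search vector.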
HyDE — Hypothetical Document Embedding Flow

1. Original query: “What are the return policy rules?” (short, low information density)
2. The LLM generates a hypothetical answer, a fake but plausible document: “Our return policy allows customers to return products within 30 days of purchase for a full refund. Items must be in original condition. Digital downloads are non-refundable. To initiate a return, contact support with your order number...”
3. The hypothetical document (not a real doc, just LLM-generated text) is embedded.
4. Vector search runs on the hypothetical embedding.

Comparison: embedding the query directly yields a short vector with sparse signal and lower recall, while the HyDE embedding is a rich vector closer to document space, with higher recall.
Real-World Example
A 99helpers customer with a technical API documentation knowledge base compares standard query embedding against HyDE. For precise factual questions like “What is the rate limit for the search endpoint?”, standard query embedding achieves recall@5 of 72% while HyDE achieves 89%. The improvement comes from HyDE generating hypothetical answers that contain the specific technical vocabulary (rate limits, API endpoints, request headers) matching the documentation's language, while the raw user query may use different phrasing.
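A recall@5 comparison like the one above can be computed with a simple hit-based metric. The function and data below are an illustrative sketch, not the customer's actual queries or results.

```python
def recall_at_k(retrieved: dict[str, list[str]],
                relevant: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of queries whose top-k results contain at least one
    relevant document (hit-based recall@k, common for single-answer QA)."""
    hits = sum(1 for q, docs in retrieved.items() if relevant[q] & set(docs[:k]))
    return hits / len(retrieved)

# Illustrative evaluation data, not real measurements.
retrieved = {
    "q1": ["doc_rate_limits", "doc_auth", "doc_errors"],
    "q2": ["doc_auth", "doc_pagination", "doc_webhooks"],
}
relevant = {"q1": {"doc_rate_limits"}, "q2": {"doc_quotas"}}
print(recall_at_k(retrieved, relevant, k=5))  # 0.5 -- one of two queries hit
```

Running the same metric twice — once over results from query embeddings, once over results from HyDE embeddings — gives a direct A/B comparison on your own query distribution.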
Common Mistakes
- ✕ Applying HyDE universally without testing whether it improves your specific query distribution — HyDE helps for some query types and may hurt for others (particularly short, precise keyword queries)
- ✕ Ignoring the latency cost — HyDE requires an additional LLM call; measure whether the quality improvement justifies the added latency for your use case
- ✕ Not conditioning the hypothetical document generation on the knowledge base domain — generic hypothetical documents may miss domain-specific vocabulary that would improve retrieval
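One way to avoid the last mistake is to condition the hypothetical-document prompt on the domain. The template below is a hypothetical sketch, not a prompt from any particular product:

```python
def hyde_prompt(query: str, domain: str) -> str:
    # Hypothetical domain-conditioned HyDE prompt; wording is an assumption.
    return (
        f"You are writing internal {domain} documentation. "
        f"Write a short passage, in the style and vocabulary of that "
        f"documentation, that directly answers this question:\n\n{query}\n\n"
        "Do not say you lack information; write a plausible answer."
    )

print(hyde_prompt("What is the rate limit for the search endpoint?",
                  "REST API reference"))
```

Steering the LLM toward the knowledge base's register makes the hypothetical document more likely to share vocabulary with the real documents it is meant to land near.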
Related Terms
Query Expansion
Query expansion is a retrieval technique that augments the original user query with related terms, synonyms, or alternative phrasings before search, improving recall by retrieving relevant documents that would not match the original query vocabulary.
Query Rewriting
Query rewriting is a technique that transforms a user's original query into an improved version — clearer, more complete, or better suited for retrieval — using an LLM to improve recall and relevance before searching the knowledge base.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
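As a minimal illustration, dense retrieval reduces to nearest-neighbor search under a similarity function such as cosine; the vectors and document names below are toy values, not real model outputs.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Toy dense vectors -- in practice these come from an embedding model.
query_vec = [0.9, 0.1, 0.3]
doc_vecs = {
    "returns_policy": [0.8, 0.2, 0.4],
    "shipping_rates": [0.1, 0.9, 0.2],
}
best = max(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]))
print(best)  # returns_policy
```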
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.