Dense Retrieval
Definition
Dense retrieval (also called dense passage retrieval or neural retrieval) is a search approach where both documents and queries are encoded as dense real-valued vectors using neural embedding models, and retrieval is performed by finding the document vectors most similar to the query vector. 'Dense' refers to the vector representation — unlike sparse vectors (like TF-IDF or BM25 representations where most dimensions are zero), dense embedding vectors have non-zero values across all dimensions, encoding rich semantic information. Dense retrieval excels at finding semantically relevant content when the query and document use different vocabulary — the key advantage over keyword-based sparse retrieval.
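The representational difference can be sketched in a few lines of Python. The vocabulary and the embedding values below are invented for illustration; a real embedding model would produce hundreds of learned dimensions:

```python
import numpy as np

vocab = ["account", "cancel", "plan", "stop", "subscription", "upgrade"]

# Sparse representation: one dimension per vocabulary term, mostly zeros.
def sparse_vector(text):
    tokens = text.lower().split()
    return np.array([float(tokens.count(term)) for term in vocab])

query_sparse = sparse_vector("stop my plan")
print(query_sparse)  # [0. 0. 1. 1. 0. 0.] -- most dimensions are zero

# Dense representation: a toy low-dimensional embedding where every
# dimension is non-zero. These numbers are made up; a real model
# learns them from data.
query_dense = np.array([0.23, -0.81, 0.45, 0.12, -0.37])
print(query_dense)
```

The sparse vector grows with the vocabulary and stays mostly empty; the dense vector stays small and packs meaning into every dimension.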
Why It Matters
Dense retrieval is the technological leap that enables AI chatbots to understand user intent rather than just matching keywords. Before dense retrieval, search systems required users to phrase queries in the same vocabulary used in the documentation. With dense retrieval, a user asking 'why is my account locked?' can retrieve the article titled 'Password Attempts and Account Security' even though 'locked' never appears in the title and neither 'attempts' nor 'security' appears in the query. This vocabulary independence is transformative for support applications, where customers describe problems in their own words.
How It Works
Dense retrieval is implemented through a bi-encoder architecture: a single embedding model (or two separate models) encodes queries and documents into vectors in a shared semantic space. Document embeddings are computed offline during indexing. Query embeddings are computed in real time when a user sends a message. Retrieval then finds the document vectors most similar to the query vector using approximate nearest neighbor (ANN) search. Bi-encoder models can be general-purpose (OpenAI embeddings, SBERT) or fine-tuned for specific domains or for query-document asymmetry. Dense retrieval quality is measured by recall@k: the fraction of relevant documents that appear in the top-k results.
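The online half of this pipeline can be sketched as a brute-force similarity scan (a stand-in for a real ANN index; the document titles and 4-dimensional vectors below are toy values, not real model outputs):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Offline step: document embeddings computed once at indexing time.
# These toy 4-d vectors stand in for the output of a real embedding model.
doc_vectors = {
    "How to cancel your subscription": np.array([0.9, 0.1, 0.3, 0.2]),
    "Upgrading to a premium plan":     np.array([0.2, 0.8, 0.1, 0.4]),
    "Resetting a forgotten password":  np.array([0.1, 0.2, 0.9, 0.3]),
}

# Online step: embed the query, then rank documents by similarity.
# Brute-force scan here; production systems use ANN indexes (e.g. HNSW).
query_vector = np.array([0.8, 0.2, 0.2, 0.3])  # toy embedding of "stop my plan"

def retrieve(query_vec, k=2):
    scored = [(cosine(query_vec, vec), title) for title, vec in doc_vectors.items()]
    return [title for _, title in sorted(scored, reverse=True)[:k]]

print(retrieve(query_vector))
# ['How to cancel your subscription', 'Upgrading to a premium plan']
```

At scale, the brute-force scan is replaced by an ANN index so that search stays fast over millions of document vectors.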
Dense vs Sparse Retrieval
For the query "stop my plan":
- Dense retrieval: the query is embedded as a vector ([0.23, -0.81, 0.45, ...]) and matched by semantic similarity search, finding "How to cancel your subscription" with score 0.91. Dense retrieval understands semantic synonyms: "stop" = "cancel" = "terminate".
- Sparse (BM25): the query is tokenized into keywords ["stop", "plan"] and looked up in an inverted index, which fails to find "How to cancel your subscription". Sparse retrieval misses semantic synonyms: no "stop" in the document means zero match.
This is the dense retrieval advantage.
Real-World Example
A 99helpers customer compares keyword search versus dense retrieval on their knowledge base. For the query 'I cannot get in to my dashboard', keyword search retrieves only documents containing the words 'cannot', 'get', and 'dashboard'. Dense retrieval additionally retrieves articles about login issues, authentication errors, and access problems — all highly relevant despite not containing the exact query words. On a 100-query evaluation set, dense retrieval finds the relevant article in the top 3 results 84% of the time, versus 51% for keyword search.
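The 'top 3 results, 84% of the time' metric above is recall@3. A minimal sketch of how such an evaluation is computed (the queries, rankings, and labels below are invented for illustration):

```python
# recall@k: fraction of queries whose relevant document appears in the
# top-k retrieved results.
def recall_at_k(results, relevant, k):
    hits = sum(
        1 for query, ranked in results.items()
        if relevant[query] in ranked[:k]
    )
    return hits / len(results)

# Ranked retrieval output per query (hypothetical).
results = {
    "I cannot get in to my dashboard": ["Login issues", "Billing FAQ", "API keys"],
    "stop my plan": ["Upgrading plans", "How to cancel", "Refund policy"],
}
# Human-labeled relevant article per query (hypothetical).
relevant = {
    "I cannot get in to my dashboard": "Login issues",
    "stop my plan": "How to cancel",
}

print(recall_at_k(results, relevant, k=1))  # 0.5 -- only one query hits at rank 1
print(recall_at_k(results, relevant, k=3))  # 1.0 -- both articles are in the top 3
```

Running the same function over a 100-query labeled set is how figures like 84% versus 51% are produced.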
Common Mistakes
- ✕Assuming dense retrieval is always superior to sparse retrieval — dense retrieval underperforms on precise technical queries (product names, error codes) where BM25 keyword matching excels; hybrid retrieval combines both
- ✕Not fine-tuning the embedding model on domain-specific data — general-purpose embeddings underperform specialized embeddings for highly technical or niche domains
- ✕Ignoring retrieval evaluation — deploying dense retrieval without measuring recall on representative queries makes it impossible to detect retrieval failures
Related Terms
Sparse Retrieval
Sparse retrieval is a search approach based on exact or weighted keyword matching, where documents and queries are represented as high-dimensional sparse vectors with most values being zero, and similarity is measured by term overlap.
Hybrid Retrieval
Hybrid retrieval combines dense (semantic) and sparse (keyword) search methods to leverage the strengths of both, using a fusion step to merge their results into a single ranked list for better overall retrieval quality.
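One widely used fusion step is reciprocal rank fusion (RRF), which merges ranked lists using only ranks, not scores. A minimal sketch (the document IDs below are invented; c=60 is the constant from the original RRF paper):

```python
# Reciprocal Rank Fusion: each document earns 1/(c + rank) from every
# ranked list that contains it, and the contributions are summed.
def rrf(ranked_lists, c=60):
    scores = {}
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_results  = ["cancel-subscription", "upgrade-plan", "refund-policy"]
sparse_results = ["pricing-plans", "cancel-subscription", "upgrade-plan"]

print(rrf([dense_results, sparse_results]))
# 'cancel-subscription' ranks first: it scores well in both lists
```

Because RRF works on ranks alone, it avoids having to calibrate BM25 scores against cosine similarities, which live on different scales.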
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
BM25
BM25 (Best Match 25) is the industry-standard sparse retrieval algorithm that scores documents against a query based on term frequency, inverse document frequency, and document length normalization, widely used in search engines and hybrid RAG systems.
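The three ingredients named above (term frequency, inverse document frequency, length normalization) can be seen in a minimal BM25 sketch. The three-document corpus is invented; k1=1.5 and b=0.75 are conventional defaults:

```python
import math

corpus = [
    "how to cancel your subscription".split(),
    "upgrading to a premium plan".split(),
    "resetting a forgotten password".split(),
]
N = len(corpus)
avgdl = sum(len(doc) for doc in corpus) / N  # average document length

def idf(term):
    # Inverse document frequency: rarer terms score higher.
    df = sum(1 for doc in corpus if term in doc)
    return math.log((N - df + 0.5) / (df + 0.5) + 1)

def bm25(query_terms, doc, k1=1.5, b=0.75):
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                              # term frequency
        norm = k1 * (1 - b + b * len(doc) / avgdl)        # length normalization
        score += idf(term) * tf * (k1 + 1) / (tf + norm)
    return score

# "stop my plan" shares no terms with the cancellation doc -> score 0.0,
# illustrating the vocabulary-mismatch weakness of sparse retrieval.
query = "stop my plan".split()
print(bm25(query, corpus[0]))       # 0.0
print(bm25(query, corpus[1]) > 0)   # True -- matches "plan" in the upgrade doc
```

The zero score for the cancellation article is exactly the failure mode that dense retrieval fixes, and why hybrid systems keep BM25 for exact-match queries while adding dense vectors for everything else.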
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.