Embedding Model
Definition
An embedding model is a neural network trained to convert arbitrary input — sentences, paragraphs, documents, images — into fixed-length numerical vectors (embeddings) in a high-dimensional space, where semantically similar inputs produce vectors that are geometrically close together. Embedding models are trained on large corpora using contrastive or self-supervised objectives that teach the model to encode meaning into the vector space. In RAG systems, embedding models serve two functions: indexing (converting knowledge base documents into stored vectors) and query processing (converting user queries into query vectors for similarity search). Common embedding models include OpenAI's text-embedding-3-large, Google's text-embedding-004, and open-source models like sentence-transformers/all-mpnet-base-v2.
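To make "geometrically close" concrete, the sketch below compares hand-picked toy vectors with cosine similarity. The 4-dimensional vectors are invented for illustration only; in practice they would come from a model such as text-embedding-3-large or all-mpnet-base-v2 and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dim embeddings (real models use 384-3072 dims).
cancel_subscription = [0.81, 0.10, 0.52, 0.05]
subscription_cancel = [0.78, 0.14, 0.55, 0.08]  # similar meaning -> nearby vector
reset_password      = [0.05, 0.90, 0.12, 0.40]  # unrelated meaning -> distant vector

print(cosine_similarity(cancel_subscription, subscription_cancel))  # close to 1.0
print(cosine_similarity(cancel_subscription, reset_password))       # much lower
```

The same comparison works at any dimensionality, which is why one similarity metric serves every embedding model.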
Why It Matters
The embedding model is the semantic understanding engine of a RAG system — it determines whether 'how to cancel my subscription' and 'subscription cancellation process' are recognized as the same question, or treated as unrelated text. A poor embedding model produces inaccurate vector representations where semantically similar content ends up far apart in vector space, causing retrieval to miss relevant documents even when they exist. The choice of embedding model (size, training domain, dimensionality) significantly affects retrieval quality, latency, and cost. For domain-specific applications, fine-tuned or domain-specialized embedding models can substantially outperform general-purpose ones.
How It Works
Embedding models process input text through transformer layers that progressively abstract the text into a dense vector representation. Most text embedding models use a pooling strategy (mean pooling of token embeddings, or the [CLS] token) to produce a single fixed-length vector from variable-length input. Embedding dimensions typically range from 384 (small, fast models) to 3072 (large, high-quality models like OpenAI text-embedding-3-large). When choosing an embedding model, key tradeoffs include: vector quality vs. latency (larger models are more accurate but slower), cost (API-based models charge per token), maximum input length (models have a token limit; documents exceeding it must be truncated or chunked), and dimensionality (higher dimensions improve quality but increase storage and search costs).
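Mean pooling, the most common strategy mentioned above, can be sketched as follows. The per-token vectors here are made-up stand-ins for the final-layer outputs of a transformer:

```python
def mean_pool(token_embeddings):
    """Average variable-length token vectors into one fixed-length vector."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(vec[i] for vec in token_embeddings) / n for i in range(dim)]

# Three made-up 4-dim token vectors for the tokens of one sentence.
tokens = [
    [0.2, 0.4, 0.1, 0.9],
    [0.6, 0.0, 0.3, 0.5],
    [0.1, 0.8, 0.2, 0.1],
]
sentence_vector = mean_pool(tokens)
print(sentence_vector)  # ~[0.3, 0.4, 0.2, 0.5] -- one vector, regardless of token count
```

Whether the input is three tokens or three hundred, the output dimensionality is fixed, which is what makes the vectors directly comparable.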
[Diagram: Embedding Model — Text to Vector Transformation. An input text ("How do I reset my password?") passes through the embedding model's transformer layers and emerges as a dense vector. A second panel, "Different Texts — Different Vectors," plots several texts in a simplified 2D vector space, showing that similar queries produce vectors that are geometrically close.]
Real-World Example
A 99helpers customer switches their knowledge base embedding model from a smaller general-purpose model to a larger model optimized for asymmetric retrieval (short query to long document matching). On their evaluation dataset of 200 customer questions with known correct answer articles, retrieval recall@5 (the relevant article appearing in the top 5 results) improves from 71% to 89%. Chatbot answer accuracy correspondingly improves from 64% to 81%, validating that the embedding model upgrade was the primary bottleneck.
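Recall@5 as used in this example can be computed with a small helper like the one below; the retrieval runs and article IDs are invented for illustration.

```python
def recall_at_k(results_per_query, relevant_per_query, k=5):
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = sum(
        1 for results, relevant in zip(results_per_query, relevant_per_query)
        if relevant in results[:k]
    )
    return hits / len(results_per_query)

# Invented evaluation data: each row is a ranked list of article IDs for one query.
retrieved = [
    ["a12", "a07", "a33", "a02", "a19"],  # relevant article ranked 1st
    ["a44", "a13", "a07", "a21", "a05"],  # relevant article ranked 3rd
    ["a09", "a18", "a27", "a36", "a45"],  # relevant article missing from top 5
]
relevant = ["a12", "a07", "a99"]
print(recall_at_k(retrieved, relevant, k=5))  # 2 of 3 queries hit -> 0.666...
```

Running this metric before and after a model swap, as the customer did, isolates the embedding model's contribution to end-to-end answer accuracy.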
Common Mistakes
- ✕Mixing embedding models between indexing and querying — if you embed documents with model A, you must also embed queries with model A; vectors from different models live in incompatible spaces, so cross-model similarity scores are meaningless
- ✕Not re-embedding the knowledge base when switching models — stored embeddings from the old model are incompatible with queries embedded by the new model
- ✕Ignoring the model's maximum input length — text exceeding the model's context window is silently truncated, producing low-quality embeddings for long documents
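A defensive pattern against the silent-truncation mistake is to check the (approximate) token count before embedding and chunk anything over the limit. The whitespace split below is a rough stand-in for the model's real tokenizer, and the limits are illustrative:

```python
def chunk_for_embedding(text, max_tokens=512, overlap=50):
    """Split text into overlapping chunks that each fit the model's token limit.

    Whitespace splitting is used as a crude token proxy; a production system
    should count tokens with the embedding model's own tokenizer.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return [text]
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        start += max_tokens - overlap  # overlap preserves context across cut points
    return chunks

# A made-up 1200-"token" document, well over the assumed 512-token limit.
long_doc = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_for_embedding(long_doc, max_tokens=512, overlap=50)
print(len(chunks))  # several chunks, each within the limit
```

Each chunk is then embedded and stored separately, so no part of the document is silently dropped.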
Related Terms
Vector Database
A vector database is a purpose-built data store optimized for storing, indexing, and querying high-dimensional numerical vectors (embeddings), enabling fast similarity search across large collections of embedded documents.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Text Embedding
A text embedding is a numerical vector representation of text that encodes its semantic meaning, enabling mathematical comparison of text similarity. Text embeddings are the foundation of semantic search and RAG retrieval.
Semantic Similarity
Semantic similarity is a measure of how alike two pieces of text are in meaning, regardless of the exact words used, computed by comparing their embedding vectors using metrics such as cosine similarity.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
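Dense retrieval ultimately reduces to a nearest-neighbor search over stored vectors. A brute-force version (the operation a vector database accelerates with approximate-nearest-neighbor indexes) might look like this, with toy 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def dense_retrieve(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query; return the top-k doc IDs."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Made-up document "embeddings" keyed by article ID.
docs = {
    "billing-faq":   [0.9, 0.1, 0.2],
    "password-help": [0.1, 0.9, 0.3],
    "api-reference": [0.2, 0.3, 0.9],
}
query = [0.85, 0.15, 0.25]  # a billing-related query vector
print(dense_retrieve(query, docs, k=2))  # 'billing-faq' should rank first
```

Note that both the query and the documents must be embedded by the same model for these comparisons to be valid.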