Retrieval-Augmented Generation (RAG)

Text Embedding

Definition

A text embedding maps a piece of text—a word, sentence, paragraph, or document—to a dense vector of floating-point numbers (typically 256 to 3072 dimensions). Semantically similar texts produce vectors that are geometrically close in this high-dimensional space, as measured by cosine similarity or dot product. Embedding models are trained on large text corpora to learn these semantic representations: 'dog' and 'puppy' land near each other; 'bank (financial)' and 'bank (river)' land in different regions. In RAG systems, both documents (at indexing time) and user queries (at retrieval time) are converted to embeddings, and retrieval finds documents whose embeddings are nearest to the query embedding.
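The "geometrically close" idea can be sketched with plain cosine similarity over toy vectors (the 3-dimensional vectors below are illustrative stand-ins, not real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d vectors standing in for real 1536-d embeddings.
dog     = [0.90, 0.80, 0.10]
puppy   = [0.85, 0.82, 0.15]
invoice = [0.10, 0.20, 0.95]

print(cosine_similarity(dog, puppy))    # near 1.0 — semantically similar
print(cosine_similarity(dog, invoice))  # much lower — unrelated concepts
```

Real embedding models produce the vectors; the similarity arithmetic itself is this simple.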

Why It Matters

Text embeddings enable semantic search—the ability to find relevant content even when the exact query words don't appear in the document. This is transformative for knowledge base search: a user asking 'how do I fix login issues?' should retrieve content about 'authentication troubleshooting' even though the words don't overlap. Without embeddings, only keyword search is possible, missing synonyms, paraphrases, and conceptual matches. For 99helpers chatbots, switching from keyword search to embedding-based semantic search typically increases the fraction of queries that retrieve relevant content by 30-50%, dramatically improving answer quality.

How It Works

Embedding models accept text as input and return a fixed-size vector. Popular models include OpenAI's text-embedding-3-small (1536 dimensions, cheap) and text-embedding-3-large (3072 dimensions, higher quality), Cohere's embed-v3, and open-source models like BGE-large and E5-large. Embeddings are computed by passing text through a transformer encoder and pooling the output token representations. Quality varies by model and domain—a model trained on web text may perform poorly on technical jargon. The embedding dimensionality trades off storage cost and retrieval speed against representation quality. Normalized embeddings (unit vectors) support cosine similarity via dot product, which is faster to compute.
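The last point can be verified directly: once vectors are normalized to unit length, the cheap dot product gives the same answer as the full cosine computation (toy vectors, not real model output):

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

# For unit vectors, one dot product replaces the full cosine formula.
assert abs(dot(normalize(a), normalize(b)) - cosine(a, b)) < 1e-12
```

This is why many vector databases store normalized embeddings and expose dot product as the similarity metric.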

Text Embedding Space — 2D Projection of Semantic Clusters

[Figure: a 2D t-SNE projection of the embedding space; actual embeddings are 768–1536 dimensions. Terms cluster by topic: Billing / Payment (invoice, refund, payment failed, billing cycle), Technical / API (API key, rate limit, webhook, SDK error), and Account / Login (reset password, 2FA setup, login error, SSO). The query 'reset my password' lands in the Account / Login cluster; its top-3 nearest neighbors are 'reset password' (dist 0.08), 'login error' (dist 0.14), and '2FA setup' (dist 0.19).]

Real-World Example

A 99helpers knowledge base contains 10,000 chunks. All chunks are embedded with OpenAI text-embedding-3-small, producing 1536-dimensional vectors stored in Pinecone. When a user asks 'Why am I getting a 403 error?', the query is embedded with the same model, producing a query vector. Pinecone performs approximate nearest neighbor search, finding the 5 chunks most similar to the query vector. The top result is a chunk discussing 'API authentication and permission errors'—retrieved despite zero word overlap with the query, because both vectors occupy a similar region of the embedding space.
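The retrieval step can be sketched with a brute-force nearest-neighbor search over an in-memory index — a stand-in for Pinecone's approximate search. The chunk vectors and query vector below are hypothetical toy values, not real text-embedding-3-small output:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy index: chunk text -> precomputed embedding (3-d stand-ins for 1536-d vectors).
index = {
    "API authentication and permission errors": [0.9, 0.7, 0.1],
    "Billing cycle and invoice schedule":       [0.1, 0.2, 0.9],
    "Resetting your password":                  [0.4, 0.9, 0.3],
}

def top_k(query_vector, k=2):
    """Brute-force nearest neighbors: score every chunk, keep the best k."""
    scored = sorted(index.items(), key=lambda kv: cosine(query_vector, kv[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Hypothetical query vector for "Why am I getting a 403 error?" — it lands
# near the API/permissions region of the toy space despite zero word overlap.
query = [0.85, 0.65, 0.2]
print(top_k(query))
```

A production system replaces the dictionary with a vector database and the exact scan with approximate nearest-neighbor search, but the query-vector-in, top-k-chunks-out contract is the same.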

Common Mistakes

  • Using different embedding models for indexing and querying—query vectors must be in the same embedding space as document vectors or retrieval is meaningless.
  • Embedding overly long chunks that exceed the model's token limit—text is silently truncated, producing embeddings that represent only part of the chunk.
  • Choosing an embedding model based only on benchmark scores without evaluating on your specific domain—general benchmarks may not reflect performance on technical support content.
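The truncation mistake is easy to guard against with an explicit length check before embedding. The sketch below assumes an ~8k-token input limit (as with OpenAI's embedding models) and uses a rough word-count heuristic in place of a real tokenizer such as tiktoken:

```python
TOKEN_LIMIT = 8191     # assumed input limit; check your model's documentation
WORDS_PER_TOKEN = 0.75  # rough average for English text

def approx_token_count(text):
    """Crude token estimate from word count; use a real tokenizer in production."""
    return int(len(text.split()) / WORDS_PER_TOKEN)

def check_chunk(text):
    """Raise instead of letting the API silently truncate an oversized chunk."""
    n = approx_token_count(text)
    if n > TOKEN_LIMIT:
        raise ValueError(f"chunk is ~{n} tokens; limit is {TOKEN_LIMIT}")
    return text

check_chunk("short chunk is fine")  # passes silently
```

Failing loudly at indexing time is far cheaper than debugging why retrieval misses content that was cut off from its embedding.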
