Cosine Similarity
Definition
Cosine similarity is a metric that quantifies the similarity of two vectors by measuring the cosine of the angle between them in a multi-dimensional space, rather than their absolute distance. The formula is: cosine_similarity(A, B) = (A · B) / (|A| × |B|), where A · B is the dot product and |A|, |B| are the vector magnitudes. The result ranges from -1 (perfectly opposite) through 0 (orthogonal/unrelated) to 1 (identical direction). In NLP and RAG systems, embedding vectors representing semantically similar texts point in similar directions in the vector space, making cosine similarity the standard metric for comparing their meaning.
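The formula above can be written directly as a small function. This is a minimal pure-Python sketch (no external libraries) using toy vectors to show the three landmark values of the score range:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (A . B) / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```

Real embedding vectors have hundreds or thousands of dimensions, but the computation is identical.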
Why It Matters
Cosine similarity is the mathematical backbone of semantic search and RAG retrieval. When a user query is embedded into a vector, finding the most relevant documents means finding the knowledge base vectors with the highest cosine similarity to the query vector. Cosine similarity is preferred over Euclidean distance for text embeddings because it is magnitude-invariant — it compares directional meaning rather than raw numerical values, making it robust to differences in text length (a short and long document about the same topic will point in the same direction even if their vector magnitudes differ).
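The magnitude-invariance point can be demonstrated with toy 2-D vectors: one "short document" vector and a "long document" vector pointing the same way but ten times larger. Cosine similarity sees them as identical, while Euclidean distance penalizes the magnitude gap:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short_doc = [3.0, 4.0]    # toy 2-D "embedding"
long_doc = [30.0, 40.0]   # same direction, 10x the magnitude

print(cosine_similarity(short_doc, long_doc))   # 1.0  -> judged identical in meaning
print(euclidean_distance(short_doc, long_doc))  # 45.0 -> judged far apart
```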
How It Works
Cosine similarity computation is straightforward: normalize both vectors to unit length (divide each by its magnitude), then compute the dot product. After normalization, the dot product equals the cosine of the angle between them. Vector databases and embedding libraries handle this computation efficiently at scale. For a vector database with a million stored vectors, computing exact cosine similarity against all vectors for every query would be too slow — approximate nearest neighbor algorithms (HNSW, IVF) exploit vector space geometry to find high-similarity vectors without exhaustive comparison.
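The normalize-then-dot-product shortcut described above is why vector databases typically store unit-length vectors: normalization happens once at index time, and each query then costs only one dot product per document. A minimal sketch with a toy three-document corpus:

```python
import math

def normalize(v):
    """Scale v to unit length so dot products equal cosine similarities."""
    mag = math.sqrt(sum(x * x for x in v))
    return [x / mag for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Index time: normalize every stored vector once.
corpus = [[3.0, 4.0], [0.0, 2.0], [-1.0, 0.0]]
corpus_unit = [normalize(v) for v in corpus]

# Query time: normalize the query, then one dot product per document.
query_unit = normalize([6.0, 8.0])  # same direction as corpus[0]
scores = [dot(query_unit, d) for d in corpus_unit]
print(scores)  # highest score for the same-direction document, lowest for the opposite one
```

At a million vectors this exhaustive loop is exactly what ANN indexes such as HNSW or IVF replace with a sublinear search.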
Cosine Similarity — Vector Angle Comparison

Formula: cos(θ) = (A · B) / (|A| × |B|)

Score scale:
- 1.0 (θ = 0°): identical meaning, vectors point in the same direction
- 0.0 (θ = 90°): unrelated, vectors are perpendicular
- -1.0 (θ = 180°): opposite meaning, vectors point in opposite directions

In RAG: chunks with a cosine score above ~0.75 are typically considered relevant; chunks below 0.5 are discarded to reduce noise.
Real-World Example
A 99helpers customer building their RAG system calibrates their similarity threshold by running 100 test queries through the system and reviewing the retrieved documents. They find that chunks with cosine similarity above 0.78 consistently contain relevant information, while chunks below 0.65 are usually irrelevant. They configure the system to include all chunks above 0.78, add a low-confidence flag when the highest score is between 0.65 and 0.78, and escalate to human agents when the best score is below 0.65.
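The calibrated routing policy in this example can be sketched as a small decision function. The thresholds come from the calibration described above; the function and label names are hypothetical, not a 99helpers API:

```python
RELEVANT_THRESHOLD = 0.78   # calibrated on 100 test queries (from the example above)
ESCALATE_THRESHOLD = 0.65

def route_retrieval(scored_chunks):
    """Route a query given (chunk, cosine_score) pairs from retrieval.

    Returns (decision, chunks) where decision is 'answer',
    'answer_low_confidence', or 'escalate_to_human'.
    """
    if not scored_chunks:
        return ("escalate_to_human", [])
    best = max(score for _, score in scored_chunks)
    if best >= RELEVANT_THRESHOLD:
        # Confident: include only the chunks above the relevance threshold.
        return ("answer", [c for c, s in scored_chunks if s >= RELEVANT_THRESHOLD])
    if best >= ESCALATE_THRESHOLD:
        # Borderline: answer, but flag low confidence.
        return ("answer_low_confidence", [c for c, s in scored_chunks if s >= ESCALATE_THRESHOLD])
    # Nothing similar enough: hand off to a human agent.
    return ("escalate_to_human", [])

print(route_retrieval([("chunk A", 0.81), ("chunk B", 0.70)]))  # ('answer', ['chunk A'])
```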
Common Mistakes
- ✕Using cosine similarity thresholds from one embedding model for a different model — similarity score ranges vary by model and must be recalibrated
- ✕Relying solely on cosine similarity without considering whether the retrieved content actually answers the question — high similarity to a tangentially related document can mislead the LLM
- ✕Using a raw dot product in place of cosine similarity — skipping the normalization step means vectors with larger magnitudes score higher regardless of semantic relevance
Related Terms
Semantic Similarity
Semantic similarity is a measure of how alike two pieces of text are in meaning, regardless of the exact words used, computed by comparing their embedding vectors using metrics such as cosine similarity.
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Vector Database
A vector database is a purpose-built data store optimized for storing, indexing, and querying high-dimensional numerical vectors (embeddings), enabling fast similarity search across large collections of embedded documents.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
Approximate Nearest Neighbor
Approximate Nearest Neighbor (ANN) search finds vectors that are close to a query vector with high probability but without guaranteeing exactness, enabling fast similarity search across millions of vectors at the cost of small accuracy tradeoffs.