Cosine Similarity
Definition
Cosine similarity is a metric that quantifies the similarity of two vectors by measuring the cosine of the angle between them in a multi-dimensional space, rather than their absolute distance. The formula is: cosine_similarity(A, B) = (A · B) / (|A| × |B|), where A · B is the dot product and |A|, |B| are the vector magnitudes. The result ranges from -1 (perfectly opposite) through 0 (orthogonal/unrelated) to 1 (identical direction). In NLP and RAG systems, embedding vectors representing semantically similar texts point in similar directions in the vector space, making cosine similarity the standard metric for comparing their meaning.
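The formula above can be written directly as a small function. This is a minimal pure-Python sketch (no external libraries) using toy vectors to show the three landmark values of the score range:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (A . B) / (|A| * |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return dot / (mag_a * mag_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite direction)
```

Real embedding vectors have hundreds or thousands of dimensions, but the computation is identical.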
Why It Matters
Cosine similarity is the mathematical backbone of semantic search and RAG retrieval. When a user query is embedded into a vector, finding the most relevant documents means finding the knowledge base vectors with the highest cosine similarity to the query vector. Cosine similarity is preferred over Euclidean distance for text embeddings because it is magnitude-invariant — it compares directional meaning rather than raw numerical values, making it robust to differences in text length (a short and long document about the same topic will point in the same direction even if their vector magnitudes differ).
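The magnitude-invariance point can be demonstrated with toy 2-D vectors: one "short document" vector and a "long document" vector pointing the same way but ten times larger. Cosine similarity sees them as identical, while Euclidean distance penalizes the magnitude gap:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short_doc = [3.0, 4.0]    # toy 2-D "embedding"
long_doc = [30.0, 40.0]   # same direction, 10x the magnitude

print(cosine_similarity(short_doc, long_doc))   # 1.0  -> judged identical in meaning
print(euclidean_distance(short_doc, long_doc))  # 45.0 -> judged far apart
```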
How It Works
Cosine similarity computation is straightforward: normalize both vectors to unit length (divide each by its magnitude), then compute the dot product. After normalization, the dot product equals the cosine of the angle between them. Vector databases and embedding libraries handle this computation efficiently at scale. For a vector database with a million stored vectors, computing exact cosine similarity against all vectors for every query would be too slow — approximate nearest neighbor algorithms (HNSW, IVF) exploit vector space geometry to find high-similarity vectors without exhaustive comparison.
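The normalize-then-dot-product shortcut described above is why vector databases typically store unit-length vectors: normalization happens once at index time, and each query then costs only one dot product per document. A minimal sketch with a toy three-document corpus:

```python
import math

def normalize(v):
    """Scale v to unit length so dot products equal cosine similarities."""
    mag = math.sqrt(sum(x * x for x in v))
    return [x / mag for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Index time: normalize every stored vector once.
corpus = [[3.0, 4.0], [0.0, 2.0], [-1.0, 0.0]]
corpus_unit = [normalize(v) for v in corpus]

# Query time: normalize the query, then one dot product per document.
query_unit = normalize([6.0, 8.0])  # same direction as corpus[0]
scores = [dot(query_unit, d) for d in corpus_unit]
print(scores)  # highest score for the same-direction document, lowest for the opposite one
```

At a million vectors this exhaustive loop is exactly what ANN indexes such as HNSW or IVF replace with a sublinear search.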
Cosine Similarity — Vector Angle Comparison

Formula: cos(θ) = (A · B) / (|A| × |B|)

Score scale:
- 1.0 (θ = 0°): identical meaning, vectors point in the same direction
- 0.0 (θ = 90°): unrelated, vectors are perpendicular
- -1.0 (θ = 180°): opposite meaning, vectors point in opposite directions

In RAG: chunks with a cosine score above ~0.75 are typically considered relevant; chunks below 0.5 are discarded to reduce noise.
Real-World Example
A 99helpers customer building their RAG system calibrates their similarity threshold by running 100 test queries through the system and reviewing the retrieved documents. They find that chunks with cosine similarity above 0.78 consistently contain relevant information, while chunks below 0.65 are usually irrelevant. They configure the system to include all chunks above 0.78, add a low-confidence flag when the highest score is between 0.65 and 0.78, and escalate to human agents when the best score is below 0.65.
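The calibrated routing policy in this example can be sketched as a small decision function. The thresholds come from the calibration described above; the function and label names are hypothetical, not a 99helpers API:

```python
RELEVANT_THRESHOLD = 0.78   # calibrated on 100 test queries (from the example above)
ESCALATE_THRESHOLD = 0.65

def route_retrieval(scored_chunks):
    """Route a query given (chunk, cosine_score) pairs from retrieval.

    Returns (decision, chunks) where decision is 'answer',
    'answer_low_confidence', or 'escalate_to_human'.
    """
    if not scored_chunks:
        return ("escalate_to_human", [])
    best = max(score for _, score in scored_chunks)
    if best >= RELEVANT_THRESHOLD:
        # Confident: include only the chunks above the relevance threshold.
        return ("answer", [c for c, s in scored_chunks if s >= RELEVANT_THRESHOLD])
    if best >= ESCALATE_THRESHOLD:
        # Borderline: answer, but flag low confidence.
        return ("answer_low_confidence", [c for c, s in scored_chunks if s >= ESCALATE_THRESHOLD])
    # Nothing similar enough: hand off to a human agent.
    return ("escalate_to_human", [])

print(route_retrieval([("chunk A", 0.81), ("chunk B", 0.70)]))  # ('answer', ['chunk A'])
```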
Common Mistakes
- ✕Using cosine similarity thresholds from one embedding model for a different model — similarity score ranges vary by model and must be recalibrated
- ✕Relying solely on cosine similarity without considering whether the retrieved content actually answers the question — high similarity to a tangentially related document can mislead the LLM
- ✕Using a raw dot product in place of cosine similarity — skipping the normalization step means vectors with larger magnitudes score higher regardless of semantic relevance
Related Terms
Semantic Similarity
Semantic similarity is a measure of how alike two pieces of text are in meaning, regardless of the exact words used, computed by comparing their embedding vectors using metrics such as cosine similarity.
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Vector Database
A vector database is a purpose-built data store optimized for storing, indexing, and querying high-dimensional numerical vectors (embeddings), enabling fast similarity search across large collections of embedded documents.
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.
Approximate Nearest Neighbor
Approximate Nearest Neighbor (ANN) search finds vectors that are close to a query vector with high probability but without guaranteeing exactness, enabling fast similarity search across millions of vectors at the cost of small accuracy tradeoffs.