BM25
Definition
BM25 (Best Match 25) is a probabilistic ranking function for information retrieval that scores how relevant a document is to a query by computing a weighted sum of scores for each query term found in the document. The formula combines three factors: term frequency saturation (the first occurrence of a query term contributes more than subsequent ones, so a document cannot game the score by repeating terms), inverse document frequency (terms that are rare across the corpus contribute more to relevance than common terms), and length normalization (a query term appearing in a short document signals more relevance than the same term in a long document). BM25 is the standard baseline for keyword search in Elasticsearch, Solr, Lucene, and hybrid vector search systems.
Why It Matters
BM25 remains one of the most effective retrieval algorithms despite being over 30 years old. It stays relevant because it is well suited to a common real-world pattern: users querying with specific technical terms, product names, version numbers, and error codes that should be matched exactly. Dense semantic search cannot reliably match these because it represents text by meaning rather than by exact tokens: the embedding for 'NullPointerException' may be less distinctive than an exact keyword match. Modern RAG best practices treat BM25 as the indispensable sparse component in hybrid retrieval systems.
How It Works
The BM25 score for a query Q with terms q1, q2, ..., qn against a document d is:
score(d, Q) = Σ over qi in Q of IDF(qi) × tf(qi, d) × (k1 + 1) / (tf(qi, d) + k1 × (1 - b + b × |d| / avgdl))
Here tf(qi, d) is the number of times qi occurs in d, and IDF(qi) is the inverse document frequency of qi across the corpus. Parameters: k1 controls term frequency saturation (typically 1.2-2.0); b controls length normalization (typically 0.75, where 0 disables it and 1 applies it fully); |d| is the length of document d in tokens; avgdl is the average document length in the collection.
In practice, BM25 is implemented in Elasticsearch (or OpenSearch) for the sparse retrieval component of hybrid RAG systems, or in lightweight Python libraries like rank_bm25 for smaller collections.
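The effect of k1 and b is easier to see with numbers. The following is a minimal sketch that evaluates only the term-frequency part of the formula (the IDF weight is left out); the function name `tf_component` and the example lengths are illustrative assumptions, not part of any library.

```python
def tf_component(tf: float, k1: float = 1.2, b: float = 0.75,
                 doc_len: float = 100.0, avgdl: float = 100.0) -> float:
    """Term-frequency factor of the BM25 formula, without the IDF weight."""
    length_norm = 1 - b + b * doc_len / avgdl
    return tf * (k1 + 1) / (tf + k1 * length_norm)

# Saturation: each additional occurrence adds less than the one before,
# and the factor stays below k1 + 1 no matter how often the term repeats.
for tf in (1, 2, 5, 10, 100):
    print(tf, round(tf_component(tf), 3))

# Length normalization: one occurrence counts for more in a document
# shorter than average than in one longer than average.
short_doc = tf_component(1, doc_len=50)
long_doc = tf_component(1, doc_len=200)
print(short_doc > long_doc)
```

With the defaults above the factor approaches but never reaches k1 + 1 = 2.2, which is why repeating a term many times yields little extra score.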
Example: BM25 scoring for the query "reset password"
Score components:
- TF: term frequency in the document
- IDF: rarity across the corpus
- Length norm: penalizes long documents
Example documents:
- Reset password by clicking the reset link.
- Our platform provides account management including password changes, profile updates, billing, notifications, and security settings.
- To reset your password, visit account settings and confirm your email.
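To make the ranking concrete, here is a minimal from-scratch sketch of the scoring formula applied to the three example documents for the query "reset password". Several choices are assumptions rather than anything the text specifies: the simple regex tokenizer, the smoothed IDF form ln(1 + (N - n + 0.5) / (n + 0.5)), and the defaults k1 = 1.2 and b = 0.75. Production engines may differ in these details.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # query term absent from this document
            # Assumed smoothed IDF variant; always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "Reset password by clicking the reset link.",
    "Our platform provides account management including password changes, "
    "profile updates, billing, notifications, and security settings.",
    "To reset your password, visit account settings and confirm your email.",
]
scores = bm25_scores("reset password", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

Under these assumptions the short document that mentions "reset" twice ranks first, the document that contains both terms once ranks second, and the long document that only mentions "password" ranks last.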
Real-World Example
A 99helpers customer building their RAG evaluation discovers that 23% of their most common customer queries include specific product identifiers, version numbers, or error codes. For these queries, dense retrieval recall@5 is only 61% because the embedding model does not strongly differentiate specific identifiers. After adding BM25 as the sparse component in a hybrid system, recall@5 for identifier queries improves to 94%. Overall hybrid system recall@5 across all query types improves from 82% to 91%.
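The recall@5 figures above can be computed with a few lines of code. This is a sketch under one assumed definition (the fraction of a query's relevant documents that appear in the top k results, averaged over queries); some evaluations instead count a query as a hit if any relevant document appears. The document IDs are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents found in the top-k retrieved results."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)

# Hypothetical query: two relevant documents, one of which is ranked sixth.
retrieved = ["doc_7", "doc_2", "doc_9", "doc_4", "doc_1", "doc_3"]
relevant = {"doc_2", "doc_3"}
print(recall_at_k(retrieved, relevant, k=5))  # 0.5
```

Running the same function over the dense-only, sparse-only, and hybrid result lists for each query type is one way to reproduce a comparison like the one described.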
Common Mistakes
- Treating BM25 as inferior to dense retrieval: BM25 remains highly competitive for many query types, especially exact-match queries, and is an essential component of production RAG systems
- Not tuning BM25 parameters (k1, b) for your document collection: default parameters may not be optimal for your specific document length distribution and query style
- Applying BM25 to raw text without preprocessing: apply tokenization, lowercasing, stop word removal, and stemming to improve BM25 retrieval quality
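The preprocessing steps from the last item can be sketched as follows. The stop word list and the suffix-stripping "stemmer" are deliberately tiny, hypothetical stand-ins; a real pipeline would typically use the analyzer built into the search engine or an established stemmer.

```python
import re

# Assumption: a very small illustrative stop word list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "by", "your"}

def crude_stem(token: str) -> str:
    # Assumption: naive suffix stripping, standing in for a real stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    """Tokenize, lowercase, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("Clicking the links resets passwords"))
# ['click', 'link', 'reset', 'password']
```

Whatever steps are chosen, the same function should be applied to both documents and queries so that their terms line up at search time.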
Related Terms
Sparse Retrieval
Sparse retrieval is a search approach based on exact or weighted keyword matching, where documents and queries are represented as high-dimensional sparse vectors with most values being zero, and similarity is measured by term overlap.
Hybrid Retrieval
Hybrid retrieval combines dense (semantic) and sparse (keyword) search methods to leverage the strengths of both, using a fusion step to merge their results into a single ranked list for better overall retrieval quality.
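One common choice for the fusion step is reciprocal rank fusion (RRF), which needs only the rank positions from each retriever. The sketch below is an illustration of that technique, not necessarily the method any particular system uses; the constant k = 60 is a conventional default and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores the sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

sparse = ["doc_a", "doc_b", "doc_c"]  # e.g. BM25 results
dense = ["doc_b", "doc_d", "doc_a"]   # e.g. embedding search results
fused = reciprocal_rank_fusion([sparse, dense])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```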
TF-IDF
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical weighting scheme that scores how important a term is to a specific document relative to a collection, used in keyword search and as the conceptual foundation for BM25.
Inverted Index
An inverted index is a data structure that maps each unique term in a document collection to the list of documents containing that term, enabling fast full-text keyword search and powering BM25 and other sparse retrieval algorithms.
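As a rough illustration, an inverted index can be held in a dictionary that maps each term to the documents containing it, here with per-document term counts so that a BM25 scorer could read tf directly. The tokenizer and document IDs are assumptions made for the example.

```python
import re
from collections import defaultdict

def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each term to {doc_id: number of occurrences in that document}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)

index = build_inverted_index({
    "d1": "Reset password by clicking the reset link.",
    "d2": "Billing and security settings.",
    "d3": "To reset your password, visit account settings.",
})
print(index["reset"])  # {'d1': 2, 'd3': 1}

# Only documents containing at least one query term need to be scored.
candidates = set(index.get("reset", {})) | set(index.get("password", {}))
```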
Dense Retrieval
Dense retrieval is a retrieval approach that encodes both queries and documents into dense embedding vectors and finds relevant documents by computing vector similarity, enabling semantic matching beyond exact keyword overlap.