Chunk Size
Definition
Chunk size is a core hyperparameter in RAG system design that determines how large each indexed document segment is. It is typically measured in tokens (the unit used by LLMs and embedding models) or in characters. The choice involves a fundamental tradeoff: small chunks (100-200 tokens) produce highly focused, semantically pure embeddings that enable precise retrieval, because the embedding vector for a short passage closely represents the specific information it contains. Large chunks (800-1,500 tokens) give the LLM more surrounding context for answer generation, but produce embeddings that represent a broader, less focused semantic space.
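As a rough illustration of how chunk size shapes the index, the sketch below splits the same document at a small and a large size (using a list of placeholder tokens; the sizes and document length are hypothetical):

```python
def chunk_by_size(tokens, size):
    """Split a token list into fixed-size chunks (no overlap)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

# A hypothetical 1,000-token document (each "tok" stands in for one token).
doc = ["tok"] * 1000

small = chunk_by_size(doc, 128)   # many small, focused chunks
large = chunk_by_size(doc, 800)   # a few broad chunks

print(len(small), len(large))  # → 8 2
```

The small setting yields four times as many chunks to embed and index, each covering a narrow slice of the document; the large setting yields a handful of chunks, each mixing many topics into one embedding.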
Why It Matters
Chunk size is one of the most impactful configuration choices in a RAG system, yet it is often set to a default value without domain-specific tuning. Optimal chunk size depends on the nature of the source documents and the types of questions users ask. For dense technical documentation where each paragraph contains distinct information, small chunks (200-400 tokens) enable precise retrieval. For conversational knowledge base articles where context flows across paragraphs, larger chunks (500-800 tokens) preserve the narrative context needed for accurate answers. Many production systems use different chunk sizes for different document types.
How It Works
Chunk size is configured in the text splitter used by the indexing pipeline. Common frameworks (LangChain, LlamaIndex) accept chunk_size as a parameter to their text splitters. The optimal value is determined empirically: build an evaluation dataset from your own documents and queries, test several chunk sizes (e.g., 100, 200, 400, 800, 1,200 tokens), and measure retrieval recall and answer quality for each. Advanced approaches use adaptive chunk sizes, applying different sizes to different document types within the same knowledge base, rather than a single global value.
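A minimal sketch of such a sweep, using a toy keyword-overlap retriever and a hand-built evaluation set in place of a real embedding-based retriever (split_fixed, keyword_retrieve, the corpus, and the queries are all illustrative stand-ins):

```python
def split_fixed(tokens, chunk_size):
    """Fixed-size, non-overlapping token chunks."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def keyword_retrieve(chunks, query, k):
    """Toy retriever: rank chunks by how many query words they contain."""
    return sorted(chunks, key=lambda c: -sum(w in c for w in query.split()))[:k]

def recall_at_k(chunks, eval_set, k=3):
    """Fraction of queries whose expected answer token appears in the top-k chunks."""
    hits = sum(
        any(answer in chunk for chunk in keyword_retrieve(chunks, query, k))
        for query, answer in eval_set
    )
    return hits / len(eval_set)

# Hypothetical corpus and evaluation set.
tokens = ("restart the cache service then clear session data " * 40).split()
eval_set = [("cache service", "session"), ("clear data", "restart")]

# Sweep candidate chunk sizes and report recall for each.
for size in (100, 200, 400, 800):
    chunks = split_fixed(tokens, size)
    print(f"size={size}  chunks={len(chunks)}  recall@3={recall_at_k(chunks, eval_set):.2f}")
```

In a real evaluation you would swap keyword_retrieve for your actual embedding-and-search stack and use a representative query set; the loop structure stays the same.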
Chunk Size Tradeoffs

Small (128 tokens)

Strengths
- Precise retrieval
- Low noise per chunk
- Exact matches

Weaknesses
- Misses broader context
- Many chunks to index
- May fragment ideas

Medium (512 tokens, recommended default)

Strengths
- Balanced context
- Good for most RAG systems

Weaknesses
- Occasional minor noise

Large (2048 tokens)

Strengths
- Broad context
- Fewer chunks total

Weaknesses
- High noise in retrieval
- May exceed the context window
- Less precise ranking
Real-World Example
A 99helpers customer runs a chunk size evaluation on their knowledge base, testing 200, 400, 800, and 1,200-token chunks. For product feature queries, 400-token chunks perform best (focused enough to match specific feature descriptions precisely). For troubleshooting queries that require multi-step context, 800-token chunks perform best (enough context to understand the full diagnostic sequence). They implement a hybrid approach: FAQ articles use 400-token chunks, while troubleshooting guides use 800-token chunks. Overall evaluation accuracy improves 15% over using a uniform 500-token chunk size.
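A hybrid setup like this can be expressed as a simple per-document-type configuration; the type names and sizes below mirror the example but are otherwise illustrative:

```python
# Per-type chunk sizes chosen by the chunk-size evaluation (illustrative values).
CHUNK_SIZES = {
    "faq": 400,              # focused chunks match specific feature descriptions
    "troubleshooting": 800,  # larger chunks keep multi-step diagnostics together
}
DEFAULT_CHUNK_SIZE = 500     # fallback for unclassified document types

def chunk_size_for(doc_type):
    """Look up the chunk size for a document type, falling back to the default."""
    return CHUNK_SIZES.get(doc_type, DEFAULT_CHUNK_SIZE)

def split_document(tokens, doc_type):
    """Split a token list using the chunk size configured for its document type."""
    size = chunk_size_for(doc_type)
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]
```

The indexing pipeline then routes each document through split_document with its type label, so one knowledge base can hold chunks of several sizes without any change to the retrieval side.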
Common Mistakes
- ✕Using default chunk size without domain-specific evaluation — the right chunk size depends on your specific documents and queries; always test multiple sizes
- ✕Applying the same chunk size to all document types — Q&A pairs, prose articles, technical specifications, and code documentation have different optimal chunk sizes
- ✕Setting chunk size to the embedding model's maximum input length — the maximum input length is not the optimal chunk size; very long chunks produce poor-quality embeddings
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Chunk Overlap
Chunk overlap is a chunking strategy where consecutive document chunks share a portion of overlapping text, ensuring that information spanning chunk boundaries is captured in at least one complete chunk.
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Indexing Pipeline
An indexing pipeline is the offline data processing workflow that transforms raw documents into searchable vector embeddings, running during knowledge base setup and when content is updated.
Parent Document Retrieval
Parent document retrieval is a RAG strategy that indexes small chunks for precise retrieval but returns the larger parent document (or section) to the LLM as context, balancing retrieval precision with sufficient context for answer generation.