Retrieval-Augmented Generation (RAG)

Chunk Size

Definition

Chunk size is a core hyperparameter in RAG system design that determines how large each indexed document segment is. It is typically measured in tokens (the unit used by LLMs and embedding models) or characters. The choice involves a fundamental tradeoff: small chunks (100-200 tokens) produce highly focused, semantically pure embeddings that enable precise retrieval, because the embedding vector for a short passage closely represents the specific information in that passage. Large chunks (800-1,500 tokens) surround each piece of information with more context, giving the LLM more material for answer generation, but their embeddings represent a broader, less focused semantic space.

Why It Matters

Chunk size is one of the most impactful configuration choices in a RAG system, yet it is often set to a default value without domain-specific tuning. Optimal chunk size depends on the nature of the source documents and the types of questions users ask. For dense technical documentation where each paragraph contains distinct information, small chunks (200-400 tokens) enable precise retrieval. For conversational knowledge base articles where context flows across paragraphs, larger chunks (500-800 tokens) preserve the narrative context needed for accurate answers. Many production systems use different chunk sizes for different document types.

How It Works

Chunk size is configured in the text splitter used in the indexing pipeline. Common frameworks (LangChain, LlamaIndex) accept chunk_size as a parameter to their text splitters. The optimal chunk size is determined empirically: create evaluation datasets for your specific documents and queries, then test different chunk sizes (e.g., 100, 200, 400, 800, 1,200 tokens) and measure retrieval recall and answer quality metrics for each. Advanced approaches use adaptive chunk sizes — different sizes for different document types within the same knowledge base — rather than a single global chunk size.
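As a rough sketch of what a splitter does with its chunk_size parameter, the function below cuts a document into fixed-size chunks with overlap. It approximates tokens with whitespace-separated words for self-containment; a real pipeline would count tokens with the embedding model's own tokenizer (and would typically just use a framework splitter rather than hand-rolling one):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Tokens are approximated by whitespace-separated words here;
    a production pipeline would use the embedding model's tokenizer.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` words after the last
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks

doc = " ".join(["word"] * 1200)
print(len(chunk_text(doc, chunk_size=512, overlap=64)))  # → 3
```

The overlap means consecutive chunks share a margin of text, which reduces the chance that a sentence relevant to a query is severed at a chunk boundary.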

Chunk Size Tradeoffs

Small (128 tokens)

Strengths
  • Precise retrieval
  • Low noise per chunk
  • Exact matches

Weaknesses
  • Misses broader context
  • Many chunks to index
  • May fragment ideas

Medium (512 tokens, recommended)

Strengths
  • Balanced context
  • Good for most RAG
  • Recommended default

Weaknesses
  • Minor noise occasionally

Large (2048 tokens)

Strengths
  • Broad context
  • Fewer chunks total

Weaknesses
  • High noise in retrieval
  • May exceed context window
  • Less precise ranking

Tradeoff spectrum: 128 tokens (precise) → 512 tokens (recommended) → 2048 tokens (broad)

Real-World Example

A 99helpers customer runs a chunk size evaluation on their knowledge base, testing 200, 400, 800, and 1,200-token chunks. For product feature queries, 400-token chunks perform best (focused enough to match specific feature descriptions precisely). For troubleshooting queries that require multi-step context, 800-token chunks perform best (enough context to understand the full diagnostic sequence). They implement a hybrid approach: FAQ articles use 400-token chunks, while troubleshooting guides use 800-token chunks. Overall evaluation accuracy improves by 15% over a uniform 500-token chunk size.

Common Mistakes

  • Using default chunk size without domain-specific evaluation — the right chunk size depends on your specific documents and queries; always test multiple sizes
  • Applying the same chunk size to all document types — Q&A pairs, prose articles, technical specifications, and code documentation have different optimal chunk sizes
  • Setting chunk size to the embedding model's maximum input length — the maximum input length is not the optimal chunk size; very long chunks produce poor-quality embeddings
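The first mistake above, shipping a default size untested, is avoided by running a chunk-size sweep against an evaluation set. The sketch below is a toy: the lexical-overlap retriever and two-item evaluation set stand in for real embedding retrieval and a real labeled dataset, and recall is scored by whether the known answer phrase appears in a retrieved chunk:

```python
def chunk(words: list[str], size: int) -> list[str]:
    """Split a word list into non-overlapping chunks of `size` words."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank chunks by shared terms with the query."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())),
                  reverse=True)[:k]

def recall_at_k(eval_set, chunks, k=2):
    """Fraction of queries whose answer phrase appears in a retrieved chunk."""
    hits = sum(1 for query, answer in eval_set
               if any(answer in c for c in retrieve(query, chunks, k)))
    return hits / len(eval_set)

# Tiny stand-in corpus and labeled evaluation set.
corpus = (
    "To reset your password, open account settings and choose security. "
    "Billing invoices can be exported as PDF from the billing dashboard. "
    + "filler text about unrelated topics. " * 40
)
eval_set = [
    ("how do I reset my password", "account settings"),
    ("export billing invoices", "billing dashboard"),
]

words = corpus.split()
results = {size: recall_at_k(eval_set, chunk(words, size))
           for size in (100, 200, 400, 800)}
print(results)
```

A real sweep would replace the toy retriever with your actual embedding index, use dozens or hundreds of labeled queries, and track answer-quality metrics alongside recall before picking a size.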
