Retrieval-Augmented Generation (RAG)

Document Chunking

Definition

Document chunking (also called text splitting) is the pre-processing step that divides source documents into segments of manageable size before embedding and storing in a vector database. Chunking is necessary because: embedding models have maximum input length limits (typically 512 to 8,191 tokens), embedding a full document as a single vector loses granular meaning, and retrieving entire documents is unnecessarily verbose for the LLM context window. The chunking strategy — how large each chunk is, where splits occur, and how much overlap exists between consecutive chunks — significantly impacts both retrieval quality (finding the right chunk) and answer quality (having enough context in the chunk).
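The three knobs named above (chunk size, split points, overlap) can be made concrete with a minimal fixed-size splitter sketch; the function name and default values here are illustrative, not from any particular library:

```python
def fixed_size_chunks(text, size=500, overlap=50):
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters so that content
    falling near a boundary appears intact in at least one chunk.
    """
    step = size - overlap  # each new chunk starts `step` characters after the last
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "x" * 1200
chunks = fixed_size_chunks(doc, size=500, overlap=50)
# chunks start at offsets 0, 450, 900, so their lengths are 500, 500, 300
```

Production splitters usually measure size in tokens rather than characters, since embedding model limits are token-based.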

Why It Matters

Chunking strategy is one of the most impactful decisions in RAG system design, yet it is often treated as a default parameter. Chunks that are too small lose necessary context — a single sentence may not contain enough information to answer a question without surrounding context. Chunks that are too large include irrelevant content that dilutes the LLM's attention and wastes context window space. Chunks split at arbitrary character positions may break mid-sentence, producing incoherent units. Optimal chunking respects semantic boundaries (paragraphs, sections, sentences) and matches the granularity at which users ask questions.

How It Works

Document chunking is performed by a text splitter in the indexing pipeline. Strategies include: fixed-size splitting (split at every N characters, often with overlap), recursive character splitting (split at natural boundaries in priority order: paragraphs → sentences → words → characters), semantic chunking (use embeddings to detect semantic shifts and split at meaning boundaries), and structure-aware splitting (use document structure like headings and list items as natural split points). The best strategy depends on document type — prose articles benefit from semantic or recursive splitting; structured FAQs with natural Q&A units benefit from structure-aware splitting.
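The recursive strategy described above can be sketched as follows. This is a simplified illustration (separator list and size limit are example choices), not the implementation of any specific library's splitter:

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Recursive character splitting: try the coarsest boundary first
    (paragraphs), falling back to finer ones (sentences, words) only
    when a piece is still larger than max_len."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            chunks, buf = [], ""
            for piece in text.split(sep):
                candidate = buf + sep + piece if buf else piece
                if len(candidate) <= max_len:
                    buf = candidate  # keep merging small pieces into one chunk
                else:
                    if buf:
                        chunks.append(buf)
                    if len(piece) > max_len:
                        # piece itself too large: recurse to finer separators
                        chunks.extend(recursive_split(piece, max_len, separators))
                        buf = ""
                    else:
                        buf = piece
            if buf:
                chunks.append(buf)
            return chunks
    # no separator present at all: hard split at max_len characters
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Note how small adjacent pieces are merged back together up to the size limit, so chunk boundaries land on the coarsest natural boundary that fits.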

Document Chunking Strategies

A ~2,000-token source document, split four ways:

  • Fixed-size: 4 chunks of 500, 500, 500, and 250 tokens. Ignores sentence boundaries and may split mid-sentence.
  • Sentence: 12 chunks, ~180 tokens on average. Clean sentence boundaries, but variable chunk size.
  • Paragraph / Semantic (popular): 6 chunks, ~340 tokens on average. Respects meaning units; best for semantic search.
  • Recursive: 8 chunks, ~260 tokens on average. Tries paragraph → sentence → word boundaries, falling back to the next level only when a piece is still too large.

Real-World Example

A 99helpers customer tests three chunking strategies on their 500-article knowledge base: fixed 500-character chunks, recursive splitting (split at paragraph → sentence boundaries), and structure-aware splitting that keeps Q&A pairs together. On 150 test queries, recall@5 is 74% for fixed chunks, 82% for recursive, and 89% for structure-aware. The structure-aware approach works best because their knowledge base articles are naturally organized as Q&A pairs — splitting within a Q&A pair creates incomplete chunks, while keeping them together preserves the complete answer.
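The recall@5 figures above measure the fraction of test queries for which the relevant chunk appears in the top 5 retrieved results. A minimal sketch of that metric (the chunk IDs below are made-up example data):

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose relevant chunk appears in the top-k results.

    retrieved: list of ranked chunk-ID lists, one per query
    relevant:  the single correct chunk ID for each query
    """
    hits = sum(1 for ranked, gold in zip(retrieved, relevant) if gold in ranked[:k])
    return hits / len(relevant)

# Hypothetical results for 4 test queries:
retrieved = [["c3", "c7", "c1"], ["c2", "c9"], ["c5"], ["c8", "c4"]]
relevant = ["c7", "c9", "c6", "c4"]
print(recall_at_k(retrieved, relevant, k=5))  # 3 of 4 queries hit → 0.75
```

Running the same query set against each chunking strategy's index is how comparisons like the 74% / 82% / 89% numbers above are produced.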

Common Mistakes

  • Using the same chunk size for all document types — prose articles, FAQs, code documentation, and structured tables have different natural units and optimal chunk sizes
  • Ignoring the embedding model's maximum token limit — chunks exceeding the limit are silently truncated, producing degraded embeddings
  • Not evaluating chunking strategy impact on end-to-end retrieval — the right chunk size depends on your specific queries and documents; measure with representative test cases
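Guarding against the silent-truncation mistake above is cheap. A rough sketch, using a crude whitespace token count as a stand-in (a real check should use the embedding model's own tokenizer, since token counts differ by tokenizer):

```python
def check_chunk_lengths(chunks, max_tokens=512):
    """Flag chunks that would exceed the embedding model's input limit.

    Uses whitespace splitting as a rough token estimate; swap in the
    actual model tokenizer for an exact count.
    """
    return [(i, n) for i, c in enumerate(chunks)
            if (n := len(c.split())) > max_tokens]

oversized = check_chunk_lengths(["word " * 600, "a short chunk"], max_tokens=512)
# → [(0, 600)]: chunk 0 would be truncated by a 512-token model
```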

