Recursive Chunking
Definition
Recursive chunking, popularized by LangChain's RecursiveCharacterTextSplitter, uses a list of separator tokens tried in order of preference: paragraph breaks, newlines, periods, spaces. The algorithm first splits on the highest-priority separator (e.g., double newlines, which separate paragraphs). If any resulting piece is still larger than the target chunk size, it recursively splits that piece using the next separator in the list. This continues until all chunks are within the size limit or the separator list is exhausted. The result is chunks that respect as much structural hierarchy as possible—paragraphs first, then sentences, then words as a last resort.
Why It Matters
Recursive chunking produces more readable, coherent chunks than naive fixed-size splitting without requiring the embedding computation overhead of semantic chunking. It is the practical default for most RAG implementations because it is fast, deterministic, and respects document structure. For 99helpers documentation written with clear paragraph and section structure, recursive chunking reliably keeps related sentences together while still enforcing a maximum chunk size that fits within embedding model token limits. It is particularly effective for Markdown documents where heading levels provide clear hierarchical boundaries.
How It Works
Implementation is straightforward: define a list of separators in priority order, a max_chunk_size in tokens or characters, and an optional overlap. The splitter splits on the highest-priority separator present in the text; any resulting piece still larger than max_chunk_size is recursively split using the next separator in the list. If the separator list is exhausted and a piece is still too large, it falls back to character-level splitting. LangChain's implementation supports language-aware separators for Markdown (splitting on ## headers before paragraphs), Python code (splitting on class and function definitions), and HTML (splitting on block elements). Choosing max_chunk_size requires balancing embedding model limits, LLM context windows, and retrieval quality.
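The recursion can be sketched in a few lines of plain Python. This is a simplified illustration of the algorithm, not LangChain's actual implementation: sizes are measured in characters, and a production splitter would also re-attach separators and greedily merge adjacent small pieces back up to the limit.

```python
def recursive_split(text, max_size, separators=("\n\n", ". ", " ")):
    """Split text on the highest-priority separator, then recurse into
    any piece that is still larger than max_size (measured in characters)."""
    if len(text) <= max_size:
        return [text]
    if not separators:
        # Separator list exhausted: fall back to hard character-level cuts.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(first):
        if not piece:
            continue
        if len(piece) <= max_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, max_size, rest))
    return chunks

doc = "Alpha beta gamma.\n\nDelta epsilon zeta eta theta iota kappa."
print(recursive_split(doc, max_size=25))
```

Note that `text.split(first)` discards the separator itself; LangChain's splitter keeps separators and merges pieces to fill chunks close to the size limit, which this sketch omits for brevity.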
Recursive Chunking — Split Until Chunks Fit
1. Try the paragraph separator (\n\n): first attempt, at paragraph boundaries. The Level 1 chunk is still 1,200 tokens, over the 512-token max_size, so recurse.
2. Try the sentence separator (.): second attempt, at sentence boundaries. The Level 2 chunk is still 600 tokens, so recurse again.
3. Try the word boundary ( ): third attempt, at word boundaries. The Level 3 chunk is 380 tokens, which fits.
Separators are tried in order ["\n\n", ".", " "], recursing until every chunk fits within max_size (512 tokens here).
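The descent through separator levels can be traced with a small standalone sketch that reports which level first yields pieces under the limit. Sizes here are plain word counts standing in for real tokenizer token counts, an assumption for illustration only:

```python
def first_fitting_level(text, separators, max_size):
    """Return the index of the first separator whose pieces all fit.
    Size is a simple word count here (a stand-in for real token counts)."""
    for level, sep in enumerate(separators):
        pieces = [p for p in text.split(sep) if p.strip()]
        if all(len(p.split()) <= max_size for p in pieces):
            return level
    return len(separators)  # would fall back to character-level cuts

doc = "First paragraph. It has sentences.\n\nSecond paragraph is short."
print(first_fitting_level(doc, ["\n\n", ". ", " "], max_size=6))  # → 0 (paragraphs fit)
print(first_fitting_level(doc, ["\n\n", ". ", " "], max_size=4))  # → 2 (down to words)
```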
Real-World Example
A 99helpers integration guide is a Markdown document with H2 sections for Setup, Configuration, and Troubleshooting. Recursive chunking with Markdown-aware separators splits first on H2 headers, producing three high-level chunks. Each H2 section is under 800 tokens so no further splitting is needed. A user querying 'configure Slack integration' retrieves only the Configuration chunk, not the entire document. When a section later grows to 1,200 tokens, the recursive splitter automatically splits it on paragraph breaks, maintaining coherence without manual tuning.
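The Markdown-aware first pass described above can be sketched as a header-anchored split. This is a simplified illustration with a made-up three-section guide; real Markdown splitters also handle H1/H3 levels and attach header metadata to each chunk:

```python
import re

def split_markdown_sections(md):
    """Split a Markdown document on H2 headers, keeping each '## ' header
    with the section body that follows it."""
    parts = re.split(r"(?m)^(?=## )", md)
    return [p.strip() for p in parts if p.strip()]

guide = (
    "## Setup\nInstall the app.\n\n"
    "## Configuration\nAdd your Slack token.\n\n"
    "## Troubleshooting\nCheck the logs.\n"
)
sections = split_markdown_sections(guide)
print(len(sections))                 # → 3
print(sections[1].splitlines()[0])   # → ## Configuration
```

Each resulting section would then be handed to the recursive splitter, which leaves it intact if it fits and splits it on paragraph breaks if it later grows past the limit.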
Common Mistakes
- ✕Using the same separator list for all document types—code files need code-aware separators, not prose-optimized ones.
- ✕Setting max_chunk_size in characters when the embedding model limit is in tokens—a 1,000-character limit may produce chunks that exceed the 512-token embedding limit.
- ✕Skipping chunk overlap entirely, which reintroduces the boundary information loss that a small overlap would prevent.
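The character-versus-token mistake is easy to sanity-check with a rough chars-per-token heuristic. The 4-characters-per-token figure below is a common rule of thumb for English prose, an assumption here; real tokenizer output varies widely by language and content:

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; use a real tokenizer (e.g. tiktoken)
    for any limit that actually matters."""
    return int(len(text) / chars_per_token)

# 1,000 characters of English prose is roughly 250 tokens, but dense code
# or CJK text can tokenize at far fewer characters per token, so a
# character-based limit can silently exceed a 512-token embedding limit.
print(estimate_tokens("x" * 1000))  # → 250
```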
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Sliding Window Chunking
Sliding window chunking splits documents into overlapping segments by advancing a fixed-size window across the text. Overlap between consecutive chunks ensures that information near chunk boundaries is captured in multiple chunks, reducing information loss.
Semantic Chunking
Semantic chunking splits documents into segments based on meaning boundaries—grouping sentences that discuss the same topic together—rather than fixed character counts. This produces more coherent, self-contained chunks that improve retrieval quality.
Chunk Size
Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.
Chunk Overlap
Chunk overlap is a chunking strategy where consecutive document chunks share a portion of overlapping text, ensuring that information spanning chunk boundaries is captured in at least one complete chunk.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →