Retrieval-Augmented Generation (RAG)

Recursive Chunking

Definition

Recursive chunking, popularized by LangChain's RecursiveCharacterTextSplitter, uses a list of separators tried in order of preference, for example paragraph breaks, newlines, periods, and spaces. (LangChain's default list is "\n\n", "\n", " ", and the empty string; sentence separators such as periods can be added to it.) The algorithm first splits on the highest-priority separator (e.g., double newlines, which separate paragraphs). If any resulting piece is still larger than the target chunk size, it recursively splits that piece using the next separator in the list. This continues until all chunks are within the size limit or the separator list is exhausted. The result is chunks that respect as much structural hierarchy as possible: paragraphs first, then sentences, then words as a last resort.
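The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not LangChain's implementation: the function name recursive_split and its parameters are hypothetical, sizes are measured in characters, separators are discarded when splitting, and the merging of small neighbouring pieces that production splitters perform is left out.

```python
def recursive_split(text, separators, max_size):
    """Split text until every piece is at most max_size characters."""
    if len(text) <= max_size:
        # Small enough already; drop whitespace-only leftovers.
        return [text] if text.strip() else []
    if not separators:
        # Separator list exhausted: fall back to fixed-size character slices.
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    head, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(head):
        # Pieces that are still too large are re-split with the next separator.
        chunks.extend(recursive_split(piece, rest, max_size))
    return chunks


sample = "aaa bbb\n\nccc ddd eee fff"
print(recursive_split(sample, ["\n\n", " "], max_size=8))
```

In this sample the first paragraph fits as a whole, while the second is too long and is broken down at word boundaries.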

Why It Matters

Recursive chunking produces more readable, coherent chunks than naive fixed-size splitting without requiring the embedding computation overhead of semantic chunking. It is the practical default for most RAG implementations because it is fast, deterministic, and respects document structure. For 99helpers documentation written with clear paragraph and section structure, recursive chunking reliably keeps related sentences together while still enforcing a maximum chunk size that fits within embedding model token limits. It is particularly effective for Markdown documents where heading levels provide clear hierarchical boundaries.

How It Works

Implementation is straightforward: define a list of separators in priority order, a max_chunk_size in tokens or characters, and an optional overlap. The splitter splits the text on the highest-priority separator, keeps the pieces that fit within max_chunk_size, and re-splits any oversized piece with the next separator in the list. If none of the separators produce small enough chunks, it falls back to character-level splitting. LangChain's implementation supports language-aware separators for Markdown (splitting on ## headers before paragraphs), Python code (splitting on class and function definitions), and HTML (splitting on block elements). Choosing max_chunk_size requires balancing embedding model limits, LLM context windows, and retrieval quality.
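As a rough sketch of how such a splitter might combine splitting with packing, the hypothetical function below splits on the first separator that occurs in the text, greedily packs adjacent pieces into chunks up to the size limit, and recurses only on pieces that are too large by themselves. It is an illustration under stated assumptions (character-based sizes, no overlap), not the code of any particular library.

```python
def split_and_pack(text, separators, max_size):
    """Recursively split text, then greedily pack adjacent pieces into chunks."""
    if len(text) <= max_size:
        return [text] if text.strip() else []
    # Use the first separator in priority order that occurs in the text.
    for i, sep in enumerate(separators):
        if sep in text:
            remaining = separators[i + 1:]
            break
    else:
        # No separator applies: fall back to character-level slices.
        return [text[j:j + max_size] for j in range(0, len(text), max_size)]

    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece.strip():
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_size:
            current = candidate  # piece still fits in the chunk being built
            continue
        if current:
            chunks.append(current)  # flush the chunk built so far
            current = ""
        if len(piece) <= max_size:
            current = piece
        else:
            # Piece is too large by itself: recurse with lower-priority separators.
            chunks.extend(split_and_pack(piece, remaining, max_size))
    if current:
        chunks.append(current)
    return chunks
```

Packing keeps short paragraphs together in one chunk instead of emitting many tiny chunks, which is why the two short paragraphs in a call such as split_and_pack(text, ["\n\n", ". ", " "], 30) can end up in the same chunk.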

[Figure: Recursive Chunking: Split Until Chunks Fit. With max_size = 512 tokens, splitting on the paragraph separator (\n\n) leaves a 1,200-token chunk, which exceeds max_size; splitting that chunk on the sentence separator (.) leaves a 600-token chunk, which still exceeds max_size; splitting on word boundaries (spaces) yields a 380-token chunk, which fits. Separators are tried in order ("\n\n", ".", " "), recursing until the chunk size fits within max_size.]

Real-World Example

A 99helpers integration guide is a Markdown document with H2 sections for Setup, Configuration, and Troubleshooting. Recursive chunking with Markdown-aware separators splits first on H2 headers, producing three high-level chunks. Each H2 section is under the maximum chunk size (800 tokens in this example), so no further splitting is needed. A user querying 'configure Slack integration' retrieves only the Configuration chunk, not the entire document. When a section later grows to 1,200 tokens, the recursive splitter automatically splits it on paragraph breaks, maintaining coherence without manual tuning.
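The scenario above can be approximated with a short, self-contained sketch. Everything here is illustrative: chunk_markdown and count_tokens are hypothetical names, the sample document is invented, and whitespace-separated words stand in for real tokens. Only two levels (H2 headers, then paragraph breaks) are shown; a full recursive splitter would continue with further separators.

```python
import re


def count_tokens(text):
    # Crude stand-in for a real tokenizer (assumption): whitespace-separated words.
    return len(text.split())


def chunk_markdown(doc, max_tokens):
    """Split on H2 headers first; split oversized sections on paragraph breaks."""
    chunks = []
    # The zero-width lookahead keeps each "## " header with its section.
    for section in re.split(r"^(?=## )", doc, flags=re.MULTILINE):
        if not section.strip():
            continue
        if count_tokens(section) <= max_tokens:
            chunks.append(section.strip())
        else:
            # Oversized section: fall back to the next separator (paragraphs).
            chunks.extend(p.strip() for p in section.split("\n\n") if p.strip())
    return chunks


doc = (
    "## Setup\n\nInstall the app.\n\n"
    "## Configuration\n\nConnect Slack.\n\nPick a channel.\n\n"
    "## Troubleshooting\n\nCheck the logs.\n"
)
chunks = chunk_markdown(doc, max_tokens=50)  # one chunk per H2 section
```

Lowering max_tokens far enough causes only the sections that exceed it to be split further, while the others remain whole.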

Common Mistakes

✕ Using the same separator list for all document types. Code files need code-aware separators, not prose-optimized ones.
✕ Setting max_chunk_size in characters when the embedding model limit is in tokens. The number of characters per token varies with language and content, so a 1,000-character limit can still produce chunks that exceed a 512-token embedding limit for token-dense text such as code or some non-English scripts.
✕ Omitting chunk overlap when using recursive chunking, so that information spanning a chunk boundary is cut in two and no single chunk contains it in full.
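To illustrate the character-versus-token mistake, the sketch below measures two chunks of similar character length with a stand-in tokenizer. The tokenizer is an assumption made for the example (each word and each punctuation mark counts as one token); real embedding models use their own tokenizers, and limits should be checked with those.

```python
import re


def count_tokens(text):
    # Stand-in tokenizer (assumption): every word and every punctuation
    # mark counts as one token. Real tokenizers will give different counts.
    return len(re.findall(r"\w+|[^\w\s]", text))


def oversized(chunks, max_tokens, length=count_tokens):
    """Return the chunks whose measured length exceeds max_tokens."""
    return [c for c in chunks if length(c) > max_tokens]


chunks = [
    "x = a[i] + b[j] * (c - d);",    # code: many short symbols
    "Plain prose with few symbols",  # prose: a handful of words
]

# Both chunks pass a 30-character limit...
within_char_limit = all(len(c) <= 30 for c in chunks)
# ...but the code chunk fails a 10-token limit under this tokenizer.
too_long = oversized(chunks, max_tokens=10)
```

Passing the measuring function as a parameter makes it easy to swap in the tokenizer that matches the embedding model.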

Related Terms

Document Chunking

Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.

Sliding Window Chunking

Sliding window chunking splits documents into overlapping segments by advancing a fixed-size window across the text. Overlap between consecutive chunks ensures that information near chunk boundaries is captured in multiple chunks, reducing information loss.
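A minimal sketch of the sliding window, assuming the text has already been tokenized into a list; the function name and parameters are hypothetical.

```python
def sliding_window(tokens, window, overlap):
    """Return overlapping slices of `tokens`, each at most `window` long."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


windows = sliding_window(list(range(10)), window=4, overlap=2)
```

With a window of 4 and an overlap of 2, each chunk repeats the last two tokens of the previous one, so content near a boundary appears in both neighbours.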

Semantic Chunking

Semantic chunking splits documents into segments based on meaning boundaries—grouping sentences that discuss the same topic together—rather than fixed character counts. This produces more coherent, self-contained chunks that improve retrieval quality.

Chunk Size

Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.

Chunk Overlap

Chunk overlap is a chunking strategy where consecutive document chunks share a portion of overlapping text, ensuring that information spanning chunk boundaries is captured in at least one complete chunk.
