Parent-Child Chunking
Definition
Parent-child chunking (also called small-to-big retrieval or parent document retrieval) addresses a fundamental tension in RAG chunking: small chunks produce focused embeddings that retrieve precisely, but they lack context; large chunks provide rich context for generation but produce unfocused embeddings that retrieve poorly. Parent-child chunking resolves this by maintaining two representations for each document section: small child chunks (128-256 tokens) used for embedding and retrieval, and larger parent chunks (512-2048 tokens) returned to the LLM after retrieval. When a child chunk is retrieved, its parent chunk is looked up and provided as context, giving the LLM broader information than the small retrieved segment alone would provide.
Why It Matters
Answer quality often depends on surrounding context that doesn't fit in a small, precise chunk. A child chunk might retrieve the exact sentence answering a question, but the LLM may need the surrounding two paragraphs to understand how to apply the answer in context. Parent-child chunking provides exactly this—precision in retrieval (small chunks match queries tightly) with richness in generation (parent chunks give the LLM full context). For 99helpers customers with knowledge base articles that contain highly specific technical instructions embedded in broader conceptual explanations, parent-child chunking improves both retrieval precision and answer completeness.
How It Works
Implementation with LangChain: use ParentDocumentRetriever, which stores small chunks in the vector index and maps each chunk to its parent document in a separate document store, with configurable child_splitter and parent_splitter. During indexing, the large parent document is stored by ID, then split into small children, each indexed with a metadata field referencing the parent ID. At query time: embed the query, retrieve the top-K small child chunks by similarity, look up each child's parent ID, fetch the full parent chunks from the document store, and pass the parent chunks (not the child chunks) to the LLM. In LlamaIndex, the HierarchicalNodeParser combined with the AutoMergingRetriever implements a closely related small-to-big pattern.
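The index-then-look-up-parent flow above can be sketched without any framework. This is a minimal, self-contained illustration, not the LangChain implementation: word-overlap scoring stands in for real embedding similarity, and the in-memory dicts stand in for a vector index and document store.

```python
def split(text, size):
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(parents, child_size=8):
    """Store each parent by ID; index each child with a pointer to its parent."""
    docstore, children = {}, []
    for pid, parent in enumerate(parents):
        docstore[pid] = parent
        for child in split(parent, child_size):
            children.append({"text": child, "parent_id": pid})
    return docstore, children

def retrieve(query, docstore, children, k=1):
    """Score CHILD chunks against the query, but return their PARENT chunks."""
    q = set(query.lower().split())
    scored = sorted(children,
                    key=lambda c: len(q & set(c["text"].lower().split())),
                    reverse=True)
    parent_ids = {c["parent_id"] for c in scored[:k]}
    return [docstore[pid] for pid in parent_ids]

parents = [
    "To cancel your plan open billing settings and click cancel subscription. "
    "Refunds are prorated for annual plans.",
    "To upgrade your plan choose a new tier in billing settings. "
    "Upgrades take effect immediately.",
]
docstore, children = build_index(parents)
print(retrieve("how do I cancel my plan", docstore, children))
```

The key property is visible in `retrieve`: similarity is computed over the small children, yet the object handed to the LLM is the larger parent, so the refund detail adjacent to the matched sentence comes along for free.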
Parent-Child Chunking Strategy (diagram): a parent chunk, "Subscription Management" (~500 tokens), is split into three child chunks of ~100 tokens each — Child A (Cancel plan), Child B (Pause plan), Child C (Upgrade plan). The user query "how do I cancel my plan?" matches Child A with high similarity, but the full parent chunk is returned to the LLM for richer context.
Real-World Example
A 99helpers API documentation page covers authentication in 2,000 tokens across three sections: overview, implementation, and common errors. Chunked into 256-token children, the page becomes roughly 8 small chunks. A user asking "How does token-based auth work?" retrieves the 2 most relevant child chunks (covering the token generation process), but these chunks lack context about token expiry from adjacent sections. Parent-child chunking returns the full 2,000-token parent document for both child chunks. The LLM now has the complete authentication picture and provides a comprehensive answer covering token generation, usage, and expiry in one response.
Common Mistakes
- ✕ Setting parent chunks too large — if parents exceed the LLM's context window, they need to be truncated, partially defeating the purpose.
- ✕ Storing parents only in the document store without a fast lookup mechanism — parent retrieval must be sub-millisecond to avoid adding significant latency.
- ✕ Using the same splitting parameters for parent and child — parents should use coarser splits (paragraph/section level) while children use fine-grained splits (sentence level).
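The last mistake — reusing one splitter for both levels — is easy to avoid. A minimal sketch of distinct granularities, assuming paragraph-level parents (split on blank lines) and sentence-level children (split on terminal punctuation); real splitters would be token-aware:

```python
import re

def split_parents(doc):
    """Parents: coarse, paragraph-level chunks, split on blank lines."""
    return [p.strip() for p in doc.split("\n\n") if p.strip()]

def split_children(parent):
    """Children: fine, sentence-level chunks within one parent."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", parent) if s.strip()]

doc = ("Token auth uses short-lived keys. Keys expire after one hour.\n\n"
       "Refresh tokens renew access without re-login. Store them securely.")

for pid, parent in enumerate(split_parents(doc)):
    for child in split_children(parent):
        print(pid, "->", child)
```

Each printed line pairs a fine-grained child (what gets embedded) with its parent ID (what gets returned), which is exactly the mapping the document store has to maintain.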
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Semantic Chunking
Semantic chunking splits documents into segments based on meaning boundaries—grouping sentences that discuss the same topic together—rather than fixed character counts. This produces more coherent, self-contained chunks that improve retrieval quality.
Chunk Size
Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.
Retrieval Pipeline
A retrieval pipeline is the online query-time workflow that transforms a user question into a ranked set of relevant document chunks, serving as the information retrieval stage of a RAG system.
Context Window
A context window is the maximum amount of text (measured in tokens) that a language model can process in a single inference call, determining how much retrieved content, conversation history, and instructions can be included in a RAG prompt.