Retrieval-Augmented Generation (RAG)

Parent-Child Chunking

Definition

Parent-child chunking (also called small-to-big retrieval or parent document retrieval) addresses a fundamental tension in RAG chunking: small chunks produce focused embeddings that retrieve precisely, but they lack context; large chunks provide rich context for generation but produce unfocused embeddings that retrieve poorly. Parent-child chunking resolves this by maintaining two representations for each document section: small child chunks (128-256 tokens) used for embedding and retrieval, and larger parent chunks (512-2048 tokens) returned to the LLM after retrieval. When a child chunk is retrieved, its parent chunk is looked up and provided as context, giving the LLM broader information than the small retrieved segment alone would provide.
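The two-tier representation described above can be sketched as a minimal indexer. This is an illustration only, not a production implementation: whitespace words stand in for real tokens, and all names (parent_store, child_index) are made up for this sketch.

```python
def split_tokens(text, size):
    """Split text into chunks of at most `size` whitespace tokens."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

parent_store = {}   # parent_id -> full parent chunk (returned to the LLM)
child_index = []    # (child_text, parent_id) pairs (what gets embedded)

def index_document(doc_id, text, parent_size=512, child_size=128):
    for p, parent in enumerate(split_tokens(text, parent_size)):
        parent_id = f"{doc_id}:{p}"
        parent_store[parent_id] = parent          # large chunk, stored once by ID
        for child in split_tokens(parent, child_size):
            child_index.append((child, parent_id))  # small chunk carries parent ref

# A 1,000-token document yields 2 parents (512 + 488 tokens),
# each split into 128-token children.
doc = " ".join(f"word{i}" for i in range(1000))
index_document("kb-article", doc)
```

Only the child texts would be embedded; the parent store is never searched directly, only looked up by ID after a child matches.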

Why It Matters

Answer quality often depends on surrounding context that doesn't fit in a small, precise chunk. A child chunk might retrieve the exact sentence answering a question, but the LLM may need the surrounding two paragraphs to understand how to apply the answer in context. Parent-child chunking provides exactly this—precision in retrieval (small chunks match queries tightly) with richness in generation (parent chunks give the LLM full context). For 99helpers customers with knowledge base articles that contain highly specific technical instructions embedded in broader conceptual explanations, parent-child chunking improves both retrieval precision and answer completeness.

How It Works

Implementation follows a two-store pattern: small child chunks are stored in the vector index, and each chunk ID maps to its parent in a separate document store. During indexing, each large parent chunk is stored by ID, then split into small children, each indexed with a metadata field referencing the parent ID. At query time: embed the query, retrieve the top-K child chunks by similarity, look up each child's parent ID, fetch the full parent chunks from the document store, and pass the parent chunks (not the child chunks) to the LLM. In LangChain, ParentDocumentRetriever implements this pattern with configurable child_splitter and parent_splitter; LlamaIndex offers the closest equivalent via HierarchicalNodeParser combined with AutoMergingRetriever.
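The query-time flow above can be sketched end to end. Word-overlap similarity stands in here for embedding similarity, and all names and document texts are illustrative rather than drawn from any real library:

```python
import re

parent_store = {
    "sub-mgmt": "Subscription Management. To cancel, open Billing and "
                "choose Cancel plan. To pause, choose Pause plan. "
                "To upgrade, choose Upgrade plan.",
}
children = [
    ("To cancel, open Billing and choose Cancel plan.", "sub-mgmt"),
    ("To pause, choose Pause plan.", "sub-mgmt"),
    ("To upgrade, choose Upgrade plan.", "sub-mgmt"),
]

def similarity(a, b):
    """Jaccard word overlap -- a toy stand-in for embedding similarity."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / max(len(wa | wb), 1)

def retrieve(query, top_k=1):
    # 1. rank small children against the query
    ranked = sorted(children, key=lambda c: similarity(query, c[0]), reverse=True)
    # 2. map matched children to parent IDs (deduplicating shared parents)
    parent_ids = {pid for _, pid in ranked[:top_k]}
    # 3. return the full parents, not the children
    return [parent_store[pid] for pid in parent_ids]

context = retrieve("how do I cancel my plan?")
```

The LLM receives the whole Subscription Management parent, so it can also mention pausing or upgrading if the conversation calls for it.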

Parent-Child Chunking Strategy

The diagram shows a parent chunk, "Subscription Management" (~500 tokens), split into three ~100-token children: Child A (Cancel plan), Child B (Pause plan), and Child C (Upgrade plan). The user query "how do I cancel my plan?" matches Child A with high similarity, but the full parent chunk is returned to the LLM, giving it richer context than Child A alone.

Real-World Example

A 99helpers API documentation page covers authentication in 2,000 tokens across three sections: overview, implementation, and common errors. Chunked into 256-token children, the page becomes 8 small chunks. A user asking "How does token-based auth work?" retrieves the 2 most relevant child chunks (covering the token generation process), but these chunks lack context about token expiry from adjacent sections. With parent-child chunking, both children map to the same 2,000-token parent, so the retriever returns that parent once. The LLM now has the complete authentication picture and provides a comprehensive answer covering token generation, usage, and expiry in one response.

Common Mistakes

  • Setting parent chunks too large—if parents exceed the LLM's context window, they need to be truncated, partially defeating the purpose.
  • Storing parents only in the document store without a fast lookup mechanism—parent retrieval should be a simple key-value fetch (ideally sub-millisecond) so it adds negligible latency on top of the vector search.
  • Using the same splitting parameters for parent and child—parents should use coarser splits (paragraph/section level) while children use fine-grained splits (sentence level).
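The last point can be sketched with simple heuristics: paragraph breaks for coarse parent splits, sentence boundaries for fine child splits. Real splitters (e.g., recursive character splitters) are more robust; this is an assumption-laden illustration, not library code.

```python
import re

def split_parents(text):
    """Coarse split: one parent per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def split_children(parent):
    """Fine split: one child per sentence."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", parent) if s.strip()]

doc = ("Tokens expire after an hour. Refresh them via /oauth/token.\n\n"
       "Errors return HTTP 401. Check the WWW-Authenticate header.")

parents = split_parents(doc)                                 # paragraph-level
children = [c for p in parents for c in split_children(p)]   # sentence-level
```

Two different granularities fall out naturally: two paragraph parents for generation context, four sentence children for precise retrieval.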
