Retrieval-Augmented Generation (RAG)

Document Chunking

Definition

Document chunking (also called text splitting) is the pre-processing step that divides source documents into segments of manageable size before embedding and storing in a vector database. Chunking is necessary because: embedding models have maximum input length limits (typically 512 to 8,191 tokens), embedding a full document as a single vector loses granular meaning, and retrieving entire documents is unnecessarily verbose for the LLM context window. The chunking strategy — how large each chunk is, where splits occur, and how much overlap exists between consecutive chunks — significantly impacts both retrieval quality (finding the right chunk) and answer quality (having enough context in the chunk).
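The three knobs named above (chunk size, split points, overlap) can be made concrete with a minimal fixed-size splitter sketch; the function name and default values here are illustrative, not from any particular library:

```python
def fixed_size_chunks(text, size=500, overlap=50):
    """Split text into fixed-size character chunks.

    Consecutive chunks share `overlap` characters so that content
    falling near a boundary appears intact in at least one chunk.
    """
    step = size - overlap  # each new chunk starts `step` characters after the last
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "x" * 1200
chunks = fixed_size_chunks(doc, size=500, overlap=50)
# chunks start at offsets 0, 450, 900, so their lengths are 500, 500, 300
```

Production splitters usually measure size in tokens rather than characters, since embedding model limits are token-based.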

Why It Matters

Chunking strategy is one of the most impactful decisions in RAG system design, yet it is often treated as a default parameter. Chunks that are too small lose necessary context — a single sentence may not contain enough information to answer a question without surrounding context. Chunks that are too large include irrelevant content that dilutes the LLM's attention and wastes context window space. Chunks split at arbitrary character positions may break mid-sentence, producing incoherent units. Optimal chunking respects semantic boundaries (paragraphs, sections, sentences) and matches the granularity at which users ask questions.

How It Works

Document chunking is performed by a text splitter in the indexing pipeline. Strategies include: fixed-size splitting (split at every N characters, often with overlap), recursive character splitting (split at natural boundaries in priority order: paragraphs → sentences → words → characters), semantic chunking (use embeddings to detect semantic shifts and split at meaning boundaries), and structure-aware splitting (use document structure like headings and list items as natural split points). The best strategy depends on document type — prose articles benefit from semantic or recursive splitting; structured FAQs with natural Q&A units benefit from structure-aware splitting.
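The recursive strategy described above can be sketched as follows. This is a simplified illustration (separator list and size limit are example choices), not the implementation of any specific library's splitter:

```python
def recursive_split(text, max_len=500, separators=("\n\n", "\n", ". ", " ")):
    """Recursive character splitting: try the coarsest boundary first
    (paragraphs), falling back to finer ones (sentences, words) only
    when a piece is still larger than max_len."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        if sep in text:
            chunks, buf = [], ""
            for piece in text.split(sep):
                candidate = buf + sep + piece if buf else piece
                if len(candidate) <= max_len:
                    buf = candidate  # keep merging small pieces into one chunk
                else:
                    if buf:
                        chunks.append(buf)
                    if len(piece) > max_len:
                        # piece itself too large: recurse to finer separators
                        chunks.extend(recursive_split(piece, max_len, separators))
                        buf = ""
                    else:
                        buf = piece
            if buf:
                chunks.append(buf)
            return chunks
    # no separator present at all: hard split at max_len characters
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]
```

Note how small adjacent pieces are merged back together up to the size limit, so chunk boundaries land on the coarsest natural boundary that fits.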

Document Chunking Strategies

A ~2,000-token source document, split four ways:

  • Fixed-size: 4 chunks of 500, 500, 500, and 250 tokens. Ignores sentence boundaries and may split mid-sentence.
  • Sentence: 12 chunks, ~180 tokens on average. Clean sentence boundaries, but variable chunk size.
  • Paragraph / Semantic (popular): 6 chunks, ~340 tokens on average. Respects meaning units; best for semantic search.
  • Recursive: 8 chunks, ~260 tokens on average. Tries paragraph → sentence → word boundaries, falling back to the next level only when a piece is still too large.

Real-World Example

A 99helpers customer tests three chunking strategies on their 500-article knowledge base: fixed 500-character chunks, recursive splitting (split at paragraph → sentence boundaries), and structure-aware splitting that keeps Q&A pairs together. On 150 test queries, recall@5 is 74% for fixed chunks, 82% for recursive, and 89% for structure-aware. The structure-aware approach works best because their knowledge base articles are naturally organized as Q&A pairs — splitting within a Q&A pair creates incomplete chunks, while keeping them together preserves the complete answer.
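The recall@5 figures above measure the fraction of test queries for which the relevant chunk appears in the top 5 retrieved results. A minimal sketch of that metric (the chunk IDs below are made-up example data):

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose relevant chunk appears in the top-k results.

    retrieved: list of ranked chunk-ID lists, one per query
    relevant:  the single correct chunk ID for each query
    """
    hits = sum(1 for ranked, gold in zip(retrieved, relevant) if gold in ranked[:k])
    return hits / len(relevant)

# Hypothetical results for 4 test queries:
retrieved = [["c3", "c7", "c1"], ["c2", "c9"], ["c5"], ["c8", "c4"]]
relevant = ["c7", "c9", "c6", "c4"]
print(recall_at_k(retrieved, relevant, k=5))  # 3 of 4 queries hit → 0.75
```

Running the same query set against each chunking strategy's index is how comparisons like the 74% / 82% / 89% numbers above are produced.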

Common Mistakes

  • Using the same chunk size for all document types — prose articles, FAQs, code documentation, and structured tables have different natural units and optimal chunk sizes
  • Ignoring the embedding model's maximum token limit — chunks exceeding the limit are silently truncated, producing degraded embeddings
  • Not evaluating chunking strategy impact on end-to-end retrieval — the right chunk size depends on your specific queries and documents; measure with representative test cases
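Guarding against the silent-truncation mistake above is cheap. A rough sketch, using a crude whitespace token count as a stand-in (a real check should use the embedding model's own tokenizer, since token counts differ by tokenizer):

```python
def check_chunk_lengths(chunks, max_tokens=512):
    """Flag chunks that would exceed the embedding model's input limit.

    Uses whitespace splitting as a rough token estimate; swap in the
    actual model tokenizer for an exact count.
    """
    return [(i, n) for i, c in enumerate(chunks)
            if (n := len(c.split())) > max_tokens]

oversized = check_chunk_lengths(["word " * 600, "a short chunk"], max_tokens=512)
# → [(0, 600)]: chunk 0 would be truncated by a 512-token model
```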

