Chunk Overlap
Definition
When documents are split into chunks for RAG indexing, information that spans a natural chunk boundary can be split across two chunks — with the first half in one chunk and the second half in the next. This split can cause retrieval to miss the complete context needed to answer a question, even if the relevant information exists in the knowledge base. Chunk overlap addresses this by having consecutive chunks share some text: if chunk 1 covers characters 0-500 and chunk 2 covers characters 400-900 (with 100-character overlap), then any key passage spanning the boundary is fully represented in at least one of the two chunks. Typical overlap is 10-20% of chunk size.
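The boundary arithmetic above can be sketched in a few lines of Python. This is an illustrative helper, not any library's API; `chunk_spans` is a hypothetical name, and it assumes pure character-based splitting:

```python
def chunk_spans(text_len, chunk_size=500, overlap=100):
    """Return (start, end) character spans for overlapping chunks.

    Each chunk starts `chunk_size - overlap` characters after the
    previous one, so the last `overlap` characters of one chunk are
    repeated at the start of the next.
    """
    step = chunk_size - overlap
    spans = []
    start = 0
    while start < text_len:
        spans.append((start, min(start + chunk_size, text_len)))
        if start + chunk_size >= text_len:
            break
        start += step
    return spans

# The example from the definition: a 900-character document,
# 500-character chunks, 100-character overlap.
print(chunk_spans(900))  # → [(0, 500), (400, 900)]
```

Any passage shorter than the overlap that straddles the 400–500 boundary appears whole in at least one of the two spans.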
Why It Matters
Chunk overlap is a practical quality improvement for RAG systems that costs some additional storage and compute in exchange for more complete context in retrieved chunks. Without overlap, a question whose answer spans two sentences that happen to fall on either side of a chunk boundary cannot be answered from either chunk alone. With overlap, the boundary region is duplicated, ensuring complete coverage. The tradeoff is that overlap increases the total number of stored chunks (and therefore storage and embedding costs) and may introduce slight redundancy in retrieved context.
How It Works
Chunk overlap is configured during the document-splitting stage of the RAG indexing pipeline. A RecursiveCharacterTextSplitter (from LangChain; LlamaIndex offers similar splitters) takes two key parameters: chunk_size (maximum characters per chunk) and chunk_overlap (characters shared between consecutive chunks). For example, chunk_size=500, chunk_overlap=50 produces chunks where each chunk repeats the last 50 characters of the previous one, so a 1,000-character document is split at positions 0-500, 450-950, and 900-1,000 rather than 0-500 and 500-1,000. The overlap region is stored in both chunks, ensuring boundary-spanning passages are complete.
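The sliding behavior can be reproduced with a simplified fixed-size splitter. This is a sketch of the stride logic only, not LangChain's actual implementation — the real RecursiveCharacterTextSplitter also tries to split on separators (paragraphs, then sentences, then words) before falling back to raw characters:

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Fixed-size character splitter with overlap (simplified sketch)."""
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk reached the end of the document
    return chunks

# A 1,000-character document with chunk_size=500, chunk_overlap=50
# yields chunks at 0-500, 450-950, and 900-1,000:
doc = "x" * 1000
print([len(c) for c in split_with_overlap(doc)])  # → [500, 500, 100]
```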
Chunk Overlap — Preserving Context at Boundaries
[Diagram: three 500-character chunks with 100-character overlaps. Chunks 1 and 2 share positions 400–500; chunks 2 and 3 share positions 800–900. Without overlap, a sentence split mid-idea at a chunk boundary means retrieval misses the complete thought; with overlap, boundary sentences appear in both chunks — context preserved, no information loss.]
Real-World Example
A 99helpers customer tests different overlap settings on their knowledge base. With zero overlap, evaluation shows that 8% of test queries fail because the answer spans a chunk boundary. With 50-character overlap (10% of 500-character chunks), boundary-related failures drop to 2%. With 100-character overlap (20%), failures drop to under 1%. They choose 10% overlap as the standard configuration, accepting the 11% storage increase (from duplicated overlap regions) in exchange for the significant quality improvement.
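The roughly 11% storage increase can be reproduced with a back-of-the-envelope chunk count. This is a rough estimate under the assumption of pure fixed-size splitting (real splitters respect sentence and paragraph boundaries, so actual counts vary); the function name and the 1,000,000-character corpus size are illustrative:

```python
import math

def chunk_count(doc_len, chunk_size, overlap):
    """Approximate chunk count: the first chunk covers chunk_size
    characters, and each later chunk adds (chunk_size - overlap) new ones."""
    step = chunk_size - overlap
    return max(1, math.ceil((doc_len - overlap) / step))

doc_len = 1_000_000                      # hypothetical knowledge base
base = chunk_count(doc_len, 500, 0)      # zero overlap
overlapped = chunk_count(doc_len, 500, 50)  # 10% overlap
print(base, overlapped)                  # → 2000 2223 (about 11% more)
```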
Common Mistakes
- ✕ Setting overlap too high — very large overlaps create nearly duplicate chunks that waste storage, inflate the chunk count, and dilute retrieval by increasing noise
- ✕ Using fixed-size overlap without considering semantic boundaries — overlap based on character count may split mid-sentence; use recursive or semantic chunking to align splits with natural boundaries
- ✕ Assuming overlap eliminates all boundary problems — very large knowledge items that exceed chunk size will still be split; use parent-document retrieval for these cases
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Chunk Size
Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.
Sliding Window Chunking
Sliding window chunking splits documents into overlapping segments by advancing a fixed-size window across the text. Overlap between consecutive chunks ensures that information near chunk boundaries is captured in multiple chunks, reducing information loss.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Indexing Pipeline
An indexing pipeline is the offline data processing workflow that transforms raw documents into searchable vector embeddings, running during knowledge base setup and when content is updated.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →