Sliding Window Chunking
Definition
Sliding window chunking is a document segmentation technique in which a fixed-length window (e.g., 512 tokens) moves through a document with a stride smaller than the window size (e.g., 256 tokens), producing overlapping chunks. The overlap, equal to window size minus stride, guarantees that any passage shorter than the overlap appears in full in at least one chunk. This addresses a key failure mode of non-overlapping fixed chunking, where critical information split across a boundary may be partially missing from both adjacent chunks. Sliding window chunking is one of the simplest chunking strategies and is widely used as a baseline in RAG systems.
Why It Matters
Information at chunk boundaries is often the most vulnerable to being lost or distorted during retrieval. A support answer that requires two sentences—one at the end of chunk N and one at the start of chunk N+1—may be incompletely captured in either chunk alone. Sliding window chunking mitigates this by including boundary content in both surrounding chunks. For 99helpers chatbots processing long documentation pages, a modest overlap (e.g., 15-20% of chunk size) significantly reduces boundary-split errors with only a moderate increase in storage and embedding costs.
How It Works
Configure two parameters: window_size (number of tokens per chunk) and stride (number of tokens to advance between windows). Window size determines how much context each chunk contains; stride controls how much chunks overlap. Stride = window size produces non-overlapping chunking; stride = window size / 2 produces 50% overlap. Smaller strides mean more chunks and more overlap—which improves boundary coverage but increases storage, embedding cost, and retrieval latency. In practice, 10-20% overlap (stride = 80-90% of window size) provides a good cost-quality tradeoff. Libraries like LangChain's RecursiveCharacterTextSplitter support overlap natively.
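The two-parameter mechanism above can be sketched in a few lines. This is a minimal illustration over a token list, not a production implementation; a real system would first tokenize text with the embedding model's tokenizer.

```python
# Minimal sketch of sliding window chunking over a pre-tokenized document.
# window_size and stride are the two parameters described above;
# overlap = window_size - stride.

def sliding_window_chunks(tokens, window_size=512, stride=384):
    """Yield overlapping windows of tokens, advancing by `stride` each step."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # final window already reaches the end of the document
        start += stride
    return chunks

# Example: a 1,400-"token" document (integers stand in for tokens),
# 512-token windows, stride 384 -> 128 tokens (25%) of overlap.
doc = list(range(1400))
chunks = sliding_window_chunks(doc, window_size=512, stride=384)
```

With these values the document yields four chunks starting at positions 0, 384, 768, and 1152, with 128 shared tokens between each consecutive pair.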
[Figure: Sliding Window Chunking, showing overlapping chunks moving across a 1,400-token document with a 500-token window and a 300-token step, giving 200 tokens of overlap per chunk pair.]

Overlap formula: overlap = window_size − step_size = 500 − 300 = 200 tokens per chunk pair.

Why overlap matters:
- Without overlap: concepts at chunk boundaries get split, with key sentences truncated mid-thought.
- With overlap: each boundary concept appears in both adjacent chunks, so retrieval is far less likely to miss it.
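The diagram's numbers can be verified with a little arithmetic. This is a worked example using the illustrative values above (500-token window, 300-token step, 1,400-token document); the chunk-count formula assumes the last window ends exactly at or past the document's end.

```python
import math

# Worked example using the diagram's values.
window, step, total = 500, 300, 1400

overlap = window - step                             # 500 - 300 = 200 tokens
n_chunks = math.ceil((total - window) / step) + 1   # ceil(900 / 300) + 1 = 4

# Start position of each chunk: 0, 300, 600, 900.
starts = [i * step for i in range(n_chunks)]
```

Here the final chunk starts at token 900 and ends at 900 + 500 = 1,400, exactly covering the document.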
Real-World Example
A 99helpers API reference page has a code example that spans two natural 512-token boundaries. With non-overlapping chunking, the example is split: the function signature lands in chunk 3 and the parameters in chunk 4, so neither chunk is self-contained. With sliding window chunking using 512-token windows and 100-token overlap, the code example appears in full in at least one chunk. A user asking "how do I call the messages API?" retrieves a chunk containing the complete example, enabling a complete answer.
Common Mistakes
- ✕ Setting overlap so high (e.g., 50%) that the index size doubles without a proportional quality improvement.
- ✕ Using sliding window chunking on structured documents (tables, lists) where semantic boundaries don't align with character counts.
- ✕ Forgetting that overlapping chunks will surface the same content multiple times — use deduplication in the retriever or reranker.
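The deduplication point above can be sketched as a small post-retrieval filter. This is a hedged illustration, not a specific library's API: the `doc_id`, `start`, and `score` fields and the `min_gap` threshold are assumptions — attach whatever provenance metadata your indexer actually records.

```python
# Sketch: drop retrieved chunks that largely overlap a higher-scoring
# chunk from the same document. `min_gap` is an assumed threshold: two
# chunks whose start offsets differ by less than this are treated as
# duplicates of the same span.

def dedupe_hits(hits, min_gap=256):
    kept = []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        is_duplicate = any(
            h["doc_id"] == hit["doc_id"]
            and abs(h["start"] - hit["start"]) < min_gap
            for h in kept
        )
        if not is_duplicate:
            kept.append(hit)
    return kept

hits = [
    {"doc_id": "api-ref", "start": 0,   "score": 0.91},
    {"doc_id": "api-ref", "start": 200, "score": 0.88},  # overlaps the first
    {"doc_id": "faq",     "start": 0,   "score": 0.80},
]
kept = dedupe_hits(hits)  # the overlapping api-ref hit at start=200 is dropped
```

Rerankers often achieve a similar effect implicitly, but an explicit filter keeps the context window from being padded with near-identical text.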
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Chunk Overlap
Chunk overlap is a chunking strategy where consecutive document chunks share a portion of overlapping text, ensuring that information spanning chunk boundaries is captured in at least one complete chunk.
Chunk Size
Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.
Semantic Chunking
Semantic chunking splits documents into segments based on meaning boundaries—grouping sentences that discuss the same topic together—rather than fixed character counts. This produces more coherent, self-contained chunks that improve retrieval quality.
Recursive Chunking
Recursive chunking splits documents hierarchically using a priority list of separators—first by double newlines, then single newlines, then sentences, then words—ensuring chunks respect natural structural boundaries before falling back to finer splits.
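The separator-priority idea described above can be illustrated with a short sketch. This is a simplified toy version (character counts stand in for tokens), not LangChain's actual implementation; the separator list and `max_len` are illustrative assumptions.

```python
# Sketch of recursive chunking: try coarse separators first, and fall
# back to finer ones only when a piece is still too large.

SEPARATORS = ["\n\n", "\n", ". ", " "]

def recursive_split(text, max_len=200, seps=SEPARATORS):
    if len(text) <= max_len or not seps:
        return [text]
    sep, finer = seps[0], seps[1:]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = (buf + sep + piece) if buf else piece
        if len(candidate) <= max_len:
            buf = candidate          # piece fits: keep packing this chunk
        else:
            if buf:
                chunks.append(buf)
            if len(piece) > max_len:
                # Piece alone is too big: recurse with finer separators.
                chunks.extend(recursive_split(piece, max_len, finer))
                buf = ""
            else:
                buf = piece
    if buf:
        chunks.append(buf)
    return chunks

text = "para one. " * 10 + "\n\n" + "para two. " * 10
parts = recursive_split(text, max_len=120)  # splits at the paragraph break
```

Because the coarsest separator (the paragraph break) already yields pieces under the limit, no finer splitting is needed in this example.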