Semantic Chunking
Definition
Semantic chunking is a document segmentation strategy that uses embedding similarity to detect topical boundaries in text. Instead of splitting every 512 tokens regardless of content, semantic chunking compares the embedding of each sentence with its neighbors and inserts a split when the similarity drops below a threshold, indicating a topic shift. The resulting chunks contain semantically cohesive content, making each chunk more likely to match queries about that specific topic. This contrasts with fixed-size chunking, which can cut text mid-sentence or split a single concept across two chunks, and with recursive chunking, which splits on structural markers such as paragraph breaks.
Why It Matters
The quality of chunks directly determines the quality of retrieval and generation. Fixed-size chunks are fast to implement but frequently produce incomplete thoughts or merge unrelated content, causing the retriever to surface partially relevant passages. Semantic chunking ensures that when a chunk is retrieved, it contains a complete, coherent discussion of one topic—giving the LLM a better foundation for generating accurate answers. For 99helpers knowledge bases containing long, multi-topic help articles, semantic chunking can significantly improve answer quality by preventing topic contamination between chunks.
How It Works
To implement semantic chunking, first split the document into individual sentences. Compute an embedding for each sentence, then compute the cosine similarity between consecutive sentences or between sliding windows of sentences. When similarity drops below a configurable threshold (e.g., 0.7), mark a chunk boundary. Collect the sentences between consecutive boundaries into a single chunk. Libraries like LangChain and LlamaIndex provide semantic chunking implementations. The threshold is a hyperparameter: lower values produce fewer, larger chunks, while higher values produce more, smaller chunks. Tune it against retrieval quality metrics on your specific corpus.
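The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model (e.g., one from sentence-transformers), and the threshold value in the usage example is illustrative.

```python
from math import sqrt


def embed(sentence):
    # Toy bag-of-words embedding. A real pipeline would call a
    # sentence-embedding model here instead.
    vec = {}
    for word in sentence.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(sentences, threshold=0.7):
    """Group consecutive sentences; start a new chunk where the
    similarity between adjacent sentences drops below the threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = embed(sentences[0])
    for sentence in sentences[1:]:
        cur = embed(sentence)
        if cosine(prev, cur) < threshold:
            chunks.append([])  # similarity dropped: topic shift
        chunks[-1].append(sentence)
        prev = cur
    return [" ".join(chunk) for chunk in chunks]


# Usage: with the toy embedding, lexically similar sentences stay
# together and an unrelated sentence starts a new chunk.
chunks = semantic_chunks(
    [
        "install the agent with pip",
        "install the agent on each server",
        "rate limits cap requests per minute",
    ],
    threshold=0.5,
)
# The two install sentences form one chunk; the rate-limit sentence
# forms a second.
```

With a real embedding model, similarity reflects meaning rather than word overlap, so paraphrased sentences on the same topic also stay together; the boundary-detection logic is unchanged.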
[Figure: Semantic Chunking — Similarity-Based Boundary Detection. Cosine similarity between consecutive document sentences drops at topic transitions, yielding Chunk 1 "Setup" (3 sentences), Chunk 2 "Rate limiting" (2 sentences), and Chunk 3 "Error handling" (2 sentences).]
Real-World Example
A 99helpers help article covers three topics: initial setup, advanced configuration, and troubleshooting. With fixed-size chunking at 400 tokens, the article splits mid-paragraph, producing one chunk that mixes setup and configuration content and another that spans configuration and troubleshooting. With semantic chunking, the embedding similarity drops at the natural topic transitions, producing three coherent chunks. Retrieval tests show that queries about 'troubleshooting' now reliably surface only the troubleshooting chunk rather than mixed-content chunks, improving answer precision by 28%.
Common Mistakes
- ✕ Choosing a similarity threshold without testing it—default values may over-split or under-split your specific document style.
- ✕ Applying semantic chunking to short documents where it adds overhead without meaningfully improving over paragraph-based chunking.
- ✕ Ignoring chunk size distribution—semantic chunks can vary widely in length, and very long chunks may overflow the LLM context window.
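The last mistake above can be mitigated with a post-processing pass that caps chunk length. A minimal sketch, assuming whitespace word counts as a stand-in for your embedding model's tokenizer:

```python
def cap_chunk_sizes(chunks, max_tokens=512):
    """Split any chunk longer than max_tokens into fixed-size pieces.
    Whitespace splitting approximates tokenization for illustration."""
    capped = []
    for chunk in chunks:
        words = chunk.split()
        if len(words) <= max_tokens:
            capped.append(chunk)
        else:
            # Fall back to fixed-size splitting for oversized chunks.
            for i in range(0, len(words), max_tokens):
                capped.append(" ".join(words[i : i + max_tokens]))
    return capped
```

Running this after semantic chunking preserves coherent chunks that already fit, and only degrades to fixed-size splits where a single topic is simply too long for the context window.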
Related Terms
Document Chunking
Document chunking is the process of splitting large documents into smaller text segments before embedding and indexing for RAG, balancing chunk size to preserve context while staying within embedding model limits and enabling precise retrieval.
Chunk Size
Chunk size is the maximum number of tokens or characters in each document segment created during the chunking phase of RAG indexing, controlling the granularity of retrieval and the amount of context available per retrieved chunk.
Chunk Overlap
Chunk overlap is a chunking strategy where consecutive document chunks share a portion of overlapping text, ensuring that information spanning chunk boundaries is captured in at least one complete chunk.
Sliding Window Chunking
Sliding window chunking splits documents into overlapping segments by advancing a fixed-size window across the text. Overlap between consecutive chunks ensures that information near chunk boundaries is captured in multiple chunks, reducing information loss.
Recursive Chunking
Recursive chunking splits documents hierarchically using a priority list of separators—first by double newlines, then single newlines, then sentences, then words—ensuring chunks respect natural structural boundaries before falling back to finer splits.