Semantic Chunking

Definition

Semantic chunking is a document segmentation strategy that uses embedding similarity to detect topical boundaries in text. Instead of splitting every 512 tokens regardless of content, semantic chunking compares the embedding of each sentence with its neighbors and inserts a split when the similarity drops below a threshold—indicating a topic shift. The resulting chunks contain semantically cohesive content, making each chunk more likely to match queries about that specific topic. This contrasts with fixed-size chunking, which can sever mid-sentence or split a single concept across two chunks, and with recursive chunking, which splits on structural markers like paragraph breaks.

Why It Matters

The quality of chunks directly determines the quality of retrieval and generation. Fixed-size chunks are fast to implement but frequently produce incomplete thoughts or merge unrelated content, causing the retriever to surface partially relevant passages. Semantic chunking ensures that when a chunk is retrieved, it contains a complete, coherent discussion of one topic—giving the LLM a better foundation for generating accurate answers. For 99helpers knowledge bases containing long, multi-topic help articles, semantic chunking can significantly improve answer quality by preventing topic contamination between chunks.

How It Works

To implement semantic chunking, first split the document into individual sentences. Compute an embedding for each sentence, then compute the cosine similarity between each pair of consecutive sentences (or between adjacent sliding windows of sentences). When the similarity drops below a configurable threshold (e.g., 0.7), mark a chunk boundary at that point. Collect the sentences between consecutive boundaries into a single chunk. Libraries like LangChain and LlamaIndex provide semantic chunking implementations. The threshold is a hyperparameter: lower values create fewer, larger chunks, while higher values create more, smaller chunks. It should be tuned against retrieval quality metrics on your specific corpus.
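The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by LangChain or LlamaIndex: it assumes the sentences are already split and that an embedding for each one has been computed elsewhere. The function names (cosine_similarity, semantic_chunks) and the toy 2-D vectors are hypothetical and exist only for this example.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    threshold: float = 0.7,
) -> List[List[str]]:
    """Group consecutive sentences, cutting where similarity falls below threshold."""
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        sim = cosine_similarity(embeddings[i - 1], embeddings[i])
        if sim < threshold:
            chunks.append([])  # similarity dropped: start a new chunk
        chunks[-1].append(sentences[i])
    return chunks


# Toy 2-D vectors standing in for real sentence embeddings (illustrative only).
toy_sentences = [
    "Install the SDK.",
    "Set the environment variables.",
    "Rate limits cap API usage.",
]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
result = semantic_chunks(toy_sentences, toy_embeddings, threshold=0.7)
print(result)
```

With these toy vectors the first two sentences stay together and the third starts a new chunk. In practice the embeddings would come from whatever embedding model the rest of the retrieval pipeline uses.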

Figure: Semantic Chunking: Similarity-Based Boundary Detection

Document sentences

S1  Initial setup requires installing the SDK.  (Chunk 1)
S2  Configure environment variables in your .env file.  (Chunk 1)
S3  Run the initialization script to complete setup.  (Chunk 1)
    Chunk boundary (sim = 0.31)
S4  Advanced rate limiting lets you control API usage.  (Chunk 2)
S5  Set throttle limits per user or per endpoint.  (Chunk 2)
    Chunk boundary (sim = 0.28)
S6  Common errors include 401 unauthorized responses.  (Chunk 3)
S7  Retry with exponential backoff after 429 errors.  (Chunk 3)

Cosine similarity between consecutive sentences (threshold: 0.60; a value below the threshold marks a chunk boundary)

S1 → S2: 0.91
S2 → S3: 0.88
S3 → S4: 0.31 (cut)
S4 → S5: 0.85
S5 → S6: 0.28 (cut)
S6 → S7: 0.87

Resulting chunks

Chunk 1: Setup (3 sentences)
Chunk 2: Rate limiting (2 sentences)
Chunk 3: Error handling (2 sentences)
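
As a quick check of the figure, the sketch below applies its 0.60 threshold to the six similarity values listed there and then repeats the grouping with two other thresholds to show how the chunk count responds. The helper name group_by_threshold and the alternative thresholds 0.90 and 0.20 are assumptions made for illustration; only the similarity values and the 0.60 threshold come from the figure.

```python
from typing import List

# Similarities copied from the figure: S1→S2, S2→S3, ..., S6→S7.
FIGURE_SIMS = [0.91, 0.88, 0.31, 0.85, 0.28, 0.87]


def group_by_threshold(similarities: List[float], threshold: float) -> List[List[int]]:
    """Return chunks as lists of 1-based sentence numbers."""
    chunks = [[1]]
    for i, sim in enumerate(similarities):
        sentence_no = i + 2  # similarities[i] compares sentence i+1 with sentence i+2
        if sim < threshold:
            chunks.append([])  # cut before this sentence
        chunks[-1].append(sentence_no)
    return chunks


figure_chunks = group_by_threshold(FIGURE_SIMS, 0.60)
print(figure_chunks)  # [[1, 2, 3], [4, 5], [6, 7]]

# Hypothetical alternative thresholds, to show the direction of the effect.
for t in (0.20, 0.60, 0.90):
    print(t, len(group_by_threshold(FIGURE_SIMS, t)))  # 1, 3 and 6 chunks
```

A threshold of 0.20 never cuts and leaves one seven-sentence chunk, while 0.90 cuts almost everywhere and leaves six chunks, which is the over-splitting and under-splitting trade-off that tuning has to balance.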

Real-World Example

A 99helpers help article covers three topics: initial setup, advanced configuration, and troubleshooting. With fixed-size chunking at 400 tokens, the article splits mid-paragraph, producing one chunk that mixes setup and configuration content and another that spans configuration and troubleshooting. With semantic chunking, the embedding similarity drops at the natural topic transitions, producing three coherent chunks. In retrieval tests on this article, queries about troubleshooting reliably surface only the troubleshooting chunk rather than mixed-content chunks, improving answer precision by 28%.

Common Mistakes

  • Choosing a similarity threshold without testing it—default values may over-split or under-split your specific document style.
  • Applying semantic chunking to short documents where it adds overhead without meaningfully improving over paragraph-based chunking.
  • Ignoring chunk size distribution—semantic chunks can vary widely in length, and very long chunks may overflow the LLM context window.
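
The last point can be checked with a small report on chunk sizes, followed by a fallback split for chunks that are too long. This is a sketch under stated assumptions: token counts are approximated by whitespace-separated words rather than a real tokenizer, and the names approx_tokens, size_report and split_oversized, as well as the limit of 6 used in the example, are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def approx_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def size_report(chunks: List[List[str]]) -> Dict[str, float]:
    """Min, max and mean approximate size across chunks (lists of sentences)."""
    sizes = [sum(approx_tokens(s) for s in chunk) for chunk in chunks]
    return {"min": min(sizes), "max": max(sizes), "mean": mean(sizes)}


def split_oversized(chunk: List[str], max_tokens: int) -> List[List[str]]:
    """Split one chunk at sentence boundaries to stay within max_tokens.

    A single sentence longer than max_tokens is kept whole in its own piece.
    """
    if not chunk:
        return []
    pieces = [[]]
    used = 0
    for sentence in chunk:
        n = approx_tokens(sentence)
        if pieces[-1] and used + n > max_tokens:
            pieces.append([])  # current piece is full: start another
            used = 0
        pieces[-1].append(sentence)
        used += n
    return pieces


example_chunks = [
    ["a b c", "d e"],
    ["f g h i j k", "l m", "n o p"],
]
report = size_report(example_chunks)
pieces = split_oversized(example_chunks[1], max_tokens=6)
print(report)
print(pieces)
```

Splitting at sentence boundaries keeps each piece readable, at the cost of occasionally separating sentences that the similarity check had grouped together.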
