Retrieval-Augmented Generation (RAG)

Indexing Pipeline

Definition

The indexing pipeline is the offline half of a RAG system, responsible for preparing knowledge base content for retrieval. It runs asynchronously from query processing, typically triggered on a schedule or when source documents change. The pipeline stages are: (1) data loading—connectors fetch documents from source systems; (2) preprocessing—cleaning, deduplication, format normalization; (3) chunking—splitting documents into retrievable segments; (4) embedding—converting chunks to vector representations; (5) upserting—writing chunk vectors and metadata to the vector database; (6) optionally, graph extraction for GraphRAG or BM25 index construction for hybrid retrieval. Pipeline health directly determines knowledge base freshness and completeness.

Why It Matters

If the indexing pipeline is slow, unreliable, or incomplete, the knowledge base becomes stale and the chatbot gives outdated answers. If it crashes silently, newly added documentation never reaches users. For 99helpers customers who update their help content frequently—adding new features, publishing policy changes—a robust indexing pipeline with error handling, progress monitoring, and incremental updates ensures the chatbot always reflects current knowledge. Indexing pipeline failures are often invisible to end users until they notice the chatbot answering with outdated information.

How It Works

Production indexing pipelines are built as job queues or data processing workflows. A job is triggered by a webhook (document saved in the CMS), a schedule (nightly re-index), or a manual trigger. The job fetches new or modified documents using source connectors, computes content hashes to skip unchanged content (deduplication), chunks and embeds only new content, then upserts vectors with metadata into the vector database. Failed jobs are logged and retried with exponential backoff. Monitoring tracks: documents indexed, chunks created, embedding API calls, error rates, and time-to-visible (how long after a document is created before it is searchable).
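The incremental-sync loop described above can be sketched as follows. This is a minimal illustration, not a production implementation: `embed` and `upsert` are hypothetical callables standing in for the embedding API and vector database client, and `seen_hashes` stands in for persisted sync state.

```python
import hashlib
import time

def content_hash(text: str) -> str:
    """Stable hash used to detect unchanged documents (deduplication)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def index_documents(docs, seen_hashes, embed, upsert, max_retries=3):
    """Index only new or changed documents; retry transient failures
    with exponential backoff.

    `docs` is an iterable of (doc_id, text) pairs. `embed` and `upsert`
    are placeholders for a real embedding API and vector DB client.
    """
    stats = {"indexed": 0, "skipped": 0, "failed": 0}
    for doc_id, text in docs:
        h = content_hash(text)
        if seen_hashes.get(doc_id) == h:
            stats["skipped"] += 1              # unchanged content: skip
            continue
        for attempt in range(max_retries):
            try:
                upsert(doc_id, embed(text))
                seen_hashes[doc_id] = h
                stats["indexed"] += 1
                break
            except Exception:
                time.sleep(2 ** attempt)       # backoff: 1s, 2s, 4s, ...
        else:
            stats["failed"] += 1               # would log and alert here
    return stats
```

The returned `stats` dictionary feeds the monitoring metrics mentioned above (documents indexed, error rates); a real job would also record per-document errors for the retry queue.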

Document Indexing Pipeline

1. Load Documents: PDF, HTML, DOCX, TXT
2. Parse & Clean: extract text, strip noise
3. Chunk Text: split into segments
4. Embed Chunks: convert to vectors
5. Store Vectors: write to vector DB

Output: a queryable vector index ready for RAG retrieval

Real-World Example

A 99helpers customer publishes a new help article at 2 PM. Their indexing pipeline runs on a 15-minute schedule: at 2:15 PM the Zendesk connector fetches articles modified since the last run, detects the new article, chunks it into 6 segments, embeds them via OpenAI, and upserts all 6 vectors to Pinecone. By 2:16 PM the chatbot can answer questions based on the new article. A pipeline health dashboard shows indexing lag (time between publish and searchability), error rate, and daily document count, alerting on-call staff if lag exceeds 30 minutes.
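The indexing-lag metric from this example can be computed and alerted on with a few lines. A minimal sketch, assuming the 30-minute alert threshold from the dashboard described above:

```python
from datetime import datetime, timedelta

ALERT_THRESHOLD = timedelta(minutes=30)   # SLO from the example above

def indexing_lag(published_at: datetime, searchable_at: datetime) -> timedelta:
    """Time-to-visible: delay between publish and first searchability."""
    return searchable_at - published_at

def should_alert(lags: list[timedelta]) -> bool:
    """Page on-call staff if any recent document exceeds the lag SLO."""
    return any(lag > ALERT_THRESHOLD for lag in lags)
```

In the scenario above, an article published at 2:00 PM and searchable at 2:16 PM has a 16-minute lag, comfortably under the threshold.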

Common Mistakes

  • Running full re-indexes on every update instead of incremental sync—for large knowledge bases, this is prohibitively slow and expensive.
  • Ignoring pipeline failures—unhandled errors mean documents silently fail to index, and no one is alerted until users notice stale answers.
  • Not tracking the embedding model version used for each chunk—when switching models, all chunks must be re-embedded for consistency.
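The last mistake is avoidable by storing the embedding model name in each chunk's metadata and checking it at migration time. A sketch, assuming a hypothetical metadata layout and an illustrative model name (vectors from different models live in different spaces, so their similarity scores are not comparable):

```python
CURRENT_MODEL = "text-embedding-3-small"   # illustrative model name

def chunks_needing_reembed(chunk_metadata: dict) -> list:
    """Return IDs of chunks embedded with a different (or unrecorded)
    model than the one used at query time; these must be re-embedded
    before the new model can be used for retrieval."""
    return [cid for cid, meta in chunk_metadata.items()
            if meta.get("embedding_model") != CURRENT_MODEL]
```

Running this check before switching query-time models turns a silent relevance regression into an explicit re-embedding job.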
