Chroma
Definition
Chroma is the go-to vector database for getting started with RAG and semantic search, prioritizing developer experience and ease of setup over enterprise-scale features. It can run entirely in-memory (no persistence) or with local file-based persistence using SQLite and HNSWLIB, requiring no separate database infrastructure. Chroma's Python API is minimal: create a collection, add documents with optional metadata, query with a text or vector. If the collection is configured with an embedding function, Chroma embeds documents automatically; otherwise it accepts pre-computed vectors. Collections act as namespaces, analogous to tables or indexes in other databases. Chroma can also run as a client-server application for shared access. Its simplicity makes it ideal for development, experimentation, and small-to-medium production deployments.
Why It Matters
Starting a RAG project doesn't require production-grade infrastructure. Chroma lets developers build and test a complete RAG pipeline in a single Python script without running any servers, Docker containers, or cloud services—just pip install chromadb and write code. For 99helpers developers prototyping new chatbot features, Chroma provides the fastest path from idea to working RAG demo. When the prototype matures into production, teams can migrate to Pinecone or Weaviate, but many small deployments (under ~1M vectors) run Chroma in server mode successfully in production.
How It Works
A minimal Chroma RAG setup: import chromadb; client = chromadb.PersistentClient('/path/to/db'); collection = client.create_collection('support_docs', embedding_function=openai_ef); collection.add(documents=['Article text...'], metadatas=[{'source': 'help.example.com'}], ids=['doc1']); results = collection.query(query_texts=['How do I reset password?'], n_results=5). Chroma handles embedding and similarity search internally. For larger datasets, Chroma's server mode (chroma run --path /path/to/db) accepts connections from multiple clients. Filtering uses metadata: collection.query(query_texts=[...], where={'category': 'billing'}).
Chroma Vector Store — Operations

Ingest: documents (text input) pass through an embedding model (the encoder) into a Chroma collection, a local, embedded database. Example collection my_docs:

| id | embedding | metadata | document |
|---|---|---|---|
| doc_001 | [0.23, -0.81, ...] | {"source":"faq","page":1} | How to reset your pass... |
| doc_002 | [0.67, 0.12, ...] | {"source":"guide","page":3} | Billing and subscription... |
| doc_003 | [-0.11, 0.55, ...] | {"source":"faq","page":2} | Supported file formats... |

Query: a query such as "password reset" is embedded with the same model, then a similarity search returns the top-k results (here top-k=3) along with their distances.

Deployment: local/embedded mode is zero config, in-process, and ideal for development; managed cloud alternatives include Pinecone, Weaviate, and Qdrant.
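Conceptually, the similarity-search step above reduces to "embed the query with the same model, then rank stored vectors by distance." A dependency-free brute-force sketch (Chroma itself uses an HNSW index to avoid scanning every vector; the vectors here are toy stand-ins for model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "collection": id -> embedding.
store = {
    "doc_001": [0.9, 0.1, 0.0],   # password reset article
    "doc_002": [0.1, 0.9, 0.0],   # billing article
    "doc_003": [0.8, 0.2, 0.1],   # another account article
}

def top_k(query_vec, k=3):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, v)) for doc_id, v in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

query = [1.0, 0.0, 0.0]  # pretend this is the embedded "password reset" query
print([doc_id for doc_id, _ in top_k(query, k=2)])  # ['doc_001', 'doc_003']
```

The brute-force version is O(n) per query; an approximate index like HNSW trades a small amount of recall for sub-linear query time, which is why it works at collection sizes where a full scan would not.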
Real-World Example
A 99helpers developer is building a proof-of-concept for a new feature that lets users query their uploaded PDFs. Using Chroma, they write a 50-line Python script: load PDFs with LangChain's PyPDFLoader, chunk with RecursiveCharacterTextSplitter, store in a local Chroma collection, and query with natural language. The entire RAG prototype works without any cloud setup, runs on their laptop, and demonstrates the feature to stakeholders in 2 hours. When the feature is approved for production, they swap Chroma for Pinecone with minimal code changes, keeping the rest of the pipeline identical.
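The chunking step in that prototype can be sketched without LangChain; chunk_text below is a hypothetical, simplified stand-in for RecursiveCharacterTextSplitter (fixed-size character windows with overlap):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap, so that
    context spanning a chunk boundary is not lost entirely."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

pdf_text = "Lorem ipsum " * 100  # pretend this came from a PDF loader
chunks = chunk_text(pdf_text, chunk_size=200, overlap=50)
# Each chunk would then be added to a Chroma collection with an id
# like f"page1_chunk{i}" and queried as shown in "How It Works".
print(len(chunks[0]))  # 200
```

Real splitters also try to break on paragraph and sentence boundaries rather than mid-word, but the size-plus-overlap idea is the same.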
Common Mistakes
- ✕ Using Chroma's in-memory mode in production—data is lost on restart; use PersistentClient for any data that needs to survive restarts.
- ✕ Running Chroma in production for workloads exceeding ~1M vectors without benchmarking—Chroma is not optimized for very large-scale deployments.
- ✕ Assuming Chroma's default embedding function (a small local sentence-transformer model) will produce the same vectors as your production embedding model—always specify the same embedding function in development and production.
Related Terms
Vector Database
A vector database is a purpose-built data store optimized for storing, indexing, and querying high-dimensional numerical vectors (embeddings), enabling fast similarity search across large collections of embedded documents.
Pinecone
Pinecone is a fully managed vector database service designed for production machine learning applications, providing high-performance similarity search with simple APIs and automatic scaling for RAG and semantic search systems.
Weaviate
Weaviate is an open-source vector database with built-in support for hybrid search, multi-tenancy, and automatic vectorization, popular in enterprise RAG deployments for its flexibility and self-hosting capability.
pgvector
pgvector is a PostgreSQL extension that adds vector similarity search capabilities to Postgres, enabling teams to run RAG retrieval directly in their existing database without a separate vector store.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →