Chroma
Definition
Chroma is the go-to vector database for getting started with RAG and semantic search, prioritizing developer experience and ease of setup over enterprise-scale features. It can run entirely in-memory (no persistence) or with local file-based persistence using SQLite and HNSWLIB, requiring no separate database infrastructure. Chroma's Python API is minimal: create a collection, add documents with optional metadata, query with a text or vector. If the collection is configured with an embedding function, Chroma embeds documents automatically; otherwise it accepts pre-computed vectors. Collections act as namespaces, analogous to tables or indexes in other databases. Chroma can also run as a client-server application for shared access. Its simplicity makes it ideal for development, experimentation, and small-to-medium production deployments.
Why It Matters
Starting a RAG project doesn't require production-grade infrastructure. Chroma lets developers build and test a complete RAG pipeline in a single Python script without running any servers, Docker containers, or cloud services—just pip install chromadb and write code. For 99helpers developers prototyping new chatbot features, Chroma provides the fastest path from idea to working RAG demo. When the prototype matures into production, teams can migrate to Pinecone or Weaviate, but many small deployments (under ~1M vectors) run Chroma in server mode successfully in production.
How It Works
A minimal Chroma RAG setup: import chromadb; client = chromadb.PersistentClient('/path/to/db'); collection = client.create_collection('support_docs', embedding_function=openai_ef); collection.add(documents=['Article text...'], metadatas=[{'source': 'help.example.com'}], ids=['doc1']); results = collection.query(query_texts=['How do I reset password?'], n_results=5). Chroma handles embedding and similarity search internally. For larger datasets, Chroma's server mode (chroma run --path /path/to/db) accepts connections from multiple clients. Filtering uses metadata: collection.query(query_texts=[...], where={'category': 'billing'}).
Chroma Vector Store — Operations

Ingest: documents (text input) pass through an embedding model (the encoder) into a Chroma collection, a local, embedded database. Example collection my_docs:

| id | embedding | metadata | document |
|---|---|---|---|
| doc_001 | [0.23, -0.81, ...] | {"source":"faq","page":1} | How to reset your pass... |
| doc_002 | [0.67, 0.12, ...] | {"source":"guide","page":3} | Billing and subscription... |
| doc_003 | [-0.11, 0.55, ...] | {"source":"faq","page":2} | Supported file formats... |

Query: a query such as "password reset" is embedded with the same model, then a similarity search returns the top-k results (here top-k=3) along with their distances.

Deployment: local/embedded mode is zero config, in-process, and ideal for development; managed cloud alternatives include Pinecone, Weaviate, and Qdrant.
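Conceptually, the similarity-search step above reduces to "embed the query with the same model, then rank stored vectors by distance." A dependency-free brute-force sketch (Chroma itself uses an HNSW index to avoid scanning every vector; the vectors here are toy stand-ins for model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "collection": id -> embedding.
store = {
    "doc_001": [0.9, 0.1, 0.0],   # password reset article
    "doc_002": [0.1, 0.9, 0.0],   # billing article
    "doc_003": [0.8, 0.2, 0.1],   # another account article
}

def top_k(query_vec, k=3):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, v)) for doc_id, v in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

query = [1.0, 0.0, 0.0]  # pretend this is the embedded "password reset" query
print([doc_id for doc_id, _ in top_k(query, k=2)])  # ['doc_001', 'doc_003']
```

The brute-force version is O(n) per query; an approximate index like HNSW trades a small amount of recall for sub-linear query time, which is why it works at collection sizes where a full scan would not.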
Real-World Example
A 99helpers developer is building a proof-of-concept for a new feature that lets users query their uploaded PDFs. Using Chroma, they write a 50-line Python script: load PDFs with LangChain's PyPDFLoader, chunk with RecursiveCharacterTextSplitter, store in a local Chroma collection, and query with natural language. The entire RAG prototype works without any cloud setup, runs on their laptop, and demonstrates the feature to stakeholders in 2 hours. When the feature is approved for production, they swap Chroma for Pinecone with minimal code changes, keeping the rest of the pipeline identical.
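The chunking step in that prototype can be sketched without LangChain; chunk_text below is a hypothetical, simplified stand-in for RecursiveCharacterTextSplitter (fixed-size character windows with overlap):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with overlap, so that
    context spanning a chunk boundary is not lost entirely."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

pdf_text = "Lorem ipsum " * 100  # pretend this came from a PDF loader
chunks = chunk_text(pdf_text, chunk_size=200, overlap=50)
# Each chunk would then be added to a Chroma collection with an id
# like f"page1_chunk{i}" and queried as shown in "How It Works".
print(len(chunks[0]))  # 200
```

Real splitters also try to break on paragraph and sentence boundaries rather than mid-word, but the size-plus-overlap idea is the same.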
Common Mistakes
- ✕ Using Chroma's in-memory mode in production—data is lost on restart; use PersistentClient for any data that needs to survive restarts.
- ✕ Running Chroma in production for workloads exceeding ~1M vectors without benchmarking—Chroma is not optimized for very large-scale deployments.
- ✕ Assuming Chroma's default embedding function (a small local sentence-transformer model) will produce the same vectors as your production embedding model—always specify the same embedding function in development and production.
Related Terms
Vector Database
A vector database is a purpose-built data store optimized for storing, indexing, and querying high-dimensional numerical vectors (embeddings), enabling fast similarity search across large collections of embedded documents.
Pinecone
Pinecone is a fully managed vector database service designed for production machine learning applications, providing high-performance similarity search with simple APIs and automatic scaling for RAG and semantic search systems.
Weaviate
Weaviate is an open-source vector database with built-in support for hybrid search, multi-tenancy, and automatic vectorization, popular in enterprise RAG deployments for its flexibility and self-hosting capability.
pgvector
pgvector is a PostgreSQL extension that adds vector similarity search capabilities to Postgres, enabling teams to run RAG retrieval directly in their existing database without a separate vector store.
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Ready to build your AI chatbot?
Put these concepts into practice with 99helpers — no code required.
Start free trial →