Vector Database
Definition
A vector database stores embedding vectors — the numerical representations of text, images, or other data produced by machine learning models — alongside the original content and associated metadata. Its defining capability is approximate nearest neighbor (ANN) search: given a query vector, it efficiently finds the k stored vectors most similar to it, using metrics like cosine similarity or dot product. Vector databases are the retrieval engine in RAG architectures, enabling AI systems to semantically search millions of document chunks in milliseconds. Popular vector databases include Pinecone, Weaviate, Chroma, Milvus, Qdrant, and the pgvector extension for PostgreSQL.
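The core operation — top-k search by cosine similarity — can be sketched as an exact, brute-force scan, which is exactly what ANN indexes exist to approximate. A minimal pure-Python sketch; the store contents and IDs are illustrative:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=5):
    # Exact (brute-force) nearest neighbors: score every stored vector.
    # Real vector databases avoid this full scan with ANN indexes.
    scored = [(vec_id, cosine_similarity(query, vec))
              for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional "store"; real embedding vectors have hundreds
# or thousands of dimensions.
store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))  # doc-1 and doc-2 rank highest
```

The brute-force scan is O(n) per query, which is why production databases replace it with index structures like HNSW.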
Why It Matters
Vector databases are the infrastructure that makes semantic search and RAG possible at production scale. Traditional databases (SQL, NoSQL) are optimized for exact lookup and keyword search — they cannot find semantically similar content across millions of documents in milliseconds. Vector databases solve this with specialized indexing algorithms (HNSW, IVF, LSH) that enable approximate nearest neighbor search at scale. For AI chatbot deployments, the vector database is where the knowledge base is stored in searchable form — its performance (latency, recall, cost) directly impacts chatbot response quality and speed.
How It Works
Vector databases work by organizing embedding vectors in specialized index structures that enable fast similarity search without comparing every vector against the query. The HNSW (Hierarchical Navigable Small World) algorithm, used by most modern vector databases, organizes vectors in a graph structure that allows O(log n) search complexity rather than O(n). When a query vector arrives, the index traverses this graph to find approximate nearest neighbors efficiently. Metadata filters can be applied alongside vector search to restrict results (e.g., 'find semantically similar chunks, but only from articles in the billing category'). Vector databases also handle upserts (adding or updating vectors), deletions, and namespace management for multi-tenant applications.
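The metadata-filter step can be illustrated with a brute-force sketch (a real database applies the filter inside or alongside the ANN index; the record layout and the billing/shipping categories here are illustrative):

```python
def filtered_query(query, records, k=5, category=None):
    # records: list of dicts with "id", "vector", "metadata".
    # Pre-filter by metadata, then rank the survivors by cosine similarity.
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    candidates = [r for r in records
                  if category is None or r["metadata"].get("category") == category]
    ranked = sorted(candidates, key=lambda r: cosine(query, r["vector"]),
                    reverse=True)
    return [r["id"] for r in ranked[:k]]

records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"category": "billing"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"category": "shipping"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"category": "billing"}},
]
# "b" is the second-closest vector overall, but the filter excludes it.
print(filtered_query([1.0, 0.0], records, k=2, category="billing"))  # ['a', 'c']
```

Note how the filter changes the result set: without it, the nearest two vectors would be "a" and "b"; restricted to the billing category, "c" is returned instead.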
Vector DB vs SQL DB — Storage and Query Model
- SQL database: exact lookup by key, e.g. SELECT * FROM docs WHERE id = 42, which returns one matching row or none.
- Vector database: similarity search: embed the input into a query vector, then return the top-5 stored vectors by cosine similarity. Example: across 1M document vectors, a query completes in under 10 ms and returns the top-5 results.
Core operations
- upsert: insert a new vector or update an existing one
- query: ANN similarity search
- delete: remove a vector by ID
- filter: pre-filter candidates by metadata before similarity ranking
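The four operations can be exercised end to end with a toy in-memory store — a sketch only, not a production index (a real database backs query with an ANN structure rather than a full scan):

```python
class ToyVectorStore:
    """Minimal in-memory store exercising upsert, query, delete, and filter."""

    def __init__(self):
        self._rows = {}  # id -> (vector, metadata)

    def upsert(self, vec_id, vector, metadata=None):
        # Insert a new vector, or overwrite an existing one with the same ID.
        self._rows[vec_id] = (vector, metadata or {})

    def delete(self, vec_id):
        # Remove a vector by ID (no-op if the ID is absent).
        self._rows.pop(vec_id, None)

    def query(self, query_vector, k=5, where=None):
        # `where` is a metadata pre-filter: {"key": required_value}.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = sum(x * x for x in a) ** 0.5
            nb = sum(y * y for y in b) ** 0.5
            return dot / (na * nb)

        candidates = [
            (vec_id, cosine(query_vector, vec))
            for vec_id, (vec, meta) in self._rows.items()
            if not where or all(meta.get(key) == val for key, val in where.items())
        ]
        return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]


store = ToyVectorStore()
store.upsert("a", [1.0, 0.0], {"category": "billing"})
store.upsert("b", [0.0, 1.0], {"category": "shipping"})
store.upsert("a", [0.8, 0.6], {"category": "billing"})  # upsert overwrites "a"
store.delete("b")
print(store.query([1.0, 0.0], k=1))  # "a" is the only remaining vector
```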
Real-World Example
A 99helpers customer builds their AI chatbot knowledge base in a vector database with 15,000 document chunks across 500 knowledge base articles. When a user asks a question, the system embeds the query and searches the vector database for the 5 most semantically similar chunks — completing the search in under 20 milliseconds. The retrieved chunks are passed to the LLM as context. The entire retrieval-to-response latency is under 2 seconds, meeting the real-time chat experience requirement.
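The retrieve-then-generate flow in that example can be sketched end to end. Here embed is a crude bag-of-words stand-in for a real embedding model, and the chunks and question are illustrative:

```python
import re
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    na = sum(c * c for c in a.values()) ** 0.5
    nb = sum(c * c for c in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, k=2):
    # Embed the question, rank chunks by similarity, return the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]
question = "How long do refunds take?"
context = "\n".join(retrieve(question, chunks, k=1))
# The retrieved context is then passed to the LLM as part of the prompt.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In production, the same shape holds at 15,000 chunks: only the embedding model and the ANN-indexed store change.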
Common Mistakes
- ✕ Choosing a vector database based on benchmark performance alone without considering operational factors (managed vs. self-hosted, cost, developer experience)
- ✕ Not implementing metadata filtering — filtering by category, date, or document source dramatically improves retrieval precision by reducing the candidate set
- ✕ Embedding full documents as single vectors instead of chunked passages — long documents lose granular semantic meaning; chunk before embedding
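The chunk-before-embedding point can be illustrated with a simple fixed-size splitter with overlap. The window and overlap sizes below are arbitrary; production pipelines often split on sentence or section boundaries instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split a long document into overlapping fixed-size character windows,
    # so each chunk keeps enough local context to embed on its own.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "A" * 500  # placeholder for a real document
chunks = chunk_text(doc, chunk_size=200, overlap=50)
print(len(chunks))  # 4 overlapping windows cover the 500-character document
```

Each chunk — not the whole document — is then embedded and upserted, so retrieval can surface the specific passage that matches a query.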
Related Terms
Retrieval-Augmented Generation
Retrieval-Augmented Generation (RAG) is an AI architecture that enhances large language model responses by first retrieving relevant documents from an external knowledge base and then using that retrieved content as context when generating an answer.
Embedding Model
An embedding model is a machine learning model that converts text (or other data) into dense numerical vectors that capture semantic meaning, enabling similarity search and serving as the foundation of RAG retrieval systems.
Approximate Nearest Neighbor
Approximate Nearest Neighbor (ANN) search finds vectors that are close to a query vector with high probability but without guaranteeing exactness, enabling fast similarity search across millions of vectors at the cost of small accuracy tradeoffs.
Cosine Similarity
Cosine similarity is a mathematical metric that measures the similarity between two vectors by calculating the cosine of the angle between them. It produces a score from -1 to 1, where 1 indicates identical direction, and is widely used in RAG and semantic search.
Indexing Pipeline
An indexing pipeline is the offline data processing workflow that transforms raw documents into searchable vector embeddings, running during knowledge base setup and when content is updated.