Vector Database

Definition

A vector database stores embedding vectors (the numerical representations of text, images, or other data produced by machine learning models) alongside the original content and associated metadata. Its defining capability is approximate nearest neighbor (ANN) search: given a query vector, it efficiently finds approximately the k stored vectors most similar to the query, trading a small amount of recall for speed, using a metric such as cosine similarity, dot product, or Euclidean distance. Vector databases are the retrieval engine in RAG architectures, enabling AI systems to search semantically across millions of document chunks in milliseconds. Popular options include Pinecone, Weaviate, Chroma, Milvus, Qdrant, and the pgvector extension for PostgreSQL.
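The similarity metrics named above can be sketched in a few lines of plain Python. The vectors here are made-up toy values, not output from a real embedding model, and the helper names are illustrative only. The sketch also shows why many systems normalize vectors to unit length: on normalized vectors, the dot product and cosine similarity give the same score.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]

# Toy 3-dimensional vectors (assumed values for illustration).
query = [1.0, 2.0, 2.0]
doc = [2.0, 1.0, 2.0]

raw_dot = dot(query, doc)
cos = cosine_similarity(query, doc)
unit_dot = dot(normalize(query), normalize(doc))

print(raw_dot, cos, unit_dot)
```

The raw dot product depends on vector length, while the cosine score does not; after normalization the two agree.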

Why It Matters

Vector databases are the infrastructure that makes semantic search and RAG possible at production scale. Traditional databases (SQL, NoSQL) are optimized for exact lookup and keyword search; without a vector index, finding semantically similar content means comparing the query against every stored row, which is too slow across millions of documents. Vector databases solve this with specialized indexing algorithms (HNSW, IVF, LSH) that enable approximate nearest neighbor search at scale. For AI chatbot deployments, the vector database is where the knowledge base is stored in searchable form, so its latency, recall, and cost directly affect chatbot response quality and speed.

How It Works

Vector databases organize embedding vectors in specialized index structures that enable fast similarity search without comparing every vector against the query. The HNSW (Hierarchical Navigable Small World) algorithm, used by many modern vector databases, arranges vectors in a layered graph; search cost grows roughly logarithmically with the number of vectors, rather than linearly as in an O(n) brute-force scan. When a query vector arrives, the search starts at an entry point and follows graph edges toward vectors progressively closer to the query, returning approximate nearest neighbors. Metadata filters can be applied alongside vector search to restrict results (e.g., 'find semantically similar chunks, but only from articles in the billing category'). Vector databases also handle upserts (adding or updating vectors), deletions, and namespace management for multi-tenant applications.
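The layered-graph idea can be illustrated with a heavily simplified sketch. This is not the real HNSW algorithm: it uses one-dimensional points, a fixed two-layer hierarchy (every 10th point on the top layer), and a plain greedy walk, whereas real implementations assign layers randomly and keep a list of candidates during search. The function names are hypothetical. It only shows why a sparse upper layer cuts the number of hops.

```python
import math

def build_layer(points, ids, m):
    """Link each id to its m nearest neighbors within the same layer (brute force)."""
    graph = {}
    for i in ids:
        others = sorted((j for j in ids if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        graph[i] = others[:m]
    return graph

def greedy_search(points, graph, query, entry):
    """Walk the graph, always moving to the neighbor closest to the query."""
    current, hops = entry, 0
    while True:
        best = min(graph[current], key=lambda j: math.dist(query, points[j]))
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current, hops  # no neighbor is closer: local best reached
        current, hops = best, hops + 1

points = [(float(i),) for i in range(100)]
layers = [
    build_layer(points, range(0, 100, 10), m=2),  # sparse top layer: long jumps
    build_layer(points, range(100), m=2),         # dense bottom layer: fine steps
]
query = (42.3,)

# Layered search: the result of each layer is the entry point for the next.
entry, total_hops = 0, 0
for graph in layers:
    entry, hops = greedy_search(points, graph, query, entry)
    total_hops += hops

# Baseline: the same greedy walk using only the dense bottom layer.
flat_id, flat_hops = greedy_search(points, layers[1], query, 0)

print(entry, total_hops, flat_id, flat_hops)
```

Both searches end at the same point, but the layered search gets there in far fewer hops because the top layer covers most of the distance in coarse steps.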

Vector DB vs SQL DB — Storage and Query Model

SQL Database

SELECT * FROM docs WHERE id = 42

Query type: Exact match
Storage: Row-based table storage
Index: B-tree, hash index
Speed: Fast for lookups by key
Example: Find doc by ID

Vector Database

query_vector = embed(input) → top-5 by cosine

Query type: Similarity search
Storage: Vector index storage
Index: HNSW / IVF index
Speed: Fast for ANN search
Example: Find semantically similar docs

Example: 1M document vectors, query in <10 ms

1. Embed user query → [0.23, -0.44, 0.71, ...]
2. HNSW index traversal → candidate set (~200 docs)
3. Score candidates → rank by cosine similarity
4. Return top-5 with scores + metadata

Top-5 results returned

1. How to reset your password (score: 0.97)
2. Account recovery options (score: 0.91)
3. Two-factor authentication setup (score: 0.84)
4. Login troubleshooting guide (score: 0.79)
5. Security settings overview (score: 0.73)
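The four query steps above can be sketched end to end as follows. Everything here is an assumption made for illustration: `embed` is a toy word-count function standing in for a real embedding model, `VOCAB` and the three documents are invented, and the candidate step simply returns every document where a real system would traverse an HNSW index.

```python
import math

# Hypothetical vocabulary for the toy embedding (not a real model).
VOCAB = ["password", "reset", "account", "recovery", "billing", "invoice"]

def embed(text):
    """Toy stand-in for an embedding model: counts of vocabulary words."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

docs = [
    {"id": 1, "title": "How to reset your password", "category": "account"},
    {"id": 2, "title": "Account recovery options", "category": "account"},
    {"id": 3, "title": "Billing and invoice questions", "category": "billing"},
]
index = [(doc, embed(doc["title"])) for doc in docs]

def search(query, k):
    query_vector = embed(query)                       # 1. embed the user query
    candidates = index                                # 2. stand-in for index traversal
    scored = [(cosine(query_vector, vec), doc)        # 3. score each candidate
              for doc, vec in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"score": round(score, 2), **doc}         # 4. top-k with scores + metadata
            for score, doc in scored[:k]]

hits = search("I forgot my password and need a reset", k=2)
for hit in hits:
    print(hit)
```

Keeping candidate generation and scoring as separate steps mirrors the flow above: the index narrows the field cheaply, and exact scoring is only paid for on the shortlist.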

Core operations

upsert: Insert or update a vector
query: ANN similarity search
delete: Remove by ID
filter: Metadata pre-filter

Real-World Example

A 99helpers customer builds their AI chatbot knowledge base in a vector database with 15,000 document chunks across 500 knowledge base articles. When a user asks a question, the system embeds the query and searches the vector database for the 5 most semantically similar chunks — completing the search in under 20 milliseconds. The retrieved chunks are passed to the LLM as context. The entire retrieval-to-response latency is under 2 seconds, meeting the real-time chat experience requirement.
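A quick back-of-the-envelope check of the numbers in this example is sketched below. The chunk, article, and latency figures come from the example itself. The embedding dimension and 4-byte float size are assumptions introduced only to give a rough sense of raw vector storage; the example does not state them.

```python
# Figures taken from the example above.
chunks, articles = 15_000, 500
search_ms, budget_ms = 20, 2_000  # upper bounds: "under 20 ms", "under 2 seconds"

chunks_per_article = chunks / articles   # average chunks per article
search_share = search_ms / budget_ms     # share of the latency budget spent on search

# ASSUMPTIONS (not from the example): 1536-dimensional float32 embeddings.
ASSUMED_DIMS = 1536
BYTES_PER_FLOAT32 = 4
raw_vector_mb = chunks * ASSUMED_DIMS * BYTES_PER_FLOAT32 / 1_000_000

print(f"{chunks_per_article:.0f} chunks per article on average")
print(f"vector search uses at most {search_share:.0%} of the latency budget")
print(f"about {raw_vector_mb:.2f} MB of raw vectors under the stated assumptions")
```

At this scale the vector search is a small fraction of end-to-end latency, so the rest of the budget is left for query embedding and LLM generation.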

Common Mistakes

  • Choosing a vector database based on benchmark performance alone without considering operational factors (managed vs. self-hosted, cost, developer experience)
  • Not implementing metadata filtering: restricting the search by category, date, or document source shrinks the candidate set and can substantially improve retrieval precision
  • Embedding full documents as single vectors instead of chunked passages: one vector for a long document blends many topics and loses granular semantic meaning, so chunk before embedding
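As a minimal sketch of the chunk-before-embedding advice, the function below splits text into overlapping word windows. The function name, the word-based splitting, and the sizes used are assumptions for illustration; production pipelines often split by tokens, sentences, or headings instead.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

text = "one two three four five six seven eight nine ten"
chunks = chunk_words(text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighboring chunks, at the cost of storing some words twice.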
