Sentence Transformers
Definition
Sentence transformers adapt transformer encoder models (BERT, RoBERTa) to produce high-quality sentence-level embeddings using siamese and triplet network architectures trained with contrastive objectives. The SBERT (Sentence-BERT) model family, introduced in 2019, applies mean-pooling over token embeddings to produce sentence vectors where cosine similarity measures semantic relatedness. Unlike cross-encoders that require pairwise inference for similarity (slow at scale), sentence transformers encode each text independently, enabling pre-computation and approximate nearest-neighbor search over millions of embeddings.
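The core idea above can be sketched numerically: each text is encoded independently into a vector, and cosine similarity on those vectors measures semantic relatedness. This minimal sketch uses toy 4-dimensional vectors as stand-ins for the 384- or 768-dimensional outputs a real model would produce via `model.encode(sentence)`:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for model outputs; a real sentence transformer would
# produce these vectors by encoding each sentence independently.
emb_question  = np.array([0.9, 0.1, 0.3, 0.0])  # "How do I reset my password?"
emb_related   = np.array([0.8, 0.2, 0.4, 0.1])  # "Password recovery steps"
emb_unrelated = np.array([0.0, 0.9, 0.0, 0.8])  # "Quarterly revenue report"

print(cosine_similarity(emb_question, emb_related))    # high (near 1)
print(cosine_similarity(emb_question, emb_unrelated))  # low (near 0)
```

Because each text is encoded on its own, the expensive transformer pass happens once per text; only the cheap vector comparison happens per query.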
Why It Matters
Sentence transformers power the embedding layer in RAG systems, semantic search engines, duplicate detection, and recommendation systems. When a user queries a knowledge base, their question is encoded into a vector and compared against pre-computed vectors for all knowledge base chunks—retrieving the most semantically relevant content in milliseconds. This semantic retrieval capability is what makes modern AI chatbots dramatically more effective than keyword-based search: 'How do I recover my account?' retrieves articles about 'account recovery' and 'password reset' even without exact keyword overlap.
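The retrieval step described above reduces to a nearest-neighbor lookup over pre-computed vectors. This sketch uses toy 3-dimensional embeddings and brute-force cosine similarity; production systems store the vectors in a database such as pgvector and use approximate nearest-neighbor indexes instead:

```python
import numpy as np

# Toy pre-computed chunk embeddings (one row per knowledge-base chunk);
# values and dimensions are illustrative, not real model outputs.
chunk_embeddings = np.array([
    [0.9, 0.1, 0.2],   # chunk 0: "Account recovery guide"
    [0.1, 0.9, 0.1],   # chunk 1: "Billing FAQ"
    [0.8, 0.2, 0.3],   # chunk 2: "Password reset steps"
])
# Normalize once so similarity is a single dot product per query.
chunk_norms = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)

def top_k(query_embedding: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k most similar chunks by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = chunk_norms @ q               # cosine similarity per chunk
    return np.argsort(-scores)[:k].tolist()

query = np.array([0.85, 0.15, 0.25])       # "How do I recover my account?"
print(top_k(query))                         # recovery/reset chunks rank first
```

The recovery and reset chunks score highest despite sharing no exact keywords with each other, which is the behavior the paragraph above describes.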
How It Works
SBERT training uses siamese networks: two identical transformer encoders share weights, and each independently encodes one of the two sentences in a pair. The pooled embeddings are compared using cosine similarity or concatenated and passed to a classifier. Training with natural language inference (NLI) data uses (premise, hypothesis) pairs as a semantic similarity signal: entailment pairs are treated as highly similar, contradiction pairs as dissimilar. Triplet loss training uses (anchor, positive, negative) triplets to pull semantically similar texts together and push dissimilar ones apart. The sentence-transformers Python library provides 200+ pre-trained and fine-tuned models.
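The triplet objective mentioned above can be written as loss = max(0, d(anchor, positive) − d(anchor, negative) + margin): the loss is zero once the negative is farther from the anchor than the positive by at least the margin. A minimal sketch with toy vectors (the sentence-transformers library ships a ready-made TripletLoss for actual training):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss with Euclidean distance:
    max(0, d(anchor, positive) - d(anchor, negative) + margin).
    Zero once the negative is farther than the positive by >= margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings; a real setup would encode the (anchor, positive,
# negative) sentences with the shared-weight encoder and backpropagate.
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # semantically similar text
negative = np.array([0.0, 1.0])   # dissimilar text

print(triplet_loss(anchor, positive, negative))  # 0.0: already separated
```

Gradient descent on this loss is what shapes the embedding space so that cosine or Euclidean distance tracks semantic similarity.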
Sentence Transformers — Fixed-Size Embedding Pipeline
Input sentence ("How are you?") → Tokenize → Transformer layers → Token embeddings → Mean pooling → Fixed-size embedding vector (e.g., 768 dims)
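The pooling step in the pipeline above is typically a masked mean over the token embeddings: padding positions are excluded so they don't dilute the sentence vector. A numpy sketch with a toy 4-token sequence and hidden size 3 (real models use 384 or 768):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, hidden_dim) output of the transformer layers.
    attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None]                  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum real tokens only
    return summed / mask.sum()

# Toy token embeddings; the last position is padding and must be excluded.
tokens = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],   # padding embedding, masked out below
])
mask = np.array([1.0, 1.0, 1.0, 0.0])
print(mean_pool(tokens, mask))   # [2. 2. 2.]
```

The result is one fixed-size vector per sentence regardless of its length, which is what makes pre-computation and vector search possible.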
Real-World Example
A software documentation platform embeds all 2,000 help center articles using the 'all-MiniLM-L6-v2' sentence transformer model, storing 384-dimensional vectors in pgvector. When a developer submits a question via the chatbot, the question is embedded in real time and the top-5 most similar document chunks are retrieved using approximate nearest-neighbor search. This semantic retrieval layer correctly matches 'how do I authenticate my API requests' to the 'API authentication guide' despite zero keyword overlap—achieving 41% better retrieval precision than the previous BM25 keyword search.
Common Mistakes
- ✕ Using generic pre-trained models for specialized domains without fine-tuning—domain-specific embeddings significantly outperform generic ones for technical or medical content
- ✕ Comparing embeddings from different model families—cosine similarity is only meaningful within the same embedding space
- ✕ Embedding entire long documents as single vectors—meaningful semantic comparisons require embedding at paragraph or chunk granularity
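The last mistake above is usually fixed with a chunking step before embedding. This is a simple illustrative word-window chunker with overlap (a hypothetical helper; production systems often chunk by paragraphs, headings, or token counts instead):

```python
def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split a long document into overlapping word-window chunks so each
    piece gets its own embedding instead of one diluted document vector.
    Illustrative only; real pipelines often chunk on structural boundaries."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A 250-word toy document yields three overlapping chunks.
doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc)
print(len(chunks))   # 3
```

Each chunk is then embedded separately, so a query can match the one paragraph that answers it rather than the average of an entire document.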
Related Terms
Word Embeddings
Word embeddings are dense vector representations of words in a continuous numerical space where semantically similar words are positioned close together, enabling machines to understand word meaning through geometry.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model pre-trained on massive text corpora that revolutionized NLP by providing rich contextual word representations that dramatically improved nearly every language task.
Transformer Encoder
The transformer encoder is a neural network architecture that processes entire input sequences bidirectionally using self-attention, producing rich contextual representations of each token that power state-of-the-art NLP models.
Semantic Parsing
Semantic parsing converts natural language sentences into formal logical representations—such as SQL queries, executable programs, or knowledge graph queries—enabling AI systems to understand and act on user requests precisely.
Multilingual NLP
Multilingual NLP extends language models and processing pipelines to handle multiple human languages, enabling a single AI system to understand and generate text across languages without building separate models for each.