Sentence Transformers
Definition
Sentence transformers adapt transformer encoder models (BERT, RoBERTa) to produce high-quality sentence-level embeddings using siamese and triplet network architectures trained with contrastive objectives. The SBERT (Sentence-BERT) model family, introduced in 2019, applies mean-pooling over token embeddings to produce sentence vectors where cosine similarity measures semantic relatedness. Unlike cross-encoders that require pairwise inference for similarity (slow at scale), sentence transformers encode each text independently, enabling pre-computation and approximate nearest-neighbor search over millions of embeddings.
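The core idea above can be sketched numerically: each text is encoded independently into a vector, and cosine similarity on those vectors measures semantic relatedness. This minimal sketch uses toy 4-dimensional vectors as stand-ins for the 384- or 768-dimensional outputs a real model would produce via `model.encode(sentence)`:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for model outputs; a real sentence transformer would
# produce these vectors by encoding each sentence independently.
emb_question  = np.array([0.9, 0.1, 0.3, 0.0])  # "How do I reset my password?"
emb_related   = np.array([0.8, 0.2, 0.4, 0.1])  # "Password recovery steps"
emb_unrelated = np.array([0.0, 0.9, 0.0, 0.8])  # "Quarterly revenue report"

print(cosine_similarity(emb_question, emb_related))    # high (near 1)
print(cosine_similarity(emb_question, emb_unrelated))  # low (near 0)
```

Because each text is encoded on its own, the expensive transformer pass happens once per text; only the cheap vector comparison happens per query.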
Why It Matters
Sentence transformers power the embedding layer in RAG systems, semantic search engines, duplicate detection, and recommendation systems. When a user queries a knowledge base, their question is encoded into a vector and compared against pre-computed vectors for all knowledge base chunks—retrieving the most semantically relevant content in milliseconds. This semantic retrieval capability is what makes modern AI chatbots dramatically more effective than keyword-based search: 'How do I recover my account?' retrieves articles about 'account recovery' and 'password reset' even without exact keyword overlap.
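The retrieval step described above reduces to a nearest-neighbor lookup over pre-computed vectors. This sketch uses toy 3-dimensional embeddings and brute-force cosine similarity; production systems store the vectors in a database such as pgvector and use approximate nearest-neighbor indexes instead:

```python
import numpy as np

# Toy pre-computed chunk embeddings (one row per knowledge-base chunk);
# values and dimensions are illustrative, not real model outputs.
chunk_embeddings = np.array([
    [0.9, 0.1, 0.2],   # chunk 0: "Account recovery guide"
    [0.1, 0.9, 0.1],   # chunk 1: "Billing FAQ"
    [0.8, 0.2, 0.3],   # chunk 2: "Password reset steps"
])
# Normalize once so similarity is a single dot product per query.
chunk_norms = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)

def top_k(query_embedding: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k most similar chunks by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = chunk_norms @ q               # cosine similarity per chunk
    return np.argsort(-scores)[:k].tolist()

query = np.array([0.85, 0.15, 0.25])       # "How do I recover my account?"
print(top_k(query))                         # recovery/reset chunks rank first
```

The recovery and reset chunks score highest despite sharing no exact keywords with each other, which is the behavior the paragraph above describes.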
How It Works
SBERT training uses siamese networks: two identical transformer encoders share weights, and each independently encodes one of the two sentences in a pair. The pooled embeddings are compared using cosine similarity or concatenated and passed to a classifier. Training with natural language inference (NLI) data uses (premise, hypothesis) pairs as a semantic similarity signal: entailment pairs are treated as highly similar, contradiction pairs as dissimilar. Triplet loss training uses (anchor, positive, negative) triplets to pull semantically similar texts together and push dissimilar ones apart. The sentence-transformers Python library provides 200+ pre-trained and fine-tuned models.
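The triplet objective mentioned above can be written as loss = max(0, d(anchor, positive) − d(anchor, negative) + margin): the loss is zero once the negative is farther from the anchor than the positive by at least the margin. A minimal sketch with toy vectors (the sentence-transformers library ships a ready-made TripletLoss for actual training):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss with Euclidean distance:
    max(0, d(anchor, positive) - d(anchor, negative) + margin).
    Zero once the negative is farther than the positive by >= margin."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings; a real setup would encode the (anchor, positive,
# negative) sentences with the shared-weight encoder and backpropagate.
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # semantically similar text
negative = np.array([0.0, 1.0])   # dissimilar text

print(triplet_loss(anchor, positive, negative))  # 0.0: already separated
```

Gradient descent on this loss is what shapes the embedding space so that cosine or Euclidean distance tracks semantic similarity.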
Sentence Transformers — Fixed-Size Embedding Pipeline
Input sentence ("How are you?") → Tokenize → Transformer layers → Token embeddings → Mean pooling → Fixed-size embedding vector (e.g., 768 dims)
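The pooling step in the pipeline above is typically a masked mean over the token embeddings: padding positions are excluded so they don't dilute the sentence vector. A numpy sketch with a toy 4-token sequence and hidden size 3 (real models use 384 or 768):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, hidden_dim) output of the transformer layers.
    attention_mask:   (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None]                  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum real tokens only
    return summed / mask.sum()

# Toy token embeddings; the last position is padding and must be excluded.
tokens = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
    [9.0, 9.0, 9.0],   # padding embedding, masked out below
])
mask = np.array([1.0, 1.0, 1.0, 0.0])
print(mean_pool(tokens, mask))   # [2. 2. 2.]
```

The result is one fixed-size vector per sentence regardless of its length, which is what makes pre-computation and vector search possible.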
Real-World Example
A software documentation platform embeds all 2,000 help center articles using the 'all-MiniLM-L6-v2' sentence transformer model, storing 384-dimensional vectors in pgvector. When a developer submits a question via the chatbot, the question is embedded in real time and the top-5 most similar document chunks are retrieved using approximate nearest-neighbor search. This semantic retrieval layer correctly matches 'how do I authenticate my API requests' to the 'API authentication guide' despite zero keyword overlap—achieving 41% better retrieval precision than the previous BM25 keyword search.
Common Mistakes
- ✕ Using generic pre-trained models for specialized domains without fine-tuning—domain-specific embeddings significantly outperform generic ones for technical or medical content
- ✕ Comparing embeddings from different model families—cosine similarity is only meaningful within the same embedding space
- ✕ Embedding entire long documents as single vectors—meaningful semantic comparisons require embedding at paragraph or chunk granularity
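The last mistake above is usually fixed with a chunking step before embedding. This is a simple illustrative word-window chunker with overlap (a hypothetical helper; production systems often chunk by paragraphs, headings, or token counts instead):

```python
def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> list[str]:
    """Split a long document into overlapping word-window chunks so each
    piece gets its own embedding instead of one diluted document vector.
    Illustrative only; real pipelines often chunk on structural boundaries."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# A 250-word toy document yields three overlapping chunks.
doc = " ".join(f"word{i}" for i in range(250))
chunks = chunk_text(doc)
print(len(chunks))   # 3
```

Each chunk is then embedded separately, so a query can match the one paragraph that answers it rather than the average of an entire document.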
Related Terms
Word Embeddings
Word embeddings are dense vector representations of words in a continuous numerical space where semantically similar words are positioned close together, enabling machines to understand word meaning through geometry.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model pre-trained on massive text corpora that revolutionized NLP by providing rich contextual word representations that dramatically improved nearly every language task.
Transformer Encoder
The transformer encoder is a neural network architecture that processes entire input sequences bidirectionally using self-attention, producing rich contextual representations of each token that power state-of-the-art NLP models.
Semantic Parsing
Semantic parsing converts natural language sentences into formal logical representations—such as SQL queries, executable programs, or knowledge graph queries—enabling AI systems to understand and act on user requests precisely.
Multilingual NLP
Multilingual NLP extends language models and processing pipelines to handle multiple human languages, enabling a single AI system to understand and generate text across languages without building separate models for each.