Natural Language Processing (NLP)

Word Embeddings

Definition

Word embeddings represent each word as a fixed-size dense vector (typically 50-300 dimensions) learned from large text corpora. The key insight is the distributional hypothesis: words that appear in similar contexts have similar meanings. Classic models like Word2Vec (2013) and GloVe (2014) trained shallow neural networks on co-occurrence statistics to learn these representations. The resulting vectors encode semantic relationships—'king' minus 'man' plus 'woman' approximates 'queen'—making arithmetic on meaning possible. Contextual embeddings from transformers have replaced static embeddings for most tasks, but static embeddings remain useful for efficiency-critical applications.
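Concretely, a set of static embeddings is just a lookup table: a matrix with one row per vocabulary word. A minimal sketch with a toy vocabulary and hand-written 4-dimensional vectors (real embeddings are 50-300 dimensions and learned from data, not written by hand):

```python
import numpy as np

# Toy vocabulary and a hypothetical embedding matrix; in a real model
# these rows are learned from a large corpus, not hand-written.
vocab = {"king": 0, "queen": 1, "bread": 2}
embeddings = np.array([
    [0.82, 0.14, -0.35, 0.10],   # king
    [0.78, 0.69, -0.19, 0.05],   # queen
    [-0.50, -0.30, 0.60, 0.40],  # bread
])

def embed(word: str) -> np.ndarray:
    """Embedding lookup: map a token to its dense vector (one matrix row)."""
    return embeddings[vocab[word]]

print(embed("king"))  # the 4-dimensional dense vector for 'king'
```

This is exactly what an embedding layer in a neural network does before any other processing: index into a matrix.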

Why It Matters

Word embeddings democratized NLP by providing semantic representations without manually crafting features. For chatbots, embeddings enable semantic similarity search, allowing retrieval of relevant knowledge base articles even when the user's wording differs from the stored text. They also power recommendation systems, document clustering, and spelling correction. Understanding embeddings is foundational for anyone working with language models, as transformer models use embedding layers as their first processing stage.

How It Works

Word2Vec's Skip-gram model trains a shallow neural network to predict surrounding context words given a target word. The hidden layer weights become the word vectors. GloVe builds a global co-occurrence matrix and factorizes it to produce vectors where dot products approximate log co-occurrence probability. FastText extends Word2Vec by representing each word as a sum of character n-gram vectors, handling out-of-vocabulary words gracefully. All methods produce a lookup table mapping tokens to dense vectors used as neural network inputs.
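The count-based side of this family can be sketched in a few lines: build a windowed co-occurrence matrix from a tiny corpus, then factorize it with truncated SVD to get dense vectors. (This is the LSA-style variant, not GloVe's weighted least-squares objective, but it illustrates the same idea; the corpus here is far too small for reliable vectors.)

```python
import numpy as np

# Tiny illustrative corpus; reliable embeddings need vastly more text.
corpus = [
    "the king rules the kingdom".split(),
    "the queen rules the kingdom".split(),
    "the dog chased the cat".split(),
    "the cat chased the dog".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a window of 2 words.
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# Factorize with truncated SVD to get dense vectors (LSA-style;
# GloVe instead fits a weighted factorization of log co-occurrences).
U, S, _ = np.linalg.svd(counts)
dim = 3
vectors = U[:, :dim] * S[:dim]
v = {w: vectors[idx[w]] for w in vocab}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'king' and 'queen' share identical contexts here, so their
# vectors coincide; 'dog' shares less context with 'king'.
print(cosine(v["king"], v["queen"]), cosine(v["king"], v["dog"]))
```

The key takeaway is that similarity falls out of shared contexts: nothing told the model that 'king' and 'queen' are related except the words around them.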

[Figure: 2D vector space (PCA-projected) showing three word-embedding clusters: royalty (king, queen, prince), animals (dog, cat, wolf), and food (bread, pasta, rice).]
Vector Arithmetic: king − man + woman ≈ queen

      king   [0.82,  0.14, −0.35, ...]
    − man    [0.21, −0.43,  0.12, ...]
    + woman  [0.19,  0.67, −0.22, ...]
    ≈ queen  [0.78,  0.69, −0.19, ...]
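The analogy can be checked directly with cosine similarity. A sketch using the first three components of the hypothetical vectors shown above (plus a made-up 'bread' vector as a distractor); as in standard implementations, the query words themselves are excluded from the candidates:

```python
import numpy as np

# First three components of the illustrative vectors above,
# plus a hypothetical unrelated word as a distractor.
vecs = {
    "king":  np.array([0.82, 0.14, -0.35]),
    "man":   np.array([0.21, -0.43, 0.12]),
    "woman": np.array([0.19, 0.67, -0.22]),
    "queen": np.array([0.78, 0.69, -0.19]),
    "bread": np.array([-0.50, -0.30, 0.60]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = vecs["king"] - vecs["man"] + vecs["woman"]

# Rank candidates by similarity, excluding the query words themselves
# (standard practice, since e.g. 'woman' often scores spuriously high).
candidates = {w: u for w, u in vecs.items() if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(query, candidates[w]))
print(best)  # → queen
```

Note that excluding the input words matters: without it, one of the query words can outrank the intended answer.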

Real-World Example

A knowledge base search system uses GloVe embeddings to expand user queries beyond exact keyword matching. When a user searches for 'reset credentials,' the embedding layer recognizes that 'credentials' is semantically close to 'password,' 'login,' and 'account access,' returning relevant articles even if they don't contain the word 'credentials.' This semantic expansion reduced zero-result searches from 18% to 4%.
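A minimal sketch of that retrieval pattern, with a toy hand-written embedding table standing in for pretrained GloVe vectors (the words, vectors, and articles here are illustrative, not taken from the system described). Queries and articles are embedded by averaging their word vectors and ranked by cosine similarity:

```python
import numpy as np

# Hypothetical stand-ins for pretrained GloVe vectors; semantically
# related words are deliberately given nearby vectors.
glove = {
    "reset":       np.array([0.9, 0.1, 0.0]),
    "credentials": np.array([0.1, 0.9, 0.1]),
    "password":    np.array([0.2, 0.8, 0.1]),
    "login":       np.array([0.1, 0.8, 0.2]),
    "billing":     np.array([-0.7, -0.2, 0.6]),
    "invoice":     np.array([-0.8, -0.1, 0.5]),
    "change":      np.array([0.8, 0.2, 0.1]),
}

def embed(text):
    """Embed text as the average of its in-vocabulary word vectors."""
    words = [w for w in text.lower().split() if w in glove]
    return np.mean([glove[w] for w in words], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

articles = ["change login password", "billing invoice"]
query = "reset credentials"

# The best match contains no word from the query, yet still ranks first
# because 'credentials' sits near 'password' and 'login' in vector space.
q = embed(query)
best = max(articles, key=lambda a: cosine(q, embed(a)))
print(best)  # → change login password
```

A production system would use full pretrained vectors and a vector index rather than a linear scan, but the ranking logic is the same.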

Common Mistakes

  • Using static embeddings for polysemous words—'bank' (financial) and 'bank' (river) get the same vector
  • Training embeddings on too small a corpus—reliable representations require hundreds of millions of tokens
  • Treating embedding dimensions as interpretable features—the dimensions have no human-readable meaning
