Sequence Labeling
Definition
Sequence labeling is the NLP task of assigning a categorical label to each element in a sequence of tokens. Unlike text classification, which assigns one label per document, sequence labeling produces one label per token. Applications include named entity recognition (labeling each token as B-PER, I-ORG, O, and so on), part-of-speech tagging (labeling each token with its grammatical role), chunking (labeling runs of tokens that form phrases), and slot filling (labeling tokens that contribute to dialogue slot values). Modern sequence labelers use transformer encoders with token-level classification heads, often combined with CRF (Conditional Random Field) decoders to enforce label-sequence consistency.
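As a toy illustration (the sentence and labels below are hand-written, not model output), the one-label-per-token contract for BIO-style NER looks like this:

```python
# One BIO label per token, in contrast to one label per document
# in text classification.
tokens = ["Barack", "Obama", "visited", "Microsoft", "headquarters"]
labels = ["B-PER", "I-PER", "O", "B-ORG", "O"]  # same length as tokens

for tok, lab in zip(tokens, labels):
    print(f"{tok:>14}  {lab}")
```

Note that multi-token entities ("Barack Obama") are expressed as a B- label followed by I- labels, which is exactly the structure the BIO scheme exists to encode.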
Why It Matters
Sequence labeling is the technical foundation for named entity recognition, slot filling, relation extraction, and syntactic analysis. Any NLP application that needs to locate specific information within text—not just classify the whole text—relies on sequence labeling. For chatbot slot filling, sequence labeling identifies exactly which words in 'book a flight to Paris on Friday' contribute to the destination ('Paris') and date ('Friday') slots. Understanding sequence labeling as a paradigm helps practitioners design better NLP pipelines and interpret model architectures.
How It Works
Transformer-based sequence labelers use a BERT-style encoder to produce contextual token embeddings, followed by a linear classification layer that predicts label probabilities for each token. CRF decoding adds a transition score matrix that captures valid label transitions (I-PER cannot follow B-ORG in BIO tagging), enforcing globally consistent label sequences via the Viterbi algorithm. Training minimizes cross-entropy loss over all token labels. WordPiece tokenization creates subword tokens that must be aligned back to original word boundaries for final predictions—typically using the first subword token's label for each word.
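The CRF decoding step described above can be sketched as a small Viterbi pass. The emission scores below are made-up numbers standing in for the encoder's classification-head logits, and the three-label tag set is an assumption for illustration; the point is that the transition matrix repairs a per-token prediction that would violate BIO constraints:

```python
LABELS = ["O", "B-PER", "I-PER"]
NEG_INF = float("-inf")

# transitions[i][j] = score of moving from label i to label j.
# BIO constraint: I-PER may not follow O.
transitions = [[0.0] * 3 for _ in range(3)]
transitions[LABELS.index("O")][LABELS.index("I-PER")] = NEG_INF

def viterbi(emissions, transitions):
    """Return the highest-scoring label-index sequence."""
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptrs = []
    for em in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j now.
            cands = [score[i] + transitions[i][j] + em[j] for i in range(n_labels)]
            best_i = max(range(n_labels), key=lambda i: cands[i])
            ptrs.append(best_i)
            new_score.append(cands[best_i])
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final label.
    best = [max(range(n_labels), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        best.append(ptrs[best[-1]])
    return best[::-1]

emissions = [
    [0.1, 2.0, 0.0],  # "Mary": B-PER preferred
    [2.0, 0.0, 0.0],  # "saw":  O preferred
    [0.5, 1.0, 1.5],  # "Jim":  greedy argmax picks I-PER, invalid after O
]
path = [LABELS[i] for i in viterbi(emissions, transitions)]
print(path)  # → ['B-PER', 'O', 'B-PER']
```

Independent per-token argmax would label "Jim" as I-PER (its highest emission score) even though the preceding token is O; the Viterbi pass with the forbidden O→I-PER transition instead settles on the globally consistent B-PER.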
[Interactive example: Sequence Labeling — BIO Tagging, showing input tokens with a tag legend]
Real-World Example
An accounts payable automation system uses sequence labeling to extract invoice fields from OCR-processed purchase orders. Trained on 5,000 labeled invoices, the model labels each token as B-vendor, I-vendor, B-amount, I-amount, B-date, I-date, or O. For an invoice reading '...Amazon Web Services TOTAL: $4,821.50 due 2026-04-01...', the sequence labeler correctly identifies entity spans that the accounts payable system then normalizes and enters into the ERP system—processing invoices that previously required 15 minutes of manual data entry in under 2 seconds.
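The normalization step in a pipeline like this starts by collapsing per-token BIO labels into (field, text) spans. A minimal sketch, with invented tokens and labels standing in for the OCR output and the model's predictions:

```python
def bio_to_spans(tokens, labels):
    """Group B-*/I-* label runs into (field, span_text) pairs."""
    spans, field, span_tokens = [], None, []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if field:
                spans.append((field, " ".join(span_tokens)))
            field, span_tokens = lab[2:], [tok]
        elif lab.startswith("I-") and field == lab[2:]:
            span_tokens.append(tok)
        else:  # O, or an I- tag that doesn't continue the open span
            if field:
                spans.append((field, " ".join(span_tokens)))
            field, span_tokens = None, []
    if field:
        spans.append((field, " ".join(span_tokens)))
    return spans

tokens = ["Amazon", "Web", "Services", "TOTAL:", "$4,821.50", "due", "2026-04-01"]
labels = ["B-vendor", "I-vendor", "I-vendor", "O", "B-amount", "O", "B-date"]
print(bio_to_spans(tokens, labels))
# → [('vendor', 'Amazon Web Services'), ('amount', '$4,821.50'), ('date', '2026-04-01')]
```

The resulting field/value pairs are what downstream systems (here, the ERP integration) actually consume; span-level grouping like this is also the unit at which NER systems are usually evaluated.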
Common Mistakes
- ✕ Ignoring subword-to-word alignment when using WordPiece tokenizers—models must map subword labels back to original word boundaries
- ✕ Skipping CRF decoding for fine-grained tasks—independent per-token softmax produces invalid BIO label sequences (I-PER without a preceding B-PER)
- ✕ Treating sequence labeling as ordinary classification—the sequential nature and label dependencies require different architectures and evaluation metrics
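The first mistake above, subword-to-word alignment, can be sketched in isolation. The toy_subwords function below is a crude stand-in for a real WordPiece tokenizer (a real pipeline would use the tokenizer's own word-to-subword mapping rather than this 4-character split):

```python
def toy_subwords(word):
    """Crude WordPiece stand-in: 4-char pieces, continuations marked '##'."""
    pieces = [word[i:i + 4] for i in range(0, len(word), 4)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]

def align_first_subword(words, subword_labels):
    """Keep the label of each word's first subword; drop continuation labels."""
    word_labels, i = [], 0
    for word in words:
        word_labels.append(subword_labels[i])  # first-subword label
        i += len(toy_subwords(word))           # skip this word's pieces
    return word_labels

words = ["Cristiano", "Ronaldo", "scored"]
# One predicted label per subword piece (3 + 2 + 2 pieces):
subword_labels = ["B-PER", "I-PER", "I-PER", "I-PER", "I-PER", "O", "O"]
print(align_first_subword(words, subword_labels))
# → ['B-PER', 'I-PER', 'O']
```

Without this step the model emits seven labels for a three-word sentence, and span boundaries no longer line up with the original text.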
Related Terms
Named Entity Recognition (NER)
Named Entity Recognition (NER) is an NLP task that identifies and classifies named entities in text—people, organizations, locations, dates, product names, and other specific items—enabling structured extraction from unstructured text.
Part-of-Speech Tagging
Part-of-speech (POS) tagging assigns grammatical labels—noun, verb, adjective, preposition—to each word in a sentence, providing syntactic context that downstream NLP tasks use for deeper language understanding.
Slot Filling
Slot filling is the dialogue management process of collecting all the required pieces of information (slots) needed to complete a task. The chatbot systematically asks for any missing slots — like date, time, or account number — until it has everything needed to fulfill the user's request.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model pre-trained on massive text corpora that revolutionized NLP by providing rich contextual word representations that dramatically improved nearly every language task.
Linguistic Annotation
Linguistic annotation is the process of manually or automatically labeling text with linguistic information—such as POS tags, parse trees, named entities, or coreference chains—creating training data for supervised NLP models.