Part-of-Speech Tagging
Definition
Part-of-speech tagging is the process of marking each token in a text with its grammatical role. Tags follow standards like Penn Treebank (NN=noun, VB=verb, JJ=adjective) or Universal Dependencies (NOUN, VERB, ADJ). Modern POS taggers use bidirectional LSTMs or transformer encoders fine-tuned on annotated corpora, achieving over 97% accuracy on standard English text. POS information feeds into dependency parsing, named entity recognition, and rule-based extraction systems.
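Both tag inventories mentioned above can be related programmatically. The sketch below maps a handful of Penn Treebank tags to their Universal Dependencies coarse equivalents; the mapping shown is a small illustrative subset, not the full official conversion table.

```python
# Illustrative subset of the Penn Treebank -> Universal Dependencies mapping.
PENN_TO_UD = {
    "NN":  "NOUN",  # singular noun
    "NNS": "NOUN",  # plural noun
    "VB":  "VERB",  # base-form verb
    "VBZ": "VERB",  # 3rd person singular present verb
    "JJ":  "ADJ",   # adjective
    "DT":  "DET",   # determiner
}

def to_universal(penn_tags):
    """Map a sequence of Penn tags to UD coarse tags ('X' when unknown)."""
    return [PENN_TO_UD.get(t, "X") for t in penn_tags]

print(to_universal(["DT", "NN", "VBZ", "JJ"]))  # ['DET', 'NOUN', 'VERB', 'ADJ']
```

Coarse UD tags are what make cross-lingual transfer practical: NOUN means the same thing whether the underlying treebank was annotated in English, German, or Japanese.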
Why It Matters
POS tagging enables chatbots and NLP pipelines to understand the grammatical structure of utterances, which improves intent classification and entity extraction accuracy. Knowing whether 'book' is a verb or a noun (as in 'book a flight' vs. 'hand me the book') resolves critical ambiguities before downstream processing. For multilingual systems, POS tags provide language-agnostic structural information that enables cross-lingual transfer learning.
How It Works
POS taggers use sequence labeling models where each token's tag depends on its neighbors. A Viterbi decoder over a Hidden Markov Model was the classic approach; modern systems use BiLSTM-CRF or transformer encoders that attend to full sentence context. The model learns that 'runs' after 'she' is likely VBZ (3rd person singular verb), while 'runs' in 'the runs' is NNS (plural noun). Pre-trained language models like BERT encode enough syntactic information to achieve near-human POS accuracy.
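The classic Viterbi-over-HMM approach can be shown end to end on the 'runs' example above. This is a toy model with hand-set probabilities chosen purely for illustration; real taggers estimate transition and emission probabilities from annotated corpora.

```python
import math

# Toy HMM: four tags, hand-set probabilities (not learned from data).
STATES = ["PRP", "DT", "VBZ", "NNS"]
START = {"PRP": 0.5, "DT": 0.5}           # P(first tag)
TRANS = {                                  # P(next tag | current tag)
    "PRP": {"VBZ": 0.9, "NNS": 0.1},
    "DT":  {"VBZ": 0.1, "NNS": 0.9},
    "VBZ": {}, "NNS": {},
}
EMIT = {                                   # P(word | tag)
    "PRP": {"she": 1.0},
    "DT":  {"the": 1.0},
    "VBZ": {"runs": 1.0},
    "NNS": {"runs": 1.0},
}

def viterbi(words):
    """Return the most probable tag sequence under the toy HMM."""
    # trellis[i][tag] = (best log-prob of reaching tag at position i, backpointer)
    trellis = [{}]
    for tag in STATES:
        p = START.get(tag, 0.0) * EMIT[tag].get(words[0], 0.0)
        if p > 0:
            trellis[0][tag] = (math.log(p), None)
    for i, word in enumerate(words[1:], start=1):
        trellis.append({})
        for tag in STATES:
            e = EMIT[tag].get(word, 0.0)
            if e == 0.0:
                continue
            best = None
            for prev, (lp, _) in trellis[i - 1].items():
                t = TRANS[prev].get(tag, 0.0)
                if t > 0:
                    cand = lp + math.log(t) + math.log(e)
                    if best is None or cand > best[0]:
                        best = (cand, prev)
            if best:
                trellis[i][tag] = best
    # Backtrace from the highest-scoring final state.
    tag = max(trellis[-1], key=lambda t: trellis[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = trellis[i][tag][1]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["she", "runs"]))  # ['PRP', 'VBZ']
print(viterbi(["the", "runs"]))  # ['DT', 'NNS']
```

The same word, 'runs', gets a different tag in each sentence because the transition probabilities from the preceding tag dominate the decision; BiLSTM-CRF and transformer taggers learn far richer versions of this context dependence.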
[Figure: a tagged sentence annotated with Penn Treebank part-of-speech tags]
Real-World Example
An NLP pipeline for a legal document analyzer uses POS tagging to identify all verb phrases in contract clauses. By extracting patterns like 'PARTY_NAME + MD + VB + obligation_noun' (e.g., 'Vendor shall provide maintenance', where 'shall' is a modal and 'provide' a base-form verb), the system automatically catalogs contractual obligations without reading every clause manually. The POS tags allow regex-style patterns to work robustly across varied phrasing.
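The pattern-matching idea above can be sketched as a sequence match over tagger output. The clause here is hand-tagged for illustration; in a real pipeline the (word, tag) pairs would come from a POS tagger, and the pattern set would be larger.

```python
# Match a Penn-tag pattern over a pre-tagged clause:
# proper noun + modal + base verb + noun ('Vendor shall provide maintenance').
def match_obligations(tagged, pattern=("NNP", "MD", "VB", "NN")):
    """Return the word spans whose tag sequence equals `pattern`."""
    n = len(pattern)
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if tuple(tag for _, tag in window) == pattern:
            hits.append(" ".join(word for word, _ in window))
    return hits

clause = [("Vendor", "NNP"), ("shall", "MD"), ("provide", "VB"),
          ("maintenance", "NN"), (".", ".")]
print(match_obligations(clause))  # ['Vendor shall provide maintenance']
```

Matching on tags rather than literal words is what makes the pattern robust: 'Licensee must deliver reports' hits the same NNP + MD + VB + NN(S) template despite sharing no vocabulary with the example above.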
Common Mistakes
- ✕ Using POS tags as the sole disambiguation signal—context and semantics also matter
- ✕ Ignoring POS degradation on domain-specific text—medical or legal language needs domain-adapted taggers
- ✕ Treating POS tagging as optional overhead—many downstream tasks perform significantly worse without it
Related Terms
Dependency Parsing
Dependency parsing analyzes sentence structure by identifying grammatical relationships between words—subject, object, modifier—forming a tree that reveals who did what to whom in any given sentence.
Named Entity Recognition (NER)
Named Entity Recognition (NER) is an NLP task that identifies and classifies named entities in text—people, organizations, locations, dates, product names, and other specific items—enabling structured extraction from unstructured text.
Text Preprocessing
Text preprocessing is the collection of transformations applied to raw text before NLP model training or inference—including tokenization, normalization, and filtering—determining the quality and consistency of model inputs.
Linguistic Annotation
Linguistic annotation is the process of manually or automatically labeling text with linguistic information—such as POS tags, parse trees, named entities, or coreference chains—creating training data for supervised NLP models.
Constituency Parsing
Constituency parsing breaks a sentence into nested hierarchical phrases—noun phrases, verb phrases, clauses—producing a tree structure that reveals the grammatical constituents of a sentence.