Spell Checking

Definition

Spell checking combines a dictionary lookup (is this a valid word?) with a correction mechanism (what word was probably intended?). Non-word error detection identifies tokens not found in a lexicon. Real-word error detection identifies valid words used in the wrong context ('their' vs. 'there'). Correction uses an edit distance metric to find the closest valid word: Levenshtein distance counts character substitutions, insertions, and deletions, and Damerau-Levenshtein distance additionally counts a transposition of two adjacent characters as a single edit. Neural spell checkers use sequence-to-sequence models to correct entire sentences, handling context-sensitive errors that character-level distance metrics miss.
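As a rough illustration of the edit distance idea, the sketch below computes the distance with and without transpositions. The function name edit_distance and its transpositions flag are assumptions made for this example, not part of any particular library, and the transposition variant shown is the restricted "optimal string alignment" form.

```python
def edit_distance(source: str, target: str, transpositions: bool = True) -> int:
    """Count the single-character edits needed to turn source into target.

    With transpositions=False this is plain Levenshtein distance
    (insert, delete, substitute). With transpositions=True, swapping two
    adjacent characters also costs one edit (optimal string alignment,
    a restricted form of Damerau-Levenshtein).
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of the source prefix
    for j in range(cols):
        dist[0][j] = j  # insert every character of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                # adjacent characters swapped
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


print(edit_distance("recieve", "receive", transpositions=False))  # 2
print(edit_distance("recieve", "receive"))                        # 1
```

The 'ie'/'ei' swap costs two substitutions under plain Levenshtein distance but only one edit once transpositions are counted, which is why the choice of metric changes how candidates are ranked.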

Why It Matters

User-generated text in chat interfaces contains frequent typos, phonetic misspellings, and autocorrect errors that degrade NLP model performance. Intent classifiers and entity extractors trained on clean text often fail on misspelled inputs: 'cancle my subscrpition' may not match any intent if the model has never seen these misspellings. Spell correction as a preprocessing step normalizes input toward clean text, which typically improves downstream accuracy. For voice-to-text inputs, spell checking can also correct some transcription errors.

How It Works

The Norvig spell checker generates candidates at edit distance 1 and 2 from the input, keeps those that appear in a known word list (preferring the closer candidates), then selects the one with the highest unigram probability estimated from a training corpus. This simple approach handles the majority of single-character typos. Neural spell checkers use encoder-decoder architectures trained on (misspelled, correct) text pairs, learning to correct context-sensitive errors that pure edit-distance methods miss. SymSpell precomputes delete-only variants of every dictionary word, so a lookup only needs to generate deletions of the input term; this makes correction orders of magnitude faster than generating and checking every insertion, substitution, and transposition at query time.
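A minimal sketch of the Norvig-style approach described above, assuming a tiny hard-coded corpus in place of a real training text. The corpus contents and the helper names (probability, edits1, edits2, known, candidates, correct) are illustrative choices; a practical checker would count words from a much larger corpus.

```python
import re
from collections import Counter

# Toy corpus standing in for a large training text (assumption for illustration).
CORPUS = (
    "please cancel my subscription and cancel the transfer "
    "i will receive the receipt when you receive the money "
    "this should relieve the pressure on my savings account"
)
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
TOTAL = sum(WORDS.values())


def probability(word):
    """Unigram probability of a word in the corpus."""
    return WORDS[word] / TOTAL


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only strings that appear in the word list."""
    return {w for w in words if w in WORDS}


def candidates(word):
    """Prefer the word itself, then distance-1, then distance-2 candidates."""
    return known([word]) or known(edits1(word)) or known(edits2(word)) or {word}


def correct(word):
    """Pick the candidate with the highest unigram probability."""
    return max(candidates(word), key=probability)


print(correct("cancle"))        # cancel
print(correct("subscrpition"))  # subscription
print(correct("recieve"))       # receive
```

Here 'recieve' has two known candidates at distance 1 ('receive' and 'relieve'), and the unigram count breaks the tie. A word with no known candidate within two edits is returned unchanged.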

Spell Checking: Edit Distance Correction

Misspelled input: recieve → Corrected output: receive

Ranked candidates (by edit distance and frequency score): receive ✓ (selected), relieve, retrieve, recipe

1. Detect: Token not found in dictionary
2. Generate: Enumerate candidates within edit distance ≤ 3
3. Score: Combine edit distance + language model probability
4. Rank & Replace: Return top-ranked candidate
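The four steps above can be sketched as follows, using a precomputed delete-variant index for candidate generation in the spirit of SymSpell. Everything specific here is an assumption for illustration: the lexicon and its frequency counts are made up, DISTANCE_WEIGHT is an arbitrary penalty that would need tuning, and a unigram log-probability stands in for the language model.

```python
import math
from collections import defaultdict

# Hypothetical lexicon with made-up frequency counts (illustration only).
LEXICON = {"receive": 900, "relieve": 120, "retrieve": 200, "recipe": 300, "the": 5000}
TOTAL = sum(LEXICON.values())
MAX_DISTANCE = 3       # the "edit distance <= 3" bound from step 2
DISTANCE_WEIGHT = 4.0  # assumed penalty per edit; tune on real data


def delete_variants(word, max_deletes=MAX_DISTANCE):
    """The word plus every string formed by removing up to max_deletes characters."""
    variants, frontier = {word}, {word}
    for _ in range(max_deletes):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        variants |= frontier
    return variants


# Built once up front: delete variant -> lexicon words that produce it.
DELETE_INDEX = defaultdict(set)
for entry in LEXICON:
    for variant in delete_variants(entry):
        DELETE_INDEX[variant].add(entry)


def osa_distance(a, b):
    """Edit distance counting an adjacent transposition as one edit."""
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i
    for j in range(len(b) + 1):
        dist[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1, dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def rank(token):
    """Steps 2 and 3: generate candidates via the index, then score them."""
    pool = set()
    for variant in delete_variants(token):
        pool |= DELETE_INDEX.get(variant, set())
    scored = []
    for word in pool:
        distance = osa_distance(token, word)  # verify the true distance
        if distance <= MAX_DISTANCE:
            score = math.log(LEXICON[word] / TOTAL) - DISTANCE_WEIGHT * distance
            scored.append((word, distance, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)


def correct(token):
    if token in LEXICON:                       # 1. Detect
        return token
    ranked = rank(token)                       # 2. Generate and 3. Score
    return ranked[0][0] if ranked else token   # 4. Rank & Replace


print(correct("recieve"))  # receive
```

With these made-up counts, 'receive' and 'relieve' are both one edit away, and the frequency term decides between them. The delete index only proposes candidates; the explicit distance check filters out any that share a delete variant but fall outside the bound.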

Real-World Example

A chatbot for a banking app adds spell correction as the first preprocessing step in its NLP pipeline. When a user types 'tranfer moeny to savigns account,' the spell checker normalizes it to 'transfer money to savings account' before intent classification, achieving 96% intent accuracy on misspelled inputs vs. 71% without correction. The improvement is especially notable for domain-specific terms like 'beneficiry' → 'beneficiary' that general spell checkers don't cover without custom dictionary additions.
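To make the custom-dictionary point concrete, here is a small sketch that normalizes a message token by token. It assumes Python's standard difflib.get_close_matches as a stand-in for a real spell checker (it ranks by a similarity ratio, not by edit distance), and the vocabularies, the normalize helper, and the 0.75 cutoff are hypothetical values chosen for the example.

```python
import difflib

# Hypothetical vocabularies for illustration; a real system would load these.
GENERAL_VOCAB = {"transfer", "money", "to", "savings", "account", "my", "add", "a", "new"}
DOMAIN_TERMS = {"beneficiary", "overdraft"}


def normalize(text, vocabulary, cutoff=0.75):
    """Replace each unknown token with its closest vocabulary entry, if any."""
    options = sorted(vocabulary)
    fixed = []
    for token in text.lower().split():
        if token in vocabulary:
            fixed.append(token)
            continue
        # Similarity-ratio match from the standard library, not true edit distance.
        match = difflib.get_close_matches(token, options, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


print(normalize("tranfer moeny to savigns account", GENERAL_VOCAB))
# transfer money to savings account

print(normalize("add beneficiry", GENERAL_VOCAB))
# add beneficiry

print(normalize("add beneficiry", GENERAL_VOCAB | DOMAIN_TERMS))
# add beneficiary
```

The second call leaves 'beneficiry' untouched because nothing in the general vocabulary is close enough; only after the domain terms are added does the correction become possible.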

Common Mistakes

✕ Blindly correcting all non-dictionary words: product names, usernames, and technical terms are valid non-words that should not be corrected
✕ Ignoring context when correcting: edit-distance-based correction without language model scoring produces nonsensical fixes
✕ Over-correcting in multilingual environments: words from other languages are not misspellings and should trigger language detection instead
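One way to guard against the first mistake is to decide whether a token is eligible for correction before generating any candidates. The sketch below is a simplified assumption of how such a guard might look: the allowlist entries, the lexicon, and the should_correct helper are hypothetical, and the pattern only covers a few obvious cases (handles, hashtags, URLs, email addresses, tokens containing digits). It does not address context scoring or language detection.

```python
import re

# Hypothetical allowlist of product and brand names that must never be "fixed".
PROTECTED_TERMS = {"acmepay", "symspell"}

# Tokens that look like handles, hashtags, URLs, emails, or contain a digit.
PROTECTED_PATTERN = re.compile(r"@\w+|#\w+|https?://\S+|\S+@\S+|.*\d.*")

# Hypothetical lexicon for the example.
LEXICON = {"cancel", "my", "subscription", "please"}


def should_correct(token, lexicon=LEXICON):
    """Return True only for unknown tokens that are safe to send to the corrector."""
    lowered = token.lower()
    if lowered in lexicon:
        return False  # already a valid word
    if lowered in PROTECTED_TERMS:
        return False  # known product or brand name
    if PROTECTED_PATTERN.fullmatch(token):
        return False  # username, URL, email, or alphanumeric identifier
    return True


for token in ["subscrpition", "AcmePay", "@jane_doe", "RTX4090", "cancel"]:
    print(token, should_correct(token))
```

Only 'subscrpition' is passed on for correction here; the other tokens are either valid words or match one of the protection rules.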

Related Terms

Text Preprocessing

Text Normalization

Natural Language Processing (NLP)

Language Detection

Intent Detection
