Natural Language Processing (NLP)

Spell Checking

Definition

Spell checking combines a dictionary lookup (is this a valid word?) with a correction mechanism (what word was probably intended?). Non-word error detection identifies tokens not found in a lexicon. Real-word error detection identifies valid words used in the wrong context ('their' vs. 'there'). Correction uses an edit distance metric to find the closest valid word: Levenshtein distance counts character substitutions, insertions, and deletions, and the Damerau-Levenshtein variant also counts transpositions of adjacent characters. Neural spell checkers use sequence-to-sequence models to correct entire sentences, handling context-sensitive errors that character-level distance metrics miss.
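The edit operations named above can be made concrete with a small distance function. What follows is a minimal sketch, not a reference implementation: it computes the optimal string alignment distance (a common restricted form of Damerau-Levenshtein), and the function name osa_distance is chosen here for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting substitutions, insertions, deletions,
    and transpositions of two adjacent characters (one edit each)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ie" <-> "ei"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("recieve", "receive"))
```

With transpositions counted as a single edit, 'recieve' is one edit away from 'receive'. Under plain Levenshtein distance the same pair costs two edits, because the swapped letters must be fixed with two substitutions.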

Why It Matters

User-generated text in chat interfaces contains frequent typos, phonetic misspellings, and autocorrect errors that degrade NLP model performance. Intent classifiers and entity extractors trained on clean text often fail on misspelled inputs: 'cancle my subscrpition' may not match any intent if the model has never seen these misspellings. Running spell correction as a preprocessing step normalizes the input toward the clean text the models were trained on, which improves downstream accuracy. For voice-to-text inputs, spell checking can also correct transcription errors, although these are often real-word errors that need context-aware correction.

How It Works

The Norvig spell checker generates every string within edit distance 1 or 2 of the input, keeps those that appear in a known word list (preferring distance-1 candidates over distance-2), and selects the candidate with the highest unigram probability in a training corpus. This simple approach handles the majority of single-character typos. Neural spell checkers use encoder-decoder architectures trained on (misspelled, correct) text pairs, learning to correct context-sensitive errors that pure edit-distance methods miss. SymSpell pre-computes delete-only variants of each dictionary word, so a lookup only has to generate deletes of the input and match them against that index; this is orders of magnitude faster than enumerating every insertion, substitution, and transposition at query time.
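A Norvig-style corrector fits in a few lines. The sketch below follows the general shape of that approach under stated assumptions: WORD_COUNTS is a toy stand-in for unigram counts from a real corpus, the alphabet is lowercase ASCII only, and all names are illustrative.

```python
from collections import Counter

# Toy unigram counts (assumption); a real system would count a large corpus.
WORD_COUNTS = Counter({"receive": 98, "relieve": 72, "recipe": 44, "the": 500})
TOTAL = sum(WORD_COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def probability(word: str) -> float:
    """Unigram probability; unseen words get 0."""
    return WORD_COUNTS[word] / TOTAL


def edits1(word: str) -> set:
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> set:
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words) -> set:
    """Keep only candidates that appear in the word list."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word: str) -> str:
    # Prefer the word itself, then distance-1, then distance-2 candidates.
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=probability)


print(correct("recieve"))
```

Here both 'receive' and 'relieve' are one edit from 'recieve', and the higher unigram count decides between them. No context is used, which is why this style of corrector cannot fix real-word errors.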

Spell Checking — Edit Distance Correction

Misspelled input: recieve
Corrected output: receive

Ranked Candidates

#   Candidate   Edit Dist.   Freq. score   Selected
1   receive     1            98            Yes
2   relieve     1            72
3   retrieve    2            61
4   recipe      2            44

Edit distances count a swap of two adjacent characters as a single edit.

1. Detect: Token not found in dictionary
2. Generate: Enumerate candidates within edit distance ≤ 3
3. Score: Combine edit distance + language model probability
4. Rank & Replace: Return top-ranked candidate
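
The four steps can be sketched end to end as follows. This is an illustration under assumptions, not a prescribed scoring method: the lexicon reuses the frequency scores from the example, the combined score (log frequency minus a per-edit penalty) and the DISTANCE_PENALTY weight are hypothetical choices, and candidates are generated by brute-force comparison against the lexicon.

```python
import math

# Illustrative lexicon: word -> frequency score (values from the example).
LEXICON = {"receive": 98, "relieve": 72, "retrieve": 61, "recipe": 44}
MAX_DISTANCE = 3        # candidate cutoff from step 2
DISTANCE_PENALTY = 1.0  # hypothetical weight: cost of each edit in the score


def edit_distance(a: str, b: str) -> int:
    """Edit distance with adjacent transpositions counted as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def rank_candidates(token: str) -> list:
    # 1. Detect: a token found in the lexicon needs no correction.
    if token in LEXICON:
        return []
    scored = []
    # 2. Generate: keep lexicon words within the distance cutoff.
    for word, freq in LEXICON.items():
        dist = edit_distance(token, word)
        if dist <= MAX_DISTANCE:
            # 3. Score: frequent words score higher, each edit costs a penalty.
            score = math.log(freq) - DISTANCE_PENALTY * dist
            scored.append((word, dist, score))
    # 4. Rank: best score first.
    return sorted(scored, key=lambda c: c[2], reverse=True)


def correct(token: str) -> str:
    ranked = rank_candidates(token)
    return ranked[0][0] if ranked else token


print(correct("recieve"))
```

A production system would replace the frequency score with a language model probability conditioned on the surrounding words, and would generate candidates from an index rather than scanning the whole lexicon.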

Real-World Example

A chatbot for a banking app adds spell correction as the first preprocessing step in its NLP pipeline. When a user types 'tranfer moeny to savigns account', the spell checker normalizes it to 'transfer money to savings account' before intent classification. With correction in place, the pipeline reaches 96% intent accuracy on misspelled inputs, compared with 71% without it. The improvement is especially notable for domain-specific terms such as 'beneficiry' → 'beneficiary', which general spell checkers do not cover unless the term is added to a custom dictionary.
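
A preprocessing step of this kind might look like the sketch below. It is not the chatbot's actual implementation. For brevity it swaps in Python's standard difflib.get_close_matches, which ranks by a similarity ratio rather than a true edit distance. The word sets, the cutoff value, and the normalize function are all assumptions made for illustration.

```python
import difflib

# Tiny illustrative vocabularies (assumptions, not a real dictionary).
GENERAL_WORDS = {"transfer", "money", "to", "savings", "account"}
DOMAIN_WORDS = {"beneficiary"}  # hypothetical custom dictionary additions
LEXICON = sorted(GENERAL_WORDS | DOMAIN_WORDS)


def normalize(text: str, cutoff: float = 0.75) -> str:
    """Replace each unknown token with its closest lexicon word, if any.

    cutoff is the minimum difflib similarity ratio (0 to 1) a lexicon
    word must reach; tokens with no match above it are left unchanged.
    """
    corrected = []
    for token in text.lower().split():
        if token in LEXICON:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, LEXICON, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


print(normalize("tranfer moeny to savigns account"))
```

The normalized string is what would be handed to the intent classifier. Without 'beneficiary' in DOMAIN_WORDS, the token 'beneficiry' would pass through uncorrected, which is the gap a custom dictionary closes.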

Common Mistakes

  • Blindly correcting all non-dictionary words: product names, usernames, and technical terms are legitimate out-of-dictionary tokens that should not be corrected
  • Ignoring context when correcting: edit-distance-based correction without language model scoring can pick a close but wrong word (for example 'recieve' → 'relieve')
  • Over-correcting in multilingual environments: words from other languages are not misspellings and should trigger language detection instead
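
One way to guard against the first mistake is to decide whether a token is eligible for correction before any candidates are generated. The sketch below is a rough heuristic under assumptions: the PROTECTED_TERMS entries, the regular expression, and the rule that tokens containing digits are skipped are illustrative choices, and language detection for the multilingual case is not covered.

```python
import re

# Hypothetical allow-list of product names and technical terms.
PROTECTED_TERMS = {"acmepay", "oauth", "kubectl"}

# Rough pattern for URLs and email addresses (illustrative, not exhaustive).
URL_OR_EMAIL = re.compile(r"https?://|www\.|\S+@\S+\.\S+")


def should_correct(token: str, lexicon: set) -> bool:
    """Return True only if the token looks like a genuine misspelling."""
    lowered = token.lower()
    if lowered in lexicon or lowered in PROTECTED_TERMS:
        return False  # valid word or protected term
    if token.startswith("@") or URL_OR_EMAIL.search(token):
        return False  # username handle, URL, or email address
    if any(ch.isdigit() for ch in token):
        return False  # IDs, versions, and amounts are not spelling errors
    return True


lexicon = {"cancel", "my", "subscription"}
for tok in ["subscrpition", "kubectl", "@jdoe", "v2"]:
    print(tok, should_correct(tok, lexicon))
```

Only tokens that pass this check would be sent on to candidate generation; everything else is left exactly as the user typed it.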
