Spell Checking
Definition
Spell checking combines a dictionary lookup (is this a valid word?) with a correction mechanism (what word was probably intended?). Non-word error detection identifies tokens not found in a lexicon. Real-word error detection identifies valid words used in the wrong context ('their' vs. 'there'). Correction uses edit distance to find the closest valid words: Levenshtein distance counts character insertions, deletions, and substitutions, while Damerau-Levenshtein distance also counts a transposition of two adjacent characters as a single edit. Neural spell checkers use sequence-to-sequence models to correct entire sentences, handling context-sensitive errors that character-level distance metrics miss.
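To make the distance concrete, here is a minimal Python sketch (the page has no code of its own, so the language is an editorial choice). It computes the optimal string alignment variant of Damerau-Levenshtein distance, a common restricted form that counts an adjacent swap as one edit. The `osa_distance` name and its `transpositions` flag are illustrative, not taken from any particular library; with `transpositions=False` it reduces to plain Levenshtein distance.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Optimal string alignment distance (Levenshtein plus adjacent swaps).

    With transpositions=False this is plain Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "teh" -> "the"
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For example, 'teh' is one edit from 'the' when transpositions are allowed but two under plain Levenshtein; the same applies to common keyboard typos such as 'cancle' → 'cancel'.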
Why It Matters
User-generated text in chat interfaces contains frequent typos, phonetic misspellings, and autocorrect errors that degrade NLP model performance. Intent classifiers and entity extractors trained on clean text often fail on misspelled input: 'cancle my subscrpition' may not match any intent if the model has never seen these misspellings. Running spell correction as a preprocessing step normalizes the input toward the clean text the models were trained on, which can substantially improve downstream accuracy. For voice-to-text input, spell checking can also catch some transcription errors, although speech recognizers output real words, so most of their mistakes call for context-aware (real-word) correction.
How It Works
Peter Norvig's spell checker keeps the input if it is already a known word; otherwise it generates every string within edit distance 1 (falling back to edit distance 2 if none of those is known), keeps only candidates found in a known word list, and selects the one with the highest unigram probability in a training corpus. This simple approach handles the majority of single-character typos. Neural spell checkers use encoder-decoder architectures trained on (misspelled, correct) text pairs, learning to fix the context-sensitive errors that pure edit-distance methods miss. SymSpell precomputes delete-only variants of every dictionary word, so a lookup only has to generate deletions of the input and match them against that index; this makes it orders of magnitude faster than generating every insertion, substitution, and transposition at query time.
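The Norvig approach fits in a few dozen lines of Python. The sketch below follows the candidate order described above (the word itself if known, then known words at distance 1, then at distance 2). It assumes lowercase ASCII input and uses a tiny made-up corpus in place of the large text collection a real deployment would count; the `NorvigCorrector` class wrapper is an illustrative choice.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"  # assumes lowercase ASCII input


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> set[str]:
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


class NorvigCorrector:
    def __init__(self, corpus_text: str):
        # Unigram counts stand in for word probabilities (same argmax).
        self.counts = Counter(re.findall(r"[a-z]+", corpus_text.lower()))

    def known(self, words) -> set[str]:
        return {w for w in words if w in self.counts}

    def candidates(self, word: str) -> set[str]:
        # Prefer the word itself, then distance-1 words, then distance-2 words.
        return (self.known([word]) or self.known(edits1(word))
                or self.known(edits2(word)) or {word})

    def correct(self, word: str) -> str:
        # Most frequent candidate; sorting first makes ties deterministic.
        return max(sorted(self.candidates(word)), key=lambda w: self.counts[w])


corrector = NorvigCorrector("transfer money to savings account transfer money")
print(corrector.correct("tranfer"), corrector.correct("moeny"))
```

Because any known distance-1 candidate beats every distance-2 candidate, frequency only decides among candidates at the same distance.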
[Figure: Spell checking with edit distance correction, showing ranked candidates]
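SymSpell's delete-only idea, described under How It Works, can be illustrated with a small Python sketch (this is not the SymSpell library or its API). Every dictionary word is indexed under all strings obtainable by deleting up to `max_distance` characters; at query time only the deletes of the input are generated and looked up, and the few candidates found are verified with an optimal string alignment distance and ranked by distance, then frequency. The `DeleteIndex` class, the `deletes` and `osa` helpers, and the toy word counts are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set[str]:
    """The word plus every string made by deleting up to max_distance characters."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for removed in combinations(range(len(word)), k):
            variants.add("".join(ch for i, ch in enumerate(word) if i not in removed))
    return variants


def osa(a: str, b: str) -> int:
    """Optimal string alignment distance, keeping only the last two DP rows."""
    prev2, prev = None, list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = int(a[i - 1] != b[j - 1])
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        prev2, prev = prev, cur
    return prev[-1]


class DeleteIndex:
    def __init__(self, word_counts: dict[str, int], max_distance: int = 2):
        self.max_distance = max_distance
        self.counts = dict(word_counts)
        # Precompute once: delete variant -> dictionary words that produce it.
        self.index = defaultdict(set)
        for word in self.counts:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def lookup(self, term: str) -> list[tuple[str, int]]:
        """Return (word, distance) pairs, closest first, then most frequent."""
        candidates = set()
        for variant in deletes(term, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = []
        for word in candidates:
            # Sharing a delete variant does not guarantee distance <= max_distance.
            distance = osa(term, word)
            if distance <= self.max_distance:
                scored.append((distance, -self.counts[word], word))
        return [(word, distance) for distance, _, word in sorted(scored)]
```

Lookup cost now depends on the number of deletes of the input rather than on scanning the dictionary or generating every insertion and substitution; the price is a larger index built ahead of time.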
Real-World Example
A chatbot for a banking app adds spell correction as the first preprocessing step in its NLP pipeline. When a user types 'tranfer moeny to savigns account', the spell checker normalizes it to 'transfer money to savings account' before intent classification; with correction in place, intent accuracy on misspelled inputs reaches 96%, compared with 71% without it. The gain is especially notable for domain-specific terms such as 'beneficiry' → 'beneficiary', which general-purpose spell checkers handle poorly unless their dictionaries are extended with custom domain vocabulary.
Common Mistakes
- Blindly correcting every word missing from the dictionary: product names, usernames, and technical terms are legitimate tokens that should be left unchanged
- Ignoring context when correcting: edit-distance correction without language-model scoring simply picks a nearby frequent word, which can produce nonsensical fixes and cannot catch real-word errors such as 'their' vs. 'there'
- Over-correcting in multilingual environments: words from other languages are not misspellings and should trigger language detection instead
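The first two mistakes can be guarded against together. The Python sketch below skips protected tokens (an allowlist plus a pattern for @usernames and tokens containing digits) and resolves real-word confusions such as 'their'/'there' by scoring each alternative on how well it fits the previous and next words under an add-one-smoothed bigram model. The confusion sets, allowlist entries, and toy corpus are hypothetical; a production system would use a much larger language model and would also score edit-distance candidates, not just fixed confusion sets.

```python
import re
from collections import Counter

# Hypothetical confusion sets: real words that are often swapped for one another.
CONFUSION_SETS = [{"their", "there"}, {"to", "too"}]
# Hypothetical allowlist of product names and domain terms never to "correct".
PROTECTED = {"acmepay", "kubectl"}
# Tokens that look like @usernames, or contain digits, are skipped too.
SKIP = re.compile(r"^@\w+$|\d")


class BigramContext:
    def __init__(self, corpus: str):
        tokens = corpus.lower().split()
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.vocab = len(self.unigrams)

    def prob(self, prev: str, word: str) -> float:
        # Add-one (Laplace) smoothed estimate of P(word | prev).
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab)

    def score(self, prev: str, word: str, nxt: str) -> float:
        # Fit with the left neighbor times fit with the right neighbor.
        return self.prob(prev, word) * self.prob(word, nxt)


def correct_sentence(sentence: str, lm: BigramContext) -> str:
    tokens = sentence.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in PROTECTED or SKIP.search(tok):
            out.append(tok)  # leave protected tokens untouched
            continue
        options = next((s for s in CONFUSION_SETS if tok in s), {tok})
        prev = out[-1] if out else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        out.append(max(sorted(options), key=lambda w: lm.score(prev, w, nxt)))
    return " ".join(out)
```

With a toy corpus containing phrases like "i like their app" and "over there", the model rewrites "i like there app" to "i like their app" while leaving "@there" and "acmepay" alone.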
Related Terms
Text Preprocessing
Text preprocessing is the collection of transformations applied to raw text before NLP model training or inference—including tokenization, normalization, and filtering—determining the quality and consistency of model inputs.
Text Normalization
Text normalization standardizes raw text into a consistent format—lowercasing, expanding contractions, removing special characters, and resolving abbreviations—ensuring NLP pipelines receive clean, uniform input.
Natural Language Processing (NLP)
Natural Language Processing (NLP) is the field of AI focused on enabling computers to understand, interpret, and generate human language—powering applications from chatbots and search engines to translation and sentiment analysis.
Language Detection
Language detection automatically identifies which human language a text is written in—enabling multilingual systems to route inputs to the correct processing pipeline, translation service, or localized response.
Intent Detection
Intent detection classifies user messages into predefined categories representing the user's goal—such as 'check order status' or 'report a bug'—enabling chatbots to route queries to the appropriate responses or workflows.