Explainability
Definition
AI explainability (also called interpretability, though the terms have subtle differences) encompasses techniques that make model outputs understandable to humans. Post-hoc explainability methods generate explanations for already-trained models: SHAP (SHapley Additive exPlanations) assigns each feature a contribution score for a specific prediction; LIME (Local Interpretable Model-agnostic Explanations) fits a simple interpretable model locally around a specific prediction; attention visualization shows which input tokens a transformer model focused on. Intrinsically interpretable models—decision trees, logistic regression, linear models—provide explanations by design. Explanations can target different audiences: end users ('you were denied because your income was too low'), practitioners ('feature importance rankings'), and regulators ('model documentation and audit trails').
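Intrinsically interpretable models expose their reasoning directly. A minimal Python sketch (the weights, bias, and feature values below are invented for illustration) shows how a logistic-regression-style scorer yields per-feature contributions for free:

```python
# Intrinsically interpretable model: a logistic-regression-style scorer whose
# per-feature contributions can be read straight off its weights.
# All weights, bias, and feature values are made-up illustration numbers.
import math

WEIGHTS = {"credit_score": 0.8, "debt_to_income_ratio": -1.2, "employment_years": 0.3}
BIAS = -0.1

def predict_with_explanation(features):
    """Return approval probability plus each feature's additive contribution."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    logit = BIAS + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))  # sigmoid of the linear score
    return probability, contributions

applicant = {"credit_score": 0.9, "debt_to_income_ratio": 0.6, "employment_years": 0.4}
prob, why = predict_with_explanation(applicant)
```

Each contribution is just weight times value, so the same numbers that produce the prediction also serve as its explanation; no post-hoc method is needed.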
Why It Matters
Explainability is both a regulatory requirement and a trust enabler. GDPR establishes what is often called a 'right to explanation': individuals subject to automated decisions are entitled to meaningful information about the logic involved, in understandable terms. The EU AI Act mandates transparency and interpretability measures for high-risk AI systems. In healthcare and legal contexts, practitioners cannot responsibly act on AI recommendations they cannot understand or verify. For business users, unexplainable AI creates dependency risk: if the model breaks or must be replaced, no one understands what it was doing. For AI product teams, explainability enables faster debugging: when a model produces wrong predictions, feature importance explanations often immediately reveal which input drove the error.
How It Works
SHAP explanations work by computing the marginal contribution of each feature to a prediction relative to a baseline, averaged over all possible orderings in which features could be added. For a specific loan applicant, SHAP might output: credit_score (+0.12), debt_to_income_ratio (-0.18), employment_years (+0.07), meaning credit score pushed the prediction positive, debt-to-income ratio pushed it negative more strongly, and employment years had a small positive effect. Aggregating SHAP values across the full dataset yields global feature importance. LIME works differently: it generates perturbed copies of a specific input, queries the model on each, weights the samples by their proximity to the original, and fits a sparse linear model to the weighted samples; the surrogate's coefficients serve as local 'reasons' for the decision.
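The averaging over feature orderings can be made concrete with a brute-force sketch (the scoring model, baseline, and applicant values below are invented for illustration; real SHAP implementations approximate this, since exact enumeration grows factorially with the number of features):

```python
# Exact Shapley values for a tiny black-box model, computed by averaging each
# feature's marginal contribution over every ordering of the features.
# The model, baseline, and applicant values are made-up illustration numbers;
# the interaction term ensures contributions are not trivially additive.
from itertools import permutations
from math import factorial

FEATURES = {"credit_score": 0.9, "debt_to_income_ratio": 0.6, "employment_years": 0.4}
BASELINE = {"credit_score": 0.5, "debt_to_income_ratio": 0.5, "employment_years": 0.5}

def model(x):
    return (0.8 * x["credit_score"] - 1.2 * x["debt_to_income_ratio"]
            + 0.3 * x["employment_years"]
            - 0.5 * x["credit_score"] * x["debt_to_income_ratio"])

def shapley_values(features, baseline):
    names = list(features)
    totals = {n: 0.0 for n in names}
    for order in permutations(names):
        x = dict(baseline)          # start every ordering from the baseline input
        prev = model(x)
        for name in order:          # swap features in one at a time
            x[name] = features[name]
            current = model(x)
            totals[name] += current - prev  # marginal contribution in this order
            prev = current
    return {n: totals[n] / factorial(len(names)) for n in names}

phi = shapley_values(FEATURES, BASELINE)
# Efficiency property: the contributions sum to model(FEATURES) - model(BASELINE).
```

The efficiency property is what makes SHAP outputs additive: the per-feature scores always account exactly for the gap between the prediction and the baseline.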
Feature Attribution — Why This Prediction?
[Illustration: a single prediction explained by its input features, such as keyword 'refund', negative sentiment, free user tier, long session length, and evening time of day.]
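The local-surrogate idea behind LIME can be sketched by hand for a one-feature black box (the tanh model, proximity kernel, and parameters below are illustrative assumptions, not the LIME library's API):

```python
# LIME-style local surrogate, sketched by hand for a one-feature black box:
# perturb the input around a point, weight samples by proximity, and fit a
# weighted linear model whose slope acts as the local 'reason'.
import math
import random

def black_box(x):
    # Stand-in for an opaque model; nonlinear, so no single global slope exists.
    return math.tanh(3 * x)

def lime_slope(x0, n_samples=500, width=0.3, seed=0):
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, width) for _ in range(n_samples)]   # perturbations
    ys = [black_box(x) for x in xs]                             # model queries
    ws = [math.exp(-((x - x0) ** 2) / (2 * width ** 2)) for x in xs]  # proximity
    # Weighted least squares for y = a + b*x (closed form for one feature).
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = (sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)))
    return b  # local slope: how strongly the feature drives predictions near x0

slope_near_zero = lime_slope(0.0)  # steep region of tanh: large slope
slope_far_out = lime_slope(2.0)    # saturated region: slope near zero
```

The same input feature gets a large 'reason' near zero and almost none in the saturated region, which is the point of local explanations: they describe the model's behavior around one prediction, not everywhere.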
Real-World Example
A healthcare insurer implemented an AI model to predict hospital readmission risk. Clinicians refused to use the model because they couldn't understand why it flagged specific patients as high-risk—they couldn't verify the model's reasoning against their clinical judgment. After adding SHAP explanations to each prediction, clinicians saw that the model's top factors matched their clinical intuition (recent emergency visits, medication adherence signals) while also revealing non-obvious risk signals (specific lab value trends they had not considered as predictors). Adoption increased from 12% to 78% of eligible clinicians after explainability was added, directly improving care coordination for high-risk patients.
Common Mistakes
- ✕ Confusing explanation with justification—SHAP and LIME explain what features drove a prediction, not whether the model's reasoning is correct
- ✕ Providing explanations too technical for their audience—an explanation for an end user ('your credit score was 620') differs from one for an auditor (SHAP feature contributions)
- ✕ Treating post-hoc explanations as ground truth about model reasoning—SHAP and LIME approximate model behavior; they are not a window into the model's actual computation
Related Terms
Interpretability
Interpretability refers to the degree to which a model's internal mechanisms and decision logic can be understood by humans—distinguished from explainability by focusing on the model's structure rather than post-hoc rationalizations of its outputs.
SHAP Values
SHAP (SHapley Additive exPlanations) values assign each feature a precise contribution score for a specific model prediction—using game theory to fairly distribute the prediction value among all input features for interpretable AI explanations.
AI Bias
AI bias is the systematic tendency of AI models to produce unfair outcomes for certain groups—arising from skewed training data, biased features, or flawed objective functions—leading to discriminatory predictions or decisions.
Algorithmic Fairness
Algorithmic fairness defines formal mathematical criteria for measuring and achieving equitable treatment across demographic groups in AI decision systems—including demographic parity, equalized odds, and individual fairness.
Responsible AI
Responsible AI is a framework of organizational practices and principles—encompassing fairness, transparency, privacy, safety, and accountability—that guide how teams build and deploy AI systems that are trustworthy and beneficial.