AI Infrastructure, Safety & Ethics

Explainability

Definition

AI explainability (also called interpretability, though the terms have subtle differences) encompasses techniques that make model outputs understandable to humans. Post-hoc explainability methods generate explanations for already-trained models: SHAP (SHapley Additive exPlanations) assigns each feature a contribution score for a specific prediction; LIME (Local Interpretable Model-agnostic Explanations) fits a simple interpretable model locally around a specific prediction; attention visualization shows which input tokens a transformer model focused on. Intrinsically interpretable models—decision trees, logistic regression, linear models—provide explanations by design. Explanations can target different audiences: end users ('you were denied because your income was too low'), practitioners ('feature importance rankings'), and regulators ('model documentation and audit trails').
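The "interpretable by design" idea can be made concrete with a minimal sketch: a hand-set linear scorer whose weights are themselves the explanation. All feature names, weights, and the bias below are illustrative assumptions, not taken from any real system.

```python
# A minimal sketch of an intrinsically interpretable model: a linear scorer
# whose weights ARE the explanation. Weights/features are illustrative only.
WEIGHTS = {"income_above_median": 0.30, "late_payments": -0.40, "employment_years": 0.02}
BIAS = 0.10

def score(features):
    """Linear score: bias plus weight * value for each feature."""
    return BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)

def explain(features):
    """Per-feature contribution (weight * value), readable by design."""
    return {name: WEIGHTS[name] * features[name] for name in WEIGHTS}

applicant = {"income_above_median": 1, "late_payments": 2, "employment_years": 5}
contributions = explain(applicant)
# By construction, score(applicant) == BIAS + sum of the contributions,
# so the explanation is exact, not an approximation of the model.
```

Because the contributions sum exactly to the score, no post-hoc method is needed; this exactness is what complex models give up and what SHAP/LIME try to approximate after the fact.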

Why It Matters

Explainability is both a regulatory requirement and a trust enabler. The GDPR's so-called 'right to explanation' (grounded in Article 22 and Recital 71, and still debated among legal scholars) is widely read as entitling individuals to meaningful information about the logic behind automated decisions that affect them. The EU AI Act imposes transparency and documentation obligations on high-risk AI systems. In healthcare and legal contexts, practitioners cannot responsibly act on AI recommendations they cannot understand or verify. For business users, unexplainable AI creates dependency risk: if the model breaks or must be replaced, no one understands what it was doing. For AI product teams, explainability enables faster debugging: when a model produces wrong predictions, feature importance explanations often reveal immediately which input drove the error.

How It Works

SHAP explanations work by computing the marginal contribution of each feature to a prediction relative to a baseline, averaged over all possible subsets of the other features. For a specific loan applicant, SHAP might output: credit_score (+0.12), debt_to_income_ratio (-0.18), employment_years (+0.07), meaning the credit score pushed the prediction positive, the debt-to-income ratio pushed it negative more strongly, and employment years had a small positive effect. Aggregating SHAP values across the full dataset yields global feature importance. LIME takes a different approach: it samples perturbed versions of the input, queries the model on each sample, and fits a simple weighted linear model to those samples; the surrogate's coefficients then serve as local 'reasons' for the decision.
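The subset-averaging idea behind SHAP can be computed exactly for a tiny model. The scoring rule, feature values, and baseline below are illustrative assumptions; real SHAP implementations use efficient approximations rather than this brute-force enumeration.

```python
# Exact Shapley values for a toy model -- a hand-rolled sketch, not the shap
# library. Model, features, and baseline are illustrative assumptions.
from itertools import combinations
from math import factorial

def predict(features):
    """Toy loan-scoring rule, purely illustrative."""
    score = 0.0
    if features["credit_score"] > 700:
        score += 0.20
    if features["debt_to_income"] > 0.4:
        score -= 0.25
    score += 0.01 * features["employment_years"]
    return score

def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all subsets of the
    other features, weighted by |S|! * (n - |S| - 1)! / n!."""
    names = list(instance)
    n = len(names)
    values = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Subset features take the instance's values; the rest stay
                # at the baseline. Then measure the effect of adding f.
                point = dict(baseline)
                point.update({g: instance[g] for g in subset})
                without_f = model(point)
                point[f] = instance[f]
                with_f = model(point)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (with_f - without_f)
        values[f] = total
    return values

instance = {"credit_score": 750, "debt_to_income": 0.5, "employment_years": 7}
baseline = {"credit_score": 650, "debt_to_income": 0.3, "employment_years": 0}
phi = shapley_values(predict, instance, baseline)
# Efficiency property: the contributions sum to f(x) - f(baseline).
assert abs(sum(phi.values()) - (predict(instance) - predict(baseline))) < 1e-9
```

The efficiency check at the end is the key Shapley guarantee: per-feature contributions always add up to the gap between the prediction and the baseline prediction.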

Feature Attribution — Why This Prediction?

  Keyword: 'refund'       +0.32
  Sentiment: negative     +0.24
  User tier: free         +0.18
  Session length: long    -0.09
  Time of day: evening    -0.05
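LIME's local-surrogate idea, described above, can be sketched in a few lines: perturb the input, query the black box, and fit a linear model to the samples. The black-box model, the point being explained, and the sampling width below are illustrative assumptions; real LIME also weights samples by proximity to the explained point, which this sketch omits.

```python
# LIME-style local surrogate sketch (not the lime library).
import random

def black_box(x):
    """Stand-in for an opaque model (illustrative): quadratic in x."""
    return x * x

def local_surrogate(model, x0, width=0.5, n_samples=200, seed=0):
    """Sample perturbations near x0, query the model, and fit a linear
    surrogate y ~ a*x + b by ordinary least squares."""
    rng = random.Random(seed)
    xs = [x0 + rng.uniform(-width, width) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept
    return a, b

slope, intercept = local_surrogate(black_box, x0=3.0)
# Near x0 = 3 the quadratic behaves like a line with slope close to 2*x0 = 6,
# so the surrogate's coefficient is a faithful *local* explanation even
# though a single line cannot describe the model globally.
```

The takeaway mirrors the caveat in Common Mistakes below: the surrogate is faithful only near the explained point, not a global description of the model.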

Real-World Example

A healthcare insurer implemented an AI model to predict hospital readmission risk. Clinicians refused to use the model because they couldn't understand why it flagged specific patients as high-risk—they couldn't verify the model's reasoning against their clinical judgment. After adding SHAP explanations to each prediction, clinicians saw that the model's top factors matched their clinical intuition (recent emergency visits, medication adherence signals) while also revealing non-obvious risk signals (specific lab value trends they had not considered as predictors). Adoption increased from 12% to 78% of eligible clinicians after explainability was added, directly improving care coordination for high-risk patients.

Common Mistakes

  • Confusing explanation with justification—SHAP and LIME explain what features drove a prediction, not whether the model's reasoning is correct
  • Providing explanations too technical for their audience—an explanation for an end user ('your credit score was 620') differs from one for an auditor (SHAP feature contributions)
  • Treating post-hoc explanations as ground truth about model reasoning—SHAP and LIME approximate model behavior; they are not a window into the model's actual computation
