SHAP Values
Definition
SHAP values, developed by Lundberg and Lee (2017), apply the Shapley value concept from cooperative game theory to ML model explanations. For each prediction, SHAP computes the marginal contribution of each feature by averaging over all possible orderings in which features could be introduced. Formally, a feature's SHAP value is the expected change in the prediction when that feature is included versus excluded, averaged over all feature subsets. SHAP satisfies three key axioms: local accuracy (the base value plus all SHAP values sums exactly to the model's prediction), missingness (absent features receive zero contribution), and consistency (if a model changes so that a feature contributes more, its SHAP value never decreases). Lundberg and Lee showed that SHAP is the unique additive feature attribution method satisfying these properties, which makes it uniquely principled among attribution methods.
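The "average over all orderings" definition can be computed exactly for tiny models. The sketch below uses a hypothetical linear scorer (feature names and coefficients are illustrative, not from any real system) and brute-forces Shapley values over every permutation, then checks the local accuracy axiom.

```python
from itertools import permutations

def model(features):
    """Toy linear scorer; features absent from the dict fall back to a
    baseline value of 0 (a simplification used for illustration)."""
    baseline = {"credit_score": 0.0, "debt_to_income": 0.0, "employment_years": 0.0}
    x = {**baseline, **features}
    return 0.3 + 0.5 * x["credit_score"] - 0.4 * x["debt_to_income"] + 0.1 * x["employment_years"]

def shapley_values(model, instance):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features are introduced."""
    names = list(instance)
    contrib = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        present = {}
        prev = model(present)  # prediction with no features included
        for name in order:
            present[name] = instance[name]
            curr = model(present)
            contrib[name] += curr - prev  # marginal contribution of `name`
            prev = curr
    return {n: contrib[n] / len(orderings) for n in names}

instance = {"credit_score": 0.5, "debt_to_income": 0.45, "employment_years": 0.8}
phi = shapley_values(model, instance)
base = model({})
# Local accuracy: base value + all SHAP values == the model's prediction
assert abs(base + sum(phi.values()) - model(instance)) < 1e-9
```

This brute force is exponential in the number of features; it is only a demonstration of the definition, not how production libraries compute SHAP.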
Why It Matters
SHAP values have become the standard for feature-level model explanations in production AI systems, providing the most rigorous answer available for black-box models to the question "why did the model predict X?" For regulated industries, SHAP values supply the mathematically principled explanations required for adverse action notices (credit denial reasons), clinical AI justifications, and algorithm audit documentation. For model debugging, SHAP summary plots reveal which features drive predictions globally, while per-prediction SHAP values immediately identify when a model has relied on a spurious feature for a specific prediction. Libraries: shap (Python), DALEX (R), and InterpretML all implement SHAP.
How It Works
SHAP computation methods vary by model type: TreeSHAP is an exact polynomial-time algorithm for tree-based models (random forests, XGBoost, LightGBM); KernelSHAP works with any model, fitting a Shapley-weighted linear regression over sampled feature coalitions; DeepSHAP is a fast approximation for neural networks built on DeepLIFT backpropagation. For a credit decision, TreeSHAP might output: base_value=0.3 (average approval probability), credit_score=+0.25, debt_to_income=-0.18, employment_years=+0.08, final_prediction=0.45. Positive values push toward approval; negative values push toward denial. The customer receives: 'Your application was primarily affected by your credit score (positive) and debt-to-income ratio (negative).'
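The credit-decision numbers above can be verified directly, since SHAP decompositions are additive. A minimal sketch, using the values from the text:

```python
# Hypothetical TreeSHAP output for one credit decision (values from the text)
base_value = 0.30          # average approval probability over the background data
shap_values = {
    "credit_score": +0.25,
    "debt_to_income": -0.18,
    "employment_years": +0.08,
}

# Local accuracy: base value plus all contributions equals the prediction
prediction = base_value + sum(shap_values.values())
assert round(prediction, 2) == 0.45

# Rank features by absolute impact for the customer-facing explanation
ranked = sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)
```

Ranking by absolute SHAP value is what surfaces "credit score (positive) and debt-to-income ratio (negative)" as the primary factors in the customer-facing sentence.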
[Chart: SHAP Values — Churn Prediction Feature Impact, showing per-feature SHAP impact for hours_since_last_login, ticket_count_30d, plan_tier, nps_score, and support_resolved_rate]
Real-World Example
A P2P lending platform used SHAP values to comply with the Equal Credit Opportunity Act requirement to provide adverse action notices to denied applicants. For each denial, the system computed TreeSHAP values and extracted the top 3 negative contributors. Automated adverse action notices stated: 'Your application was declined. The primary reasons were: (1) Debt-to-income ratio above threshold, (2) Insufficient credit history, (3) Recent missed payments.' The SHAP-powered explanation system replaced a previous system that provided generic boilerplate reasons—customer complaint rates about adverse action notices dropped 71%, and regulatory examination scores for the lending platform improved significantly.
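The extraction step the platform automated ("top 3 negative contributors") can be sketched in a few lines. The feature names and reason-code mapping below are hypothetical, chosen to reproduce the reasons quoted in the notice:

```python
# Hypothetical mapping from model features to business-language reasons
REASON_CODES = {
    "debt_to_income": "Debt-to-income ratio above threshold",
    "credit_history_months": "Insufficient credit history",
    "missed_payments_12m": "Recent missed payments",
    "credit_score": "Credit score below threshold",
}

def adverse_action_reasons(shap_values, top_n=3):
    """Return the top_n features pushing toward denial (most negative
    SHAP values), translated to business language."""
    negatives = [(f, v) for f, v in shap_values.items() if v < 0]
    negatives.sort(key=lambda fv: fv[1])  # most negative first
    return [REASON_CODES.get(f, f) for f, _ in negatives[:top_n]]

# Hypothetical TreeSHAP output for one denied application
shap_values = {
    "debt_to_income": -0.22,
    "credit_history_months": -0.15,
    "missed_payments_12m": -0.09,
    "credit_score": +0.05,
}
reasons = adverse_action_reasons(shap_values)
```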
Common Mistakes
- ✕ Using SHAP explanations as ground truth about causality—SHAP values explain feature contributions to predictions, not causal relationships
- ✕ Computing SHAP values on test sets that differ from production input distributions—SHAP values are only meaningful relative to the baseline distribution used for computation
- ✕ Presenting raw SHAP values to end users without translation—numerical SHAP values require business-language translation to be meaningful to non-technical users
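The baseline-distribution pitfall is easy to demonstrate. For a linear model, each feature's exact SHAP value is coefficient × (value − baseline value), so changing the background data changes every attribution. A minimal sketch with illustrative feature names and coefficients:

```python
# Toy linear model coefficients (illustrative, not from any real system)
COEFS = {"credit_score": 0.5, "debt_to_income": -0.4}

def linear_shap(x, baseline):
    """Exact SHAP values for a linear model: coef * (value - baseline)."""
    return {f: COEFS[f] * (x[f] - baseline[f]) for f in x}

x = {"credit_score": 0.9, "debt_to_income": 0.2}

# Two different background (baseline) distributions for the same instance
phi_a = linear_shap(x, {"credit_score": 0.6, "debt_to_income": 0.3})
phi_b = linear_shap(x, {"credit_score": 0.2, "debt_to_income": 0.1})

# Same model, same instance, same prediction — different attributions
assert phi_a != phi_b
```

This is why the background dataset passed to a SHAP explainer should match the production input distribution the explanation is meant to describe.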
Related Terms
Explainability
Explainability provides human-understandable reasons for why an AI system produced a specific output—enabling users, operators, and regulators to understand, audit, and trust AI decisions rather than treating the model as an inscrutable black box.
Interpretability
Interpretability refers to the degree to which a model's internal mechanisms and decision logic can be understood by humans—distinguished from explainability by focusing on the model's structure rather than post-hoc rationalizations of its outputs.
Algorithmic Fairness
Algorithmic fairness defines formal mathematical criteria for measuring and achieving equitable treatment across demographic groups in AI decision systems—including demographic parity, equalized odds, and individual fairness.
AI Bias
AI bias is the systematic tendency of AI models to produce unfair outcomes for certain groups—arising from skewed training data, biased features, or flawed objective functions—leading to discriminatory predictions or decisions.
Responsible AI
Responsible AI is a framework of organizational practices and principles—encompassing fairness, transparency, privacy, safety, and accountability—that guide how teams build and deploy AI systems that are trustworthy and beneficial.