Shadow Deployment
Definition
Shadow deployment (also called shadow mode or dark launch) routes every production request to both the current model and a new candidate model simultaneously. The current model's response is served to the user; the new model's response is logged but never returned. This allows the candidate model to be evaluated on the full distribution of real production inputs without any user impact—no A/B split, no canary percentage, no user-facing risk. Shadow mode is particularly valuable for validating that a new model handles production edge cases correctly before any live exposure, and for comparing output distributions between model versions on identical inputs.
Why It Matters
Shadow deployment is the safest way to evaluate a new model on production data when correctness is critical and risks are asymmetric. For models making consequential decisions—loan approvals, medical triage support, fraud detection, autonomous systems—exposing even 1% of users to an unvalidated model may be unacceptable. Shadow mode allows unlimited validation time on real inputs with zero user risk. It also enables qualitative comparison: team members can review shadow outputs alongside production outputs to assess quality before any live exposure.
How It Works
In a typical implementation, the API gateway or serving infrastructure mirrors each request to both models. The production model's response is returned to the client synchronously; the shadow model runs asynchronously, so it adds no latency to the user response. Shadow responses are logged to a separate store for analysis. Comparing production vs. shadow responses on identical inputs reveals the prediction agreement rate, distribution divergences, the specific cases where the models disagree (the most valuable inputs for evaluation), and relative latency. For LLMs, teams can review side-by-side response comparisons to assess quality differences qualitatively.
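The routing pattern described above can be sketched in a few lines. This is a minimal illustration using asyncio, with in-memory stand-ins for the two model services and the shadow log store; all names here are hypothetical, not part of any specific serving framework:

```python
import asyncio

# Hypothetical stand-ins for the two model services; in production these
# would be network calls to separately deployed model servers.
async def production_model(request):
    return {"model": "v1", "score": 0.42}

async def shadow_model(request):
    return {"model": "v2", "score": 0.57}

shadow_log = []  # stand-in for the separate shadow-response store

async def run_shadow(request):
    """Run the shadow model off the critical path and log its output."""
    try:
        response = await shadow_model(request)
        shadow_log.append({"request": request, "shadow_response": response})
    except Exception:
        # A shadow failure must never affect the user-facing request.
        pass

async def handle_request(request):
    # Fire-and-forget: the shadow call adds no latency to the user response.
    asyncio.create_task(run_shadow(request))
    # Only the production model's response is returned to the client.
    return await production_model(request)

async def main():
    response = await handle_request({"user_id": 123})
    await asyncio.sleep(0.01)  # give the shadow task time to finish (sketch only)
    return response

result = asyncio.run(main())
print(result["model"])   # the user always sees v1's answer
print(len(shadow_log))   # v2's answer was logged, never served
```

In a real gateway the fire-and-forget call would go to a message queue or a separate async worker, so shadow-model slowness or failures stay fully isolated from the serving path.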
Shadow Deployment — Safe Model Testing
[Diagram] An incoming request is routed to Production Model v1, whose response is served to the user (live traffic), and mirrored silently to Shadow Model v2, whose response is logged for comparison. Users are unaffected; v2's quality is evaluated offline before promotion.
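Once shadow responses are logged, the comparison metrics mentioned above (agreement rate, label distributions, disagreement cases) can be computed from paired records. A toy analysis with hypothetical log entries, where field names are illustrative:

```python
from collections import Counter

# Hypothetical paired log records: each entry holds the production (v1)
# and shadow (v2) prediction for the same request.
paired_logs = [
    {"request_id": 1, "v1": "approve", "v2": "approve"},
    {"request_id": 2, "v1": "deny",    "v2": "approve"},
    {"request_id": 3, "v1": "approve", "v2": "approve"},
    {"request_id": 4, "v1": "deny",    "v2": "deny"},
]

def compare(logs):
    """Return agreement rate, per-model label distributions, and disagreements."""
    agree = sum(1 for r in logs if r["v1"] == r["v2"])
    disagreements = [r for r in logs if r["v1"] != r["v2"]]
    return {
        "agreement_rate": agree / len(logs),
        "v1_distribution": Counter(r["v1"] for r in logs),
        "v2_distribution": Counter(r["v2"] for r in logs),
        "disagreements": disagreements,  # the most valuable cases to review
    }

report = compare(paired_logs)
print(report["agreement_rate"])  # 0.75 on this toy sample
```

The disagreement list is what a review team would triage by hand, as in the bank example below: each disagreement is a production input where promoting v2 would change an outcome.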
Real-World Example
A bank's credit underwriting team developed a new ML model to replace a legacy scorecard. Before any customer exposure, they ran the new model in shadow mode on all loan applications for 8 weeks. During this period, the team reviewed 500 cases where the new model's decision differed from the legacy model—identifying 23 systematic error patterns, including incorrect handling of joint applications and edge cases in income verification for self-employed applicants. All issues were fixed before any customer saw the new model. The subsequent canary deployment had zero safety incidents; the shadow period had surfaced every critical issue.
Common Mistakes
- ✕ Not monitoring shadow deployment resource costs—running two models on every request doubles inference compute; budget accordingly
- ✕ Treating shadow deployment as a replacement for staged rollout—shadow mode validates correctness but not user experience or A/B business impact
- ✕ Running shadow mode indefinitely without a promotion decision—shadow mode is a validation stage, not a permanent state
Related Terms
Canary Deployment
Canary deployment gradually routes a small percentage of production traffic to a new model version, monitoring its behavior before full rollout—allowing real-world validation with limited blast radius if something goes wrong.
Model Deployment
Model deployment is the process of moving a trained ML model from development into a production environment where it can serve real users—encompassing packaging, testing, infrastructure provisioning, and release management.
Model Monitoring
Model monitoring continuously tracks the health of deployed ML models—measuring prediction quality, input distributions, and system performance in production to detect degradation before it impacts users or business outcomes.
Blue-Green Deployment
Blue-green deployment maintains two identical production environments—one active (blue), one idle (green)—enabling instant, zero-downtime model upgrades and immediate rollback by switching traffic between environments.
MLOps
MLOps (Machine Learning Operations) applies DevOps principles to ML systems—combining engineering practices for model development, deployment, monitoring, and retraining into a disciplined operational lifecycle.