Model Versioning
Definition
Model versioning extends software version control concepts to ML artifacts. A model version captures the trained weights, the hyperparameter configuration, the dataset version used for training, the preprocessing pipeline, evaluation metrics, and metadata such as training date and compute environment. Registry tools such as MLflow Model Registry, DVC, and Weights & Biases Artifacts organize model versions through lifecycle stages: staging, production, and archived. Semantic versioning (major.minor.patch) communicates the scope of changes between versions.
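To make the definition concrete, here is a minimal sketch of what a single model version record might capture. The class and field names are illustrative, not any particular registry's schema:

```python
from dataclasses import dataclass, field

# Hypothetical record of everything a model version should capture:
# weights, data lineage, code commit, config, metrics, and lifecycle stage.
@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: str          # semantic version, e.g. "3.2.1"
    weights_uri: str      # pointer to the trained weights artifact
    dataset_version: str  # version of the training data
    code_commit: str      # git SHA that produced this model
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    stage: str = "candidate"  # candidate -> staging -> production -> archived

mv = ModelVersion(
    name="support-bot",
    version="2.3.8",
    weights_uri="s3://models/support-bot/2.3.8/weights.bin",
    dataset_version="ds-2024-01",
    code_commit="a1b2c3d",
    hyperparameters={"lr": 3e-4, "epochs": 5},
    metrics={"accuracy": 0.91},
)
print(mv.version, mv.stage)  # → 2.3.8 candidate
```

Because the record is frozen, a registered version is immutable; promoting or retiring a model changes only its lifecycle stage, never its artifacts.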
Why It Matters
Without model versioning, teams cannot reproduce past results, safely roll back failing deployments, or track which model version is serving production traffic. When a new model version degrades performance, versioning enables instant rollback to the last known good version. Regulatory compliance in finance and healthcare often requires proving which model version made a specific decision — impossible without versioning records. Model versioning also enables A/B testing by routing traffic across multiple simultaneous versions.
How It Works
A model version is created at the end of each training run and stored in a model registry. The registry records lineage: which dataset version, code commit, and hyperparameters produced each model. Lifecycle management transitions models through stages — from candidate to staging (for testing) to production (for serving) to archived (retired). Deployment pipelines reference specific model versions by ID rather than mutable 'latest' pointers, ensuring deterministic deployments.
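The lifecycle above can be sketched as a tiny in-memory registry. The class and method names are hypothetical (real systems like MLflow persist this state and expose their own APIs), but the stage transitions and pinned-ID lookups follow the flow described:

```python
# Minimal in-memory model registry sketch (illustrative names only).
STAGES = ("candidate", "staging", "production", "archived")

class ModelRegistry:
    def __init__(self):
        # (name, version_id) -> {"stage": ..., "lineage": ...}
        self._versions = {}

    def register(self, name, version_id, lineage):
        """Record a new version with its lineage (dataset, commit, params)."""
        self._versions[(name, version_id)] = {"stage": "candidate", "lineage": lineage}

    def transition(self, name, version_id, stage):
        """Move a version through the lifecycle; archive any current production."""
        assert stage in STAGES
        if stage == "production":
            for (n, _), rec in self._versions.items():
                if n == name and rec["stage"] == "production":
                    rec["stage"] = "archived"
        self._versions[(name, version_id)]["stage"] = stage

    def get(self, name, version_id):
        """Deployments resolve a pinned version ID, never a mutable 'latest'."""
        return self._versions[(name, version_id)]

reg = ModelRegistry()
reg.register("support-bot", "2.3.8", {"dataset": "ds-01", "commit": "a1b2c3d"})
reg.register("support-bot", "2.4.1", {"dataset": "ds-02", "commit": "e4f5a6b"})
reg.transition("support-bot", "2.3.8", "production")
reg.transition("support-bot", "2.4.1", "production")  # new release
reg.transition("support-bot", "2.3.8", "production")  # rollback
print(reg.get("support-bot", "2.4.1")["stage"])  # → archived
```

Note the last three lines: promoting a new version automatically archives the old one, and rolling back is just promoting the previous version again — the registry always knows exactly which version is serving.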
Model Versioning: Semantic Versioning

Example version v3.2.1:
- 3 (Major): breaking changes
- 2 (Minor): new capabilities
- 1 (Patch): bug fixes

Each version includes: weights, config, tokenizer, training metadata
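A small helper makes the bump rules explicit. The function name and change-type labels are illustrative assumptions, not part of any standard library:

```python
# Hypothetical helper applying semantic-versioning rules to model releases:
# breaking changes bump major, new capabilities bump minor, fixes bump patch.
def bump(version: str, change: str) -> str:
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "breaking":    # e.g. new output schema or tokenizer
        return f"{major + 1}.0.0"
    if change == "capability":  # e.g. model now supports a new language
        return f"{major}.{minor + 1}.0"
    if change == "fix":         # e.g. retrained to correct a data bug
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")

print(bump("3.2.1", "fix"))         # → 3.2.2
print(bump("3.2.1", "capability"))  # → 3.3.0
print(bump("3.2.1", "breaking"))    # → 4.0.0
```

As in software semver, a major bump resets minor and patch to zero, and a minor bump resets patch — the version string alone tells consumers how much re-validation a new model needs.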
Real-World Example
A startup's customer support chatbot starts degrading after a model update. Because they use MLflow Model Registry, they can identify that version 2.4.1 was deployed three days ago and version 2.3.8 had better accuracy metrics. They promote v2.3.8 back to production in under two minutes, restoring service quality while they investigate what went wrong with the v2.4.1 training run.
Common Mistakes
- ✕ Versioning only the model weights without capturing the associated preprocessing pipeline and data version
- ✕ Using mutable 'latest' pointers in production deployments instead of pinned version IDs
- ✕ Deleting archived model versions to save storage — removing the ability to reproduce past decisions
Related Terms
Model Registry
A model registry is a centralized repository that stores versioned model artifacts with their metadata—training parameters, evaluation metrics, data lineage, and deployment status—serving as the single source of truth for production models.
Experiment Tracking
Experiment tracking records the parameters, metrics, code versions, and artifacts of every ML training run, enabling reproducibility, systematic comparison of approaches, and traceability from production models back to their training conditions.
MLOps
MLOps (Machine Learning Operations) applies DevOps principles to ML systems—combining engineering practices for model development, deployment, monitoring, and retraining into a disciplined operational lifecycle.
Model Deployment
Model deployment is the process of moving a trained ML model from development into a production environment where it can serve real users—encompassing packaging, testing, infrastructure provisioning, and release management.
Continuous Training
Continuous training automatically retrains ML models on fresh data when triggered by drift detection, schedule, or performance degradation—keeping models current with evolving real-world patterns without manual intervention.