Feature Store
Definition
A feature store is a specialized data platform that sits between raw data sources and ML models, providing a single source of truth for feature computation and storage. It serves two access modes: an offline store for training data retrieval (typically columnar storage such as Parquet on S3) and an online store for low-latency feature lookup at inference time (typically a key-value store such as Redis or DynamoDB). The central promise is consistent feature computation: the same feature definition produces identical values in training and serving, eliminating training-serving skew. Feature stores also enable feature discovery and reuse—features computed for one model are accessible to all models. Platforms include Feast, Tecton, Hopsworks, and Vertex AI Feature Store.
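The dual-store idea can be sketched in a few lines of Python. This is an illustrative toy, not any platform's real API: `FeatureDefinition`, `materialize`, and the in-memory dicts standing in for Parquet and Redis are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class FeatureDefinition:
    """One named transformation, shared by training and serving."""
    name: str
    entity_key: str
    transform: Callable[[dict], float]

# A single canonical definition of the feature (hypothetical example).
purchase_count_30d = FeatureDefinition(
    name="purchase_count_30d",
    entity_key="customer_id",
    transform=lambda raw: float(len(raw["purchases_last_30d"])),
)

offline_store: Dict[str, dict] = {}  # stands in for Parquet on S3
online_store: Dict[str, dict] = {}   # stands in for Redis/DynamoDB

def materialize(feature: FeatureDefinition, raw_row: dict) -> None:
    """Apply the transform once, then write to BOTH stores so that
    training and serving can never see divergent values."""
    value = feature.transform(raw_row)
    key = raw_row[feature.entity_key]
    offline_store.setdefault(key, {})[feature.name] = value
    online_store.setdefault(key, {})[feature.name] = value

materialize(purchase_count_30d, {"customer_id": "c1",
                                 "purchases_last_30d": [10.0, 25.5, 7.0]})
print(online_store["c1"]["purchase_count_30d"])  # 3.0
```

Because both stores are populated from the same `transform`, the offline and online values agree by construction—the property a real feature store enforces at scale.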
Why It Matters
Training-serving skew—where features computed during training differ from features computed at inference—is one of the most insidious sources of model degradation. A model trained on one version of a feature calculation deployed against a different version will underperform with no obvious error signal. Feature stores enforce consistency by defining features as code once and applying that definition identically in both training and serving contexts. They also reduce duplication: if five models all need 'customer 30-day purchase count,' each team building it separately creates five divergent implementations; a feature store provides it once, correctly.
How It Works
A feature store operates via feature pipelines: raw data sources (event streams, databases) are processed by feature transformation jobs that compute feature values, which are then written to both the offline store (for historical training data) and the online store (for real-time serving). At training time, the ML pipeline queries the offline store via point-in-time correct joins—fetching feature values as they existed at each training label's timestamp, preventing leakage of future information. At serving time, the model runtime queries the online store by entity key (e.g., user_id) to retrieve the latest feature values at single-digit-millisecond latency.
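The point-in-time join above can be sketched in plain Python. This is a minimal illustration, not a real feature store query engine; the entity keys, timestamps, and values are invented for the example (integer timestamps stand in for real event times).

```python
from bisect import bisect_right

# Hypothetical feature history: (timestamp, value) pairs per entity,
# kept sorted by timestamp.
feature_history = {
    "user_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
}

def point_in_time_value(entity_key: str, label_ts: int):
    """Return the latest feature value at or before the label timestamp.

    Using a value with a timestamp after label_ts would leak future
    information into the training set (data leakage).
    """
    history = feature_history[entity_key]
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, label_ts)  # count of values <= label_ts
    if idx == 0:
        return None  # no feature value existed yet at this label time
    return history[idx - 1][1]

training_rows = [("user_1", 5), ("user_1", 25), ("user_1", 30)]
features = [point_in_time_value(k, ts) for k, ts in training_rows]
print(features)  # [None, 0.5, 0.9]
```

Note that the label at timestamp 25 receives the value written at 20, not the "better" value written at 30—exactly the discipline a point-in-time correct join enforces.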
Feature Store Architecture
[Architecture diagram: batch, stream, and request-time data sources flow into the feature store, which maintains an online (low-latency) store and an offline (batch) store and feeds both training jobs and serving/inference.]
Real-World Example
A fintech company had 8 different teams computing a 'customer creditworthiness score' differently in their respective models—slight calculation variations caused the loan approval model to treat the same customer differently depending on which touchpoint initiated the request. After centralizing creditworthiness features in a Feast feature store with a single canonical implementation, all models retrieved the same value. Model accuracy improved by 4.2% (training-serving skew had been introducing noise), and the engineering time to add a new model dropped from 6 weeks (building all features from scratch) to 1 week (assembling from existing feature store features).
Common Mistakes
- ✕ Building a feature store before understanding which features are actually reused—start by identifying high-value shared features, not by building the platform speculatively
- ✕ Ignoring point-in-time correctness in the offline store—training data that includes future feature values introduces data leakage that inflates offline metrics but destroys production performance
- ✕ Treating the feature store as an ETL pipeline—it's a product with consumers and SLAs; feature freshness, availability, and correctness are as important as the features themselves
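The last mistake—ignoring freshness SLAs—can be guarded against explicitly at serve time. A minimal sketch, where the one-hour SLA, the record shape, and the function name are all hypothetical:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=1)  # hypothetical per-feature SLA

def get_fresh_feature(record: dict, now: datetime) -> float:
    """Serve a feature value only if it meets the freshness SLA.

    Surfacing staleness as an explicit error (or triggering a fallback)
    treats the store as a product with an availability contract, rather
    than silently serving whatever the last pipeline run left behind.
    """
    if now - record["updated_at"] > FRESHNESS_SLA:
        raise LookupError("stale feature: violates freshness SLA")
    return record["value"]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = {"value": 0.7, "updated_at": now - timedelta(minutes=5)}
print(get_fresh_feature(fresh, now))  # 0.7
```

In production this check typically lives in the online store's read path and is paired with monitoring, so SLA violations page the feature's owners rather than degrading predictions silently.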
Related Terms
MLOps
MLOps (Machine Learning Operations) applies DevOps principles to ML systems—combining engineering practices for model development, deployment, monitoring, and retraining into a disciplined operational lifecycle.
Data Pipeline
A data pipeline is an automated sequence of data collection, processing, transformation, and loading steps that delivers clean, structured data from sources to destinations—forming the foundation of every ML training and serving system.
Model Deployment
Model deployment is the process of moving a trained ML model from development into a production environment where it can serve real users—encompassing packaging, testing, infrastructure provisioning, and release management.
Continuous Training
Continuous training automatically retrains ML models on fresh data when triggered by drift detection, schedule, or performance degradation—keeping models current with evolving real-world patterns without manual intervention.
Experiment Tracking
Experiment tracking records the parameters, metrics, code versions, and artifacts of every ML training run, enabling reproducibility, systematic comparison of approaches, and traceability from production models back to their training conditions.