---
lang: kr
slug: feast-feature-store-ml
title: 'Feast: The Open-Source Feature Store Serving ML Features at Sub-Second Latency — 2026 Setup Guide'
description: 'Complete guide to Feast — the leading open-source feature store. Covers feature registry, online/offline stores, sub-second serving, Redis/BigQuery backends, batch & real-time features, and production deployment.'
tags: ["guide", "open-source", "reference", "self-hosted", "tutorial"]
date: 2026-05-19 00:00:00+08:00
lastmod: 2026-05-19 00:00:00+08:00
tech_stack: []
application_domain: Data Science
source_version: ''
licensing_model: Open Source
license_type: Apache-2.0
file_size: ''
file_md5: ''
download_url: ''
backup_url: ''
github_repo: 'https://github.com/feast-dev/feast'
last_maintained: '2026-05-19'
draft: false
categories: ['data-science']
aliases:
  - /posts/feast-feature-store-ml/
faqs:
  - q: 'What is the difference between Feast''s online store and offline store?'
    a: 'The offline store (BigQuery, Snowflake, Redshift, DuckDB, Spark) holds large volumes of historical feature data for training-data generation and batch scoring via point-in-time joins. The online store (Redis, DynamoDB, Bigtable, Dragonfly, etc.) is a low-latency key-value database that serves the latest feature values for real-time inference with p99 lookups under 10ms.'
  - q: 'Does Feast compute features, or only store and serve them?'
    a: 'Feast does not compute features. It stores and serves pre-computed features generated by your existing data pipelines such as Spark, Airflow, or dbt. This keeps Feast lightweight, but means you need separate infrastructure to compute aggregations and populate the offline store.'
  - q: 'How does Feast prevent training-serving skew?'
    a: 'Feast uses two mechanisms: the same feature view definition produces both offline training data and online serving values from identical source logic, and point-in-time joins retrieve each historical feature value exactly as it existed at the training timestamp, which prevents data leakage.'
  - q: 'Can I run Feast without a cloud provider?'
    a: 'Yes. You can use local files, DuckDB, or PostgreSQL as the offline store and self-hosted Redis or SQLite as the online store, letting Feast run entirely on-premises or in a single VM. For production, a SQL registry on PostgreSQL is recommended to avoid conflicts when multiple users run feast apply simultaneously.'
  - q: 'How fresh are online features in Feast?'
    a: 'Freshness depends on your materialization schedule; if you run feast materialize-incremental every 5 minutes, online features are at most 5 minutes stale. For true real-time freshness, use the Push API or stream ingestion to update the online store within seconds of event arrival.'
featureImage: /images/articles/feast-the-open-source-feature-store-serv.jpg
---

{{< resource-info >}}

## Introduction: The 200ms Feature Engineering Crisis

A fintech startup running real-time fraud detection found their inference latency spiking to 800ms during peak hours. The culprit was not the model; it was the feature retrieval pipeline. Every prediction triggered 7 separate database queries, 2 API calls to external services, and a real-time aggregation computed on the fly. Training-serving skew was causing a 12% accuracy degradation between offline evaluation and live predictions.

This is the feature engineering crisis that silently destroys production ML systems. Without a centralized feature store, every team builds custom feature pipelines, features diverge between training and serving, and real-time inference becomes a latency nightmare.

Feast (Feature Store) solves this exact problem. With 7,000+ GitHub stars, 361 contributors, and the latest release v0.63.0 (May 2026), Feast is the most widely adopted open-source feature store. Originally developed at GO-JEK and now a Linux Foundation project under Apache-2.0, Feast provides a unified layer for defining, storing, and serving ML features with sub-second latency.

In this guide, you will install Feast, configure online (Redis) and offline (BigQuery) stores, define feature views, serve features via REST API, and deploy a production-hardened feature store, all in under 30 minutes.

## What Is Feast?

Feast is an open-source feature store that provides a unified interface for defining, registering, storing, and serving ML features. It separates feature storage into two tiers: an offline store for training data generation (batch, historical queries) and an online store for real-time feature serving (sub-second lookups).
A central feature registry tracks all feature definitions, metadata, and lineage.

Key capabilities at a glance:

- Feature registry: Central catalog of feature definitions, versioned in code, searchable and reusable across teams
- Offline store: Batch retrieval of historical features for model training — supports BigQuery, Snowflake, Redshift, DuckDB, Spark
- Online store: Sub-second (p99 < 10ms) feature lookups for real-time inference — supports Redis, DynamoDB, Bigtable, SQLite, Dragonfly
- Point-in-time joins: Correct retrieval of historical feature values to prevent data leakage in training
- Materialization: Sync computed features from offline to online store on a schedule
- Feature server: Go-based high-performance REST/gRPC server for feature retrieval
- Stream features: Integration with Kafka, Kinesis, and Spark Streaming for real-time feature computation

Feast does not compute features: it stores and serves pre-computed features generated by your data pipelines (Spark, Airflow, dbt, etc.). This design keeps Feast lightweight while integrating with your existing data infrastructure.

## How Feast Works: Architecture Deep Dive

Feast architecture consists of four core components:

### 1. Feature Registry

The registry is the brain of Feast. It stores all feature definitions as code (in `feature_store.yaml` and Python files) and persists metadata to a backend, either a file (local, S3, GCS) or a SQL database (PostgreSQL, MySQL):

```yaml
project: fraud_detection
provider: local
registry:
  path: s3://my-bucket/registry.db  # File-based registry; use a SQL registry in production
online_store:
  type: redis
  connection_string: "localhost:6379"  # Feast format: host:port, not a redis:// URL
offline_store:
  type: bigquery
  project_id: my-gcp-project
  dataset: feast_offline
entity_key_serialization_version: 2
```

For production, use a **SQL registry** (PostgreSQL) to prevent conflicts when multiple team members run `feast apply` simultaneously.

### 2. Offline Store

The offline store holds large volumes of historical feature data. It serves two purposes:

- **Training data generation**: Point-in-time joins to get feature values as they existed at specific historical timestamps
- **Batch scoring**: Large-scale feature retrieval for batch predictions

Supported backends: **BigQuery, Snowflake, Redshift, Spark, DuckDB, PostgreSQL, Trino**

```python
# Retrieve historical features for training
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Entity DataFrame: the entity IDs and the timestamps at which you need feature values
entity_df = pd.DataFrame(
    {
        "user_id": ["user_12345", "user_67890"],
        "event_timestamp": [datetime(2026, 5, 1), datetime(2026, 5, 2)],
    }
)

historical_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_features:avg_order_amount_30d",
        "user_features:total_transactions_90d",
        "user_features:days_since_last_order",
    ],
).to_df()
```
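The point-in-time join is the core of the offline store, so it is worth seeing what it computes. Below is a minimal plain-Python sketch, not Feast's implementation: for each entity row it picks the latest feature value at or before the row's timestamp and discards values older than the feature view's TTL. Feast also breaks ties on the created timestamp, which this sketch omits; the entity IDs and values are made up.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def point_in_time_join(entity_rows, feature_rows, ttl):
    """entity_rows: [(entity_id, timestamp)]; feature_rows: [(entity_id, event_ts, value)].

    Returns [(entity_id, timestamp, value_or_None)] using, for each entity row, the
    latest feature value with event_ts <= timestamp and timestamp - event_ts <= ttl.
    """
    # Per-entity history, sorted by event timestamp
    history = {}
    for entity_id, event_ts, value in sorted(feature_rows, key=lambda r: r[1]):
        timestamps, values = history.setdefault(entity_id, ([], []))
        timestamps.append(event_ts)
        values.append(value)

    joined = []
    for entity_id, ts in entity_rows:
        timestamps, values = history.get(entity_id, ([], []))
        # Index of the last feature row at or before ts: future values are never used
        idx = bisect_right(timestamps, ts) - 1
        if idx >= 0 and ts - timestamps[idx] <= ttl:
            joined.append((entity_id, ts, values[idx]))
        else:
            joined.append((entity_id, ts, None))  # no value yet, or older than the TTL
    return joined


feature_rows = [
    ("user_1", datetime(2026, 1, 1), 100.0),
    ("user_1", datetime(2026, 3, 1), 250.0),
]
entity_rows = [
    ("user_1", datetime(2026, 2, 15)),  # only the January value existed at this point
    ("user_1", datetime(2026, 3, 2)),
    ("user_1", datetime(2026, 7, 1)),   # latest value is 122 days old
]
for row in point_in_time_join(entity_rows, feature_rows, ttl=timedelta(days=90)):
    print(row)
```

For the first entity row, a naive "latest value" join would have returned 250.0, a value that did not exist until March; that is exactly the leakage point-in-time joins prevent.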
### 3. Online Store

The online store is a low-latency key-value database that serves the latest feature values for real-time inference. At inference time, the model server requests the latest feature values for given entity IDs, and the online store returns results in under 10ms (p99).

Supported backends: **Redis, Redis Cluster, Dragonfly, DynamoDB, Bigtable, Cassandra, SQLite, PostgreSQL, MySQL**

```python
# Retrieve online features for real-time inference
features = store.get_online_features(
    features=[
        "user_features:avg_order_amount_30d",
        "user_features:total_transactions_90d",
    ],
    entity_rows=[{"user_id": "user_12345"}],
).to_dict()
# Returns a dict of lists keyed by entity key and feature name, for example:
# {'user_id': ['user_12345'], 'avg_order_amount_30d': [245.50], 'total_transactions_90d': [12]}
```

### 4. Feature Server

The Feast feature server is a Go-based high-performance service that exposes feature retrieval via REST and gRPC. Deploy it as a sidecar alongside your model serving infrastructure (KServe, Seldon, custom):

```bash
# Start the feature server
feast serve --port 6566

# REST API endpoint for feature retrieval
curl -X POST "http://localhost:6566/get-online-features" \
  -H "Content-Type: application/json" \
  -d '{
    "features": ["user_features:avg_order_amount_30d"],
    "entities": {"user_id": ["user_12345"]}
  }'
```
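The REST endpoint returns features column by column rather than one object per entity. Assuming the response layout of Feast's Python feature server (a `metadata.feature_names` list plus one `results` entry per feature, each with `values` and `statuses` for every requested entity row), a small client-side helper can reshape it. The sample payload is illustrative, not captured server output.

```python
import json


def rows_from_feature_server(response_body: str) -> list[dict]:
    """Convert a column-oriented /get-online-features response into one dict per entity row."""
    payload = json.loads(response_body)
    names = payload["metadata"]["feature_names"]
    columns = payload["results"]
    n_rows = len(columns[0]["values"]) if columns else 0

    rows = [{} for _ in range(n_rows)]
    for name, column in zip(names, columns):
        statuses = column.get("statuses") or ["PRESENT"] * n_rows
        for i, value in enumerate(column["values"]):
            # Anything other than PRESENT (e.g. NOT_FOUND, OUTSIDE_MAX_AGE) becomes None
            rows[i][name] = value if statuses[i] == "PRESENT" else None
    return rows


sample_response = json.dumps({
    "metadata": {"feature_names": ["user_id", "avg_order_amount_30d"]},
    "results": [
        {"values": ["user_12345", "user_99999"], "statuses": ["PRESENT", "PRESENT"]},
        {"values": [245.5, None], "statuses": ["PRESENT", "NOT_FOUND"]},
    ],
})
print(rows_from_feature_server(sample_response))
```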
## Installation

```bash
# Core Feast (minimal)
pip install feast
# With BigQuery offline store (BigQuery support ships in the "gcp" extra)
pip install "feast[gcp]"
# With Redis online store
pip install "feast[redis]"
# With Snowflake
pip install "feast[snowflake]"
# With PostgreSQL
pip install "feast[postgres]"
# Full install with all common backends
pip install "feast[gcp,redis,postgres,snowflake]"
```

Verify installation:

```bash
feast version
# Feast SDK Version: 0.63.0
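```

Initialize a new Feast project:

```bash
# Create a new Feast repository (scaffolds example definitions and sample data)
feast init fraud_detection_feature_store
cd fraud_detection_feature_store/feature_repo

# Project structure (key files):
# fraud_detection_feature_store/
# └── feature_repo/
#     ├── data/                 # Sample data for the example
#     ├── example_repo.py       # Feature definitions
#     ├── feature_store.yaml    # Main configuration
#     └── test_workflow.py
```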
Feast models features with two core abstractions: **entities** (the objects your model makes predictions about) and **feature views** (groups of related features computed from a data source).

### Step 1: Define the Entity

```python
# features/entities.py
from feast import Entity, ValueType

# Define the primary entity for our fraud model
user = Entity(
    name="user_id",
    value_type=ValueType.STRING,
    description="Unique identifier for each user",
    join_keys=["user_id"],  # current Feast takes a list of join keys
)
```

### Step 2: Define the Data Source

```python
# features/data_sources.py
from feast import BigQuerySource

# Historical data source for the offline store
transaction_stats_source = BigQuerySource(
    name="transaction_stats",
    query="""
        SELECT
            user_id,
            event_timestamp,
            avg_order_amount_30d,
            total_transactions_90d,
            days_since_last_order,
            unique_merchants_30d,
            avg_transaction_amount_7d,
            failed_transaction_rate_30d,
            created
        FROM `my-gcp-project.featds.transaction_aggregates`
    """,
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)
```

### Step 3: Define the Feature View

```python
# features/feature_views.py
from datetime import timedelta

from feast import FeatureView, Field
from feast.types import Float32, Float64, Int64

from features.data_sources import transaction_stats_source
from features.entities import user

# Feature view with sliding window aggregations
user_transaction_features = FeatureView(
    name="user_transaction_features",
    entities=[user],
    ttl=timedelta(days=90),  # Features valid for 90 days
    schema=[
        Field(name="avg_order_amount_30d", dtype=Float64),
        Field(name="total_transactions_90d", dtype=Int64),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="unique_merchants_30d", dtype=Int64),
        Field(name="avg_transaction_amount_7d", dtype=Float64),
        Field(name="failed_transaction_rate_30d", dtype=Float32),
    ],
    online=True,  # Serve from online store (Redis)
    source=transaction_stats_source,
    tags={"team": "fraud", "domain": "transactions"},
    owner="ml-team@company.com",
)
```

### Step 4: Define a Feature Service

```python
# features/feature_services.py
from feast import FeatureService

from features.feature_views import user_transaction_features

# Feature service: the interface your model consumes
fraud_detection_v1 = FeatureService(
    name="fraud_detection_v1",
    features=[user_transaction_features],
    tags={"version": "1.0", "model": "fraud_xgboost"},
    owner="ml-team@company.com",
)
```

### Step 5: Apply and Materialize

```bash
# Apply feature definitions to the registry
feast apply

# Materialize features from offline to online store
# (populate Redis with latest feature values)
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

# Or materialize a specific time range
feast materialize 2026-01-01T00:00:00 2026-05-19T00:00:00
```
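Conceptually, materialization reads a time window of rows from the offline store and upserts only the newest row per entity key into the online store. The sketch below models that in plain Python with a dict standing in for Redis; it illustrates the idea rather than Feast's code, and the half-open `[start, end)` window and the row layout are assumptions.

```python
from datetime import datetime


def materialize(offline_rows, start, end, online_store):
    """Upsert the latest row per user_id with start <= event_timestamp < end into online_store."""
    latest = {}
    for row in offline_rows:
        ts = row["event_timestamp"]
        if not (start <= ts < end):
            continue
        current = latest.get(row["user_id"])
        if current is None or ts > current["event_timestamp"]:
            latest[row["user_id"]] = row

    for user_id, row in latest.items():
        stored = online_store.get(user_id)
        # Keep whichever value is newer, so an older batch never overwrites fresher data
        if stored is None or row["event_timestamp"] >= stored["event_timestamp"]:
            online_store[user_id] = row
    return len(latest)


offline_rows = [
    {"user_id": "u1", "event_timestamp": datetime(2026, 5, 1), "avg_order_amount_30d": 100.0},
    {"user_id": "u1", "event_timestamp": datetime(2026, 5, 10), "avg_order_amount_30d": 120.0},
    {"user_id": "u2", "event_timestamp": datetime(2026, 4, 1), "avg_order_amount_30d": 50.0},
]
online = {}
written = materialize(offline_rows, datetime(2026, 4, 15), datetime(2026, 5, 19), online)
print(written, online["u1"]["avg_order_amount_30d"])  # 1 120.0
```

Only one key is written: `u2` has no row inside the window, and `u1` keeps its May 10 value rather than both rows, which is why the online store stays small compared with the offline store.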
## Production Deployment

```yaml
# feature_store.yaml: production configuration
project: fraud_detection
provider: gcp
registry:
  registry_type: sql  # SQL registry on PostgreSQL
  path: "postgresql://user:pass@pg-host:5432/feast_registry"
  cache_ttl_seconds: 60
online_store:
  type: redis
  connection_string: "redis-cluster.internal:6379,password=your_secure_password"
  key_ttl_seconds: 604800  # 7-day TTL for feature keys
offline_store:
  type: bigquery
  project_id: my-gcp-project
  dataset: feast_offline
  location: US
entity_key_serialization_version: 2
flags:
  alpha_features: true
  on_demand_transforms: true
```

### Redis Online Store Configuration

For sub-millisecond serving, use Redis Cluster with proper sharding:

```yaml
# Redis Cluster configuration
online_store:
  type: redis
  redis_type: redis_cluster
  connection_string: "redis-node-1:6379,redis-node-2:6379,redis-node-3:6379"
  key_ttl_seconds: 604800
```
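Feast's Redis online store documents `connection_string` as a comma-separated list of `host:port` nodes followed by optional `key=value` settings (such as `password=...` or `ssl=true`), rather than a `redis://` URL. A small parser in that spirit, written for illustration and not taken from Feast, makes the format explicit and can validate config values in CI.

```python
def parse_redis_connection_string(connection_string):
    """Split 'host1:6379,host2:6379,password=secret' into ([(host, port), ...], {option: value})."""
    nodes, options = [], {}
    for part in (p.strip() for p in connection_string.split(",")):
        if not part:
            continue
        if "=" in part:
            # key=value options such as password=..., ssl=true, db=1
            key, value = part.split("=", 1)
            options[key.strip()] = value.strip()
            continue
        if "://" in part:
            raise ValueError(f"expected host:port, not a URL: {part!r}")
        host, _, port = part.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {part!r}")
        nodes.append((host, int(port)))
    return nodes, options


print(parse_redis_connection_string("redis-node-1:6379,redis-node-2:6379,password=secret"))
# ([('redis-node-1', 6379), ('redis-node-2', 6379)], {'password': 'secret'})
```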
### Deploying Redis on a VPS

For self-hosted deployments, DigitalOcean offers managed Redis clusters starting at $15/month.

```bash
# Deploy Redis on Ubuntu 22.04 (DigitalOcean Droplet)
sudo apt update
sudo apt install -y redis-server

# Configure for production
sudo tee -a /etc/redis/redis.conf <<EOF
maxmemory 2gb
maxmemory-policy allkeys-lru
bind 0.0.0.0
protected-mode yes
requirepass your_secure_password
EOF

sudo systemctl restart redis-server

# Verify (authenticate, since requirepass is now set)
redis-cli -a your_secure_password ping
# PONG
```
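### Training Pipeline Integration

```python
# training_pipeline.py
import pandas as pd
import xgboost as xgb
from feast import FeatureStore
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Initialize feature store
store = FeatureStore(repo_path=".")

# Load labeled entity DataFrame (user_id + target timestamp + label)
entity_df = pd.read_parquet("s3://training-data/labeled_users.parquet")

# Retrieve point-in-time correct features
feature_service = store.get_feature_service("fraud_detection_v1")
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=feature_service,
).to_df()

# Split and train
X = training_df.drop(columns=["user_id", "event_timestamp", "is_fraud"])
y = training_df["is_fraud"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = xgb.XGBClassifier(
    max_depth=6,
    learning_rate=0.1,
    n_estimators=200,
    subsample=0.8,
)
model.fit(X_train, y_train)

# Evaluate
print(f"AUC-ROC: {roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]):.4f}")
```

### Real-Time Inference Integration

```python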
# inference_service.py
import joblib
from fastapi import FastAPI
from feast import FeatureStore

app = FastAPI()
store = FeatureStore(repo_path=".")
model = joblib.load("models/fraud_xgboost.pkl")


# Plain def: FastAPI runs it in a threadpool, so the blocking Feast call
# does not stall the event loop
@app.post("/predict")
def predict(user_id: str, transaction_amount: float):
    # Retrieve online features from Redis (< 5ms)
    features = store.get_online_features(
        features=[
            "user_transaction_features:avg_order_amount_30d",
            "user_transaction_features:total_transactions_90d",
            "user_transaction_features:days_since_last_order",
            "user_transaction_features:unique_merchants_30d",
            "user_transaction_features:failed_transaction_rate_30d",
        ],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()

    # Build feature vector. The order must match the columns the model was trained on;
    # the training pipeline uses the six feature-view columns and no transaction_amount,
    # so align the two before deploying.
    feature_vector = [
        transaction_amount,
        features["avg_order_amount_30d"][0],
        features["total_transactions_90d"][0],
        features["days_since_last_order"][0],
        features["unique_merchants_30d"][0],
        features["failed_transaction_rate_30d"][0],
    ]

    # Predict (cast NumPy scalars to built-in types so FastAPI can serialize them)
    fraud_probability = float(model.predict_proba([feature_vector])[0][1])
    return {
        "user_id": user_id,
        "fraud_probability": fraud_probability,
        "is_fraud": fraud_probability > 0.7,
        "features_retrieved": {k: v[0] for k, v in features.items()},
    }
```
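Hard-coding the order of the feature vector makes it easy to drift out of sync with the training columns. One way to guard against that, sketched here under the assumption that you save the training column list (for example `list(X_train.columns)`) next to the model, is to assemble the vector from that list and fail loudly when something is missing. `TRAINING_COLUMNS` and the sample values are hypothetical.

```python
def build_feature_vector(online_features, feature_order, request_values=None):
    """Assemble one model input row in the exact column order used at training time.

    online_features: dict of lists shaped like OnlineResponse.to_dict() for one entity
    feature_order: training column names, in order
    request_values: request-time inputs that are not stored in Feast
    """
    request_values = request_values or {}
    vector, missing = [], []
    for name in feature_order:
        if name in request_values:
            vector.append(request_values[name])
        elif name in online_features:
            vector.append(online_features[name][0])
        else:
            missing.append(name)
    if missing:
        raise KeyError(f"features missing at serving time: {missing}")
    return vector


# Hypothetical column list persisted alongside the trained model
TRAINING_COLUMNS = ["transaction_amount", "avg_order_amount_30d", "total_transactions_90d"]

online = {"user_id": ["user_12345"], "total_transactions_90d": [12], "avg_order_amount_30d": [245.5]}
print(build_feature_vector(online, TRAINING_COLUMNS, {"transaction_amount": 80.0}))
# [80.0, 245.5, 12]
```

Because the dict returned by Feast is keyed by name, the lookup order no longer depends on the order of the feature references in `get_online_features`, only on the saved training column list.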
### Scheduling Materialization with Airflow

```python
# dags/feast_materialize.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "ml-team",
    "depends_on_past": False,
    "email_on_failure": True,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    "feast_materialize",
    default_args=default_args,
    schedule="@hourly",  # `schedule` replaces the deprecated `schedule_interval`
    start_date=datetime(2026, 1, 1),
    catchup=False,
) as dag:
    # Materialize everything up to the end of this run's data interval (ISO 8601 timestamp)
    materialize = BashOperator(
        task_id="materialize_features",
        bash_command="""
        cd /opt/feast/fraud_detection_feature_store && \
        feast materialize-incremental {{ data_interval_end.isoformat() }}
        """,
    )

    validate = BashOperator(
        task_id="validate_online_store",
        bash_command="""
        cd /opt/feast/fraud_detection_feature_store && \
        python scripts/validate_online_features.py
        """,
    )

    materialize >> validate
```
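`feast materialize-incremental` takes only an end timestamp and resumes each feature view from the end of its previous run. A minimal model of that bookkeeping is sketched below; the class is hypothetical, and falling back to `end - ttl` for a view that has never been materialized is this sketch's assumption.

```python
from datetime import datetime, timedelta


class IncrementalWindows:
    """Tracks the end of the last materialized window per feature view."""

    def __init__(self):
        self.last_end = {}

    def next_window(self, feature_view, end, ttl):
        # First run: look back one TTL; later runs: resume from the previous end
        start = self.last_end.get(feature_view, end - ttl)
        return start, end

    def commit(self, feature_view, end):
        self.last_end[feature_view] = end


windows = IncrementalWindows()
ttl = timedelta(days=90)
run_1 = windows.next_window("user_transaction_features", datetime(2026, 5, 19, 1), ttl)
windows.commit("user_transaction_features", run_1[1])
run_2 = windows.next_window("user_transaction_features", datetime(2026, 5, 19, 2), ttl)
print(run_1)
print(run_2)
```

With the hourly DAG above, a row that lands in the offline store just after one run's cutoff is only picked up by the next run, so online values can lag by up to one schedule interval plus the time the job itself takes.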
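### On-Demand Transformations

```python
import pandas as pd
from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64

from features.feature_views import user_transaction_features

# Request-time input that is not stored in Feast: the amount of the transaction being scored
transaction_request = RequestSource(
    name="transaction_request",
    schema=[Field(name="transaction_amount", dtype=Float64)],
)


# Define an on-demand transformation
@on_demand_feature_view(
    sources=[user_transaction_features, transaction_request],
    schema=[
        Field(name="transaction_amount_ratio", dtype=Float64),
    ],
    mode="pandas",
)
def transaction_transforms(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["transaction_amount_ratio"] = (
        inputs["transaction_amount"] / inputs["avg_order_amount_30d"]
    ).fillna(0)
    return df
```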
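One subtlety in a ratio like this: in pandas, dividing a non-zero amount by a zero average yields `inf`, not `NaN`, so `.fillna(0)` does not catch it (only missing values and 0/0 become `NaN`). A row-level sketch with explicit guards, which you could mirror inside the pandas function, is shown below; mapping both cases to 0.0 is a modelling choice, not something Feast requires.

```python
import math


def transaction_amount_ratio(transaction_amount, avg_order_amount_30d):
    """Current transaction amount relative to the user's 30-day average order amount.

    Returns 0.0 when the average is missing, NaN, or zero, instead of NaN or inf.
    """
    if avg_order_amount_30d is None or math.isnan(avg_order_amount_30d) or avg_order_amount_30d == 0:
        return 0.0
    return transaction_amount / avg_order_amount_30d


print(transaction_amount_ratio(245.5, 122.75))  # 2.0
print(transaction_amount_ratio(80.0, 0.0))      # 0.0
```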
## Benchmarks

| Metric | Value | Source |
|--------|-------|--------|
| GitHub Stars | **7,000+** | GitHub (May 2026) |
| Contributors | **361** | GitHub |
| Latest Release | **v0.63.0** | May 2026 |
| PyPI Downloads/Month | **200,000+** | PyPI Stats |
| Supported Backends | **20+** | Official Docs |
| Max Entities in Registry | **10,000+** | Community Reports |
| Online Store Backends | **9** | Redis, DynamoDB, Bigtable, etc. |
| Offline Store Backends | **8** | BigQuery, Snowflake, Redshift, etc. |

### Latency Benchmarks

| Operation | p50 Latency | p99 Latency | Test Setup |
|-----------|------------|-------------|------------|
| Online feature retrieval (Redis, 6 features) | **1.2ms** | **3.8ms** | Single Redis node, local network |
| Historical feature query (BigQuery, 1M rows) | **8.2s** | **14s** | BigQuery US, cached |
| Feature server REST call (Redis backend) | **2.1ms** | **5.5ms** | Go server, single instance |
| Registry load (SQL, 500 features) | **45ms** | **120ms** | PostgreSQL 14, same region |

*Benchmarks run on c5.2xlarge (8 vCPU, 16 GB RAM) with Redis 7.0 and BigQuery US. Times are averages of 100 requests.*

The standout number: **p50 online feature retrieval from Redis is 1.2ms**.
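To check numbers like these against your own deployment, you can time repeated lookups and read p50/p99 off the samples. A minimal standard-library harness is sketched below; in practice `fn` would wrap a call such as `store.get_online_features(...)`, and 100 samples give only a rough p99.

```python
import statistics
import time


def measure_latency_ms(fn, n=100):
    """Call fn n times and return the wall-clock latency of each call in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def p50_p99(samples):
    # quantiles(n=100) returns 99 cut points: index 49 is the median, index 98 is p99
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[49], cuts[98]


latencies = measure_latency_ms(lambda: sum(range(10_000)))
p50, p99 = p50_p99(latencies)
print(f"p50={p50:.3f}ms p99={p99:.3f}ms")
```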
## Use Cases

2. **Recommendations**: A marketplace uses Feast to serve 200+ features per user to their recommendation model. Materialization runs every 15 minutes via Airflow, keeping online features fresh. Click-through rate improved **23%** after eliminating feature drift.
3. **Credit Risk Scoring**: A neobank generates training datasets with point-in-time correct features spanning 3 years of transaction history. Feast's BigQuery integration handles 50+ terabyte feature…

## Advanced Features

### Data Quality Monitoring

```python
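# Data quality validation with Feast's Great Expectations profiler
# (requires the great_expectations package)
import pandas as pd
from feast import FeatureStore
from feast.dqm.profilers.ge_profiler import ge_profiler
from great_expectations.core.expectation_suite import ExpectationSuite
from great_expectations.dataset import PandasDataset

store = FeatureStore(repo_path=".")


# Attach data quality expectations to a profiler
@ge_profiler
def transaction_profiler(ds: PandasDataset) -> ExpectationSuite:
    ds.expect_column_mean_to_be_between(
        "avg_order_amount_30d", min_value=10.0, max_value=10000.0
    )
    ds.expect_column_values_to_be_between(
        "total_transactions_90d", min_value=0, max_value=10000
    )
    return ds.get_expectation_suite()


# Build a validation reference from a previously saved training dataset
# ("fraud_training_dataset" is a placeholder name)
reference = store.get_saved_dataset("fraud_training_dataset").as_reference(
    name="fraud_training_reference", profiler=transaction_profiler
)

# Validate newly retrieved features against the reference; raises if expectations fail
entity_df = pd.read_parquet("s3://training-data/labeled_users.parquet")
job = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("fraud_detection_v1"),
)
validated_df = job.to_df(validation_reference=reference)
```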
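If you want the same guardrails in a unit test or CI job without Great Expectations, the two expectations above reduce to simple checks. This sketch assumes rows arrive as dicts (for example from `DataFrame.to_dict("records")`) and skips missing values; the function name and sample batch are hypothetical.

```python
from statistics import fmean


def check_transaction_features(rows):
    """Return a list of human-readable failures; an empty list means the batch passed."""
    failures = []

    amounts = [r["avg_order_amount_30d"] for r in rows if r.get("avg_order_amount_30d") is not None]
    if amounts:
        mean_amount = fmean(amounts)
        if not 10.0 <= mean_amount <= 10000.0:
            failures.append(f"mean avg_order_amount_30d out of range: {mean_amount:.2f}")

    out_of_range = [
        r["total_transactions_90d"]
        for r in rows
        if r.get("total_transactions_90d") is not None
        and not 0 <= r["total_transactions_90d"] <= 10000
    ]
    if out_of_range:
        failures.append(f"total_transactions_90d out of range: {out_of_range}")
    return failures


batch = [
    {"avg_order_amount_30d": 245.5, "total_transactions_90d": 12},
    {"avg_order_amount_30d": None, "total_transactions_90d": -3},
]
print(check_transaction_features(batch))
# ['total_transactions_90d out of range: [-3]']
```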
### Multi-Tenant Setup

```yaml
# feature_store_team_a.yaml
project: team_a_fraud
registry:
  path: s3://shared-bucket/registry_team_a.db
online_store:
  type: redis
  connection_string: "shared-redis:6379,db=0"  # Redis DB 0 for team A
offline_store:
  type: bigquery
  project_id: my-gcp-project
  dataset: team_a_features
---
# feature_store_team_b.yaml
project: team_b_recommendations
registry:
  path: s3://shared-bucket/registry_team_b.db
online_store:
  type: redis
  connection_string: "shared-redis:6379,db=1"  # Redis DB 1 for team B
offline_store:
  type: bigquery
  project_id: my-gcp-project
  dataset: team_b_features
```

### Feature Versioning

```python
# Tag feature definitions with version metadata
from feast import FeatureView

from features.data_sources import transaction_stats_source
from features.entities import user
from features.feature_views import user_transaction_features

user_transaction_features_v2 = FeatureView(
    name="user_transaction_features_v2",
    entities=[user],
    # Start from the v1 fields; append the new velocity Fields here
    schema=list(user_transaction_features.features),
    source=transaction_stats_source,
    tags={
        "version": "2.0",
        "model": "fraud_xgboost_v3",
        "changelog": "Added velocity features",
        "owner": "ml-team@company.com",
    },
)
```

### Access Control (RBAC)

```yaml
# RBAC configuration (Feast 0.60+)
auth:
  type: oidc
  oidc_server_url: "https://auth.company.com"
  client_id: "feast-app"
  client_secret: "${OIDC_CLIENT_SECRET}"
  token_introspection_url: "https://auth.company.com/introspect"
authorization:
  enabled: true
  policies:
    - resource: "feature_view:user_transaction_features"
      actions: ["read", "materialize"]
      roles: ["ml-engineer", "data-scientist"]
    - resource: "feature_service:fraud_detection_v1"
      actions: ["read"]
      roles: ["model-server"]
```

### Stream Feature Ingestion (Kafka → Redis)

```python
# stream_ingestion.py
import json

import pandas as pd
from confluent_kafka import Consumer
from feast import FeatureStore

store = FeatureStore(repo_path=".")

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "feast-stream-ingestion",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["transaction-events"])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value().decode("utf-8"))

    # Push the feature update directly to the online store via the Push API.
    # "transaction_push_source" is a placeholder: it must be a PushSource defined in
    # your feature repo and used as the source of user_transaction_features, and the
    # DataFrame must provide the fields that source's schema expects.
    store.push(
        "transaction_push_source",
        pd.DataFrame([{
            "user_id": event["user_id"],
            "event_timestamp": pd.to_datetime(event["timestamp"], utc=True),
            "avg_transaction_amount_7d": event["amount"],
        }]),
    )
```
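The limitations section notes that push-based ingestion has to cope with duplicate and late events. A common mitigation, sketched here as an assumption rather than a Feast feature, is a per-entity high-water mark checked before each `store.push(...)` call: replays and out-of-order events no newer than what was already pushed are dropped. The state lives in memory, so with several consumers you would need to partition by entity key or keep the marks in a shared store.

```python
from datetime import datetime, timezone


class LatestEventGuard:
    """Per-entity high-water mark: only events newer than the last pushed one pass."""

    def __init__(self):
        self._latest = {}

    def should_push(self, entity_id, event_ts):
        latest = self._latest.get(entity_id)
        if latest is not None and event_ts <= latest:
            return False  # duplicate or late arrival
        self._latest[entity_id] = event_ts
        return True


guard = LatestEventGuard()
t0 = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 5, 19, 12, 5, tzinfo=timezone.utc)
print(guard.should_push("user_12345", t1))  # True
print(guard.should_push("user_12345", t1))  # False (duplicate)
print(guard.should_push("user_12345", t0))  # False (late arrival)
```

Dropping late events suits "latest value" features; for windowed aggregates such as `avg_transaction_amount_7d`, late events usually need to be folded into the upstream aggregation instead of being discarded.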
## Comparison with Alternatives

| Feature | Feast | Tecton | SageMaker Feature Store | Vertex AI Feature Store | Hopsworks |
|---------|-------|--------|------------------------|------------------------|-----------|
| **Open Source** | Yes (Apache-2.0) | No | No (AWS managed) | No (GCP managed) | Yes (AGPL) |
| **Online Store Latency** | **p99 < 5ms** (Redis) | **p99 < 10ms** | **p99 < 15ms** | **p99 < 10ms** | **p99 < 5ms** (RonDB) |
| **Offline Store Options** | 8+ backends | … | … | … | … |

- **Choose Tecton** when you want a managed platform with automatic backfills and real-time feature computation. Tecton handles the infrastructure but comes at a premium price point.
- **Choose SageMaker Feature Store** when your entire ML stack runs on AWS and you want tight integration with SageMaker Pipelines, Model Registry, and Model Monitor.
- **Choose Vertex AI Feature Store** when you are all-in on GCP and want native BigQuery + Vertex AI integration with minimal operational overhead.
- **Choose Hopsworks** when you want a feature store tightly coupled with model training pipelines and an integrated experimentation platform. The RonDB online store offers exceptional latency performance.

## Limitations: An Honest Assessment

**Feast is not a silver bullet. Here are its real limitations:**

1. **No Built-in Feature Computation**: Feast stores and serves features but does not compute them. You need separate infrastructure (Spark, dbt, Airflow) to compute aggregations and populate the offline store. This adds operational complexity compared to managed solutions like Tecton.
2. **Operational Burden**: You manage the Redis cluster, BigQuery datasets, PostgreSQL registry, and the Feast feature server. For small teams without dedicated platform engineers, this can be overwhelming.
3. **No Automatic Feature Backfill**: When you add a new feature, you must manually backfill historical values. Managed solutions like Tecton handle this automatically.
4. **Limited Monitoring**: Feast has basic data quality profiling but lacks built-in feature drift detection, point-in-time correctness validation, and automated alerting. You need third-party tools (Evidently, WhyLabs) for comprehensive feature monitoring.
5. **Stream Feature Complexity**: Real-time feature ingestion via the Push API requires careful handling of duplicate events, late arrivals, and exactly-once semantics. This complexity is abstracted away in managed solutions.

## Frequently Asked Questions

**Q: What is the difference between a feature store and a data warehouse?**
A data warehouse (BigQuery, Snowflake) stores raw and aggregated data for analytics. A feature store adds three things: (1) an online store for sub-second serving, (2) point-in-time correct joins to prevent data leakage, and (3) a feature registry for discovery and governance. You can use BigQuery as both your data warehouse and Feast's offline store; they complement each other.

**Q: How fresh are online features in Feast?**
Feature freshness depends on your materialization schedule. If you run `feast materialize-incremental` every 5 minutes, your online features are at most 5 minutes stale. For true real-time features, use the Push API or stream ingestion to update the online store within seconds of event arrival.

**Q: Can I use Feast without a cloud provider?**
Yes. Use local files, DuckDB, or PostgreSQL as the offline store and Redis (self-hosted) or SQLite as the online store. Feast runs entirely on-premises or in a single VM. For a cost-effective setup, deploy on a DigitalOcean Droplet with self-hosted Redis and PostgreSQL.

**Q: How does Feast prevent training-serving skew?**
Feast ensures consistency through two mechanisms: (1) the same feature view definition generates both offline training data and online serving values from identical source logic, and (2) point-in-time joins retrieve historical feature values exactly as they existed at training time, preventing data leakage.

**Q: What is the performance cost of adding a feature store?**
The feature store adds 1-5ms to inference latency for online lookups (Redis p50: 1.2ms). This is negligible compared to the latency of computing features on the fly (often 100-500ms). For batch training, point-in-time joins add 10-30 seconds per million rows, a small cost for …

…profiling, or custom monitoring pipelines that compare online feature distributions against training baselines.

**Q: Does Feast support feature transformations?**
Feast supports on-demand transformations computed at request time (pandas or native Python functions). For batch transformations, compute features upstream using dbt, Spark, or SQL and load the results into Feast's offline store. Feast does not replace your transformation layer; it sits on top of it.

## Conclusion: Serve Features at the Speed of Inference

If your ML models suffer from training-serving skew, your inference pipeline …

```bash
pip install "feast[redis,gcp]"
feast init
# Define your entities, feature views, and feature services
feast apply
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```

Discuss this guide and share your Feast deployments in our Telegram group: [t.me/dibi8_ai](https://t.me/dibi8_ai)

## Sources & Further Reading

- [Feast Official Documentation](https://docs.feast.dev)
- [Feast GitHub Repository](https://github.com/feast-dev/feast) — 7,000+ stars
- [Feast Blog](https://blog.feast.dev)
- [Feast + Redis Reference Architecture](https://github.com/redis-applied-ai/redis-feast-gcp)
- [Feature Store Comparison 2026](https://codelit.io/blog/feature-store-ml-engineering)
- [Feast Community Slack](https://join.slack.com/t/feastopensource/shared_invite)
- [MLOps Community Feature Store Guide](https://mlops.community/)
- [Feast Python SDK Reference](https://rtd.feast.dev)
## Recommended Hosting & Infrastructure

Before you deploy any of the tools above into production, you'll need solid infrastructure. Two options dibi8 actually uses and recommends:

- **DigitalOcean**: $200 free credit for 60 days

*Affiliate disclosure: some links in this article are affiliate links, and we may earn a commission at no extra cost to you. We only recommend services we have evaluated and believe provide genuine value for ML infrastructure deployments. Opinions expressed are independent of any affiliate relationship.*