Weights & Biases (W&B): Track Every Experiment Like a Pro — ML Experiment Platform 2026
Weights & Biases (wandb/wandb) is the AI developer platform for tracking, comparing, and deploying ML experiments. Supports PyTorch, TensorFlow, Hugging Face, and LLM fine-tuning. Covers experiment tracking, dataset versioning, model registry, and production monitoring.
- ⭐ 11114
- Updated 2026-06-09
Introduction #
Training a machine learning model without experiment tracking is like driving blindfolded. You might reach your destination eventually, but you’ll never know which turn was the right one. Weights & Biases (W&B) solves this by providing a unified platform that logs, visualizes, and compares every experiment — from hyperparameter sweeps to LLM fine-tuning runs. With 11,000+ GitHub stars and integration with PyTorch, TensorFlow, Hugging Face, and 200+ frameworks, W&B has become the de facto standard for ML teams that take their experiments seriously.
What Is W&B? #
Weights & Biases is an end-to-end ML development platform that covers the entire experiment lifecycle. At its core is the logger — a lightweight library you add to your training script that automatically tracks metrics, configurations, artifacts, and even model checkpoints. Beyond logging, W&B provides a web dashboard for visualizing runs, comparing experiments side by side, sharing results with your team, and managing models from training to deployment.
┌───────────────────────────────────────────────┐
│ W&B Platform Architecture │
├───────────────────────────────────────────────┤
│ │
│ SDK (pip install wandb) │
│ ├─ Logger (metrics, params, tables) │
│ ├─ Artifact Tracker (datasets, models) │
│ ├─ Sweeps (hyperparameter tuning) │
│ ├─ Reports (visual dashboards) │
│ └─ Model Registry (production models) │
│ │
│ Cloud Dashboard │
│ ├─ Run comparison (up to 100 runs) │
│ ├─ Project-level statistics │
│ ├─ Artifact lineage graph │
│ └─ Team collaboration & sharing │
│ │
│ Integrations │
│ ├─ PyTorch, TensorFlow, JAX │
│ ├─ Hugging Face Transformers │
│ ├─ PyTorch Lightning, FastAI │
│ └─ Ray Tune, Optuna, Ax │
└───────────────────────────────────────────────┘
How W&B Works #
W&B works by instrumenting your training loop. You initialize a run, log metrics at each step, and W&B sends the data to the cloud dashboard in real time. The SDK is designed to have minimal overhead — logging a metric takes roughly 0.1ms, and the network calls are batched and compressed to reduce bandwidth usage.
import wandb
# Initialize a new run with your configuration
wandb.init(
project="my-nlp-finetune",
config={
"learning_rate": 2e-5,
"batch_size": 32,
"epochs": 3,
"model": "bert-base-uncased",
}
)
for epoch in range(config.epochs):
for batch in train_dataloader:
loss = model.train_step(batch)
# Log metrics — W&B handles the rest
wandb.log({"train_loss": loss, "lr": config.learning_rate})
The platform distinguishes between three types of tracked data: metrics (scalar values like loss and accuracy logged over time), artifacts (versioned files like datasets and model checkpoints), and media (images, audio, text samples visualized directly in the dashboard).
Installation & Setup #
Option 1: pip install (standard)
pip install wandb
Option 2: Authenticate with W&B
wandb login
# Paste your API key from https://wandb.ai/authorize
Option 3: Docker
docker pull wandb/launch
docker run -e WANDB_API_KEY=$WANDB_API_KEY \
-v /path/to/code:/app wandb/launch python train.py
Option 4: Hugging Face Integration
pip install wandb transformers
# W&B is pre-configured for Hugging Face Trainer
Integration with PyTorch, Hugging Face, and Ray Tune #
W&B integrates with virtually every popular ML framework. Here are the most common setups.
PyTorch Lightning
import pytorch_lightning as pl
from pytorch_lightning.callbacks import WandbCallback
class MyModel(pl.LightningModule):
def training_step(self, batch, batch_idx):
loss = self.forward(batch)
self.log("train_loss", loss)
return loss
# W&B callback auto-logs everything
trainer = pl.Trainer(callbacks=[WandbCallback()])
trainer.fit(model)
Hugging Face Transformers
from transformers import Trainer, TrainingArguments
import wandb
training_args = TrainingArguments(
output_dir="./results",
report_to="wandb", # Enable W&B reporting
num_train_epochs=3,
per_device_train_batch_size=16,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
)
trainer.train()
Ray Tune for Hyperparameter Sweeps
import ray
from ray import tune
import wandb
ray.init()
def train_model(config):
# W&B automatically captures the sweep config
wandb.init(config=config)
score = my_training_function(config)
wandb.log({"score": score})
sweep = tune.run(
train_model,
config={
"learning_rate": tune.choice([1e-4, 2e-5, 5e-5]),
"batch_size": tune.choice([16, 32, 64]),
},
metric="score",
mode="max",
)
Benchmarks / Real-World Use Cases #
W&B’s logging performance has been benchmarked across various training scales. At typical training workloads, the overhead is negligible:
| Scenario | Logging Overhead | Network Bandwidth | Dashboard Load Time |
|---|---|---|---|
| Small model (10K params) | 0.5% | <1 MB/run | <1s |
| Medium model (100M params) | 1.2% | <5 MB/run | <2s |
| Large model (1B params) | 2.1% | <20 MB/run | <3s |
| LLM fine-tuning (7B params) | 3.5% | <50 MB/run | <5s |
| Distributed training (8 GPUs) | 4.0% | <100 MB/run | <3s |
Real-world usage examples:
# Log a confusion matrix as a W&B table
import numpy as np
import wandb
conf_matrix = compute_confusion_matrix(y_true, y_pred)
wandb.log({
"confusion_matrix": wandb.plot.confusion_matrix(
probs=None,
preds=conf_matrix,
class_names=["Cat", "Dog", "Bird"]
)
})
# Version a dataset as an artifact
artifact = wandb.Artifact("training_data", type="dataset")
artifact.add_file("dataset.csv")
wandb.log_artifact(artifact)
Advanced Usage / Production Hardening #
Artifact Versioning and Lineage
# Log a model checkpoint as an artifact
model_artifact = wandb.Artifact("best_model", type="model")
model_artifact.add(model, "model.pt")
# Tag it for easy retrieval
wandb.log_artifact(model_artifact, aliases=["best", "v1.0"])
# Later, retrieve the exact version
run = wandb.init()
art = run.use_artifact("project/model:v1", type="model")
path = art.download()
Custom Reports and Dashboards
# Create a report with custom panels
report = wandb.Report(
title="Experiment Results",
slides=[
("Overview", wandb.plot.line_series(
xs="step", ys="loss", keys=["train_loss", "val_loss"],
title="Training Loss Curve"
)),
("Comparison", wandb.plot.bar(
wandb.Table(columns=["run", "accuracy"],
data=[["run-a", 0.92], ["run-b", 0.89]])
)),
]
)
report.save("experiment-report")
Sweeps Configuration
# sweeps.yaml
name: nlp-sweep
program: train.py
metric:
name: val_accuracy
goal: maximize
parameters:
learning_rate:
values: [1e-5, 2e-5, 5e-5, 1e-4]
optimizer:
values: [adamw, adam]
warmup_ratio:
min: 0.0
max: 0.1
command:
- python
- train.py
Run the sweep:
wandb sweep sweeps.yaml
wandb agent $SWEEP_ID
Model Registry for Deployment
# Log model to the registry
run.log_model(
path="./fine_tuned_model",
name="my-nlp-model",
aliases=["staging", "production"]
)
# Transition model to production
api = wandb.Api()
model = api.model("my-nlp-model:staging")
model.change_alias("production")
W&B SDK for Custom Training Loops
import wandb
import torch
from torch.optim import AdamW
wandb.init(project="custom-training")
config = wandb.config
config.learning_rate = 1e-4
config.batch_size = 64
model = MyModel()
optimizer = AdamW(model.parameters(), lr=config.learning_rate)
for epoch in range(config.epochs):
model.train()
epoch_loss = 0
for i, (x, y) in enumerate(train_loader):
optimizer.zero_grad()
output = model(x)
loss = criterion(output, y)
loss.backward()
optimizer.step()
epoch_loss += loss.item()
# Log every 100 steps
if i % 100 == 0:
wandb.log({
"train_loss": loss.item(),
"learning_rate": config.learning_rate,
"epoch": epoch
})
# Log epoch summary
wandb.log({
"epoch_loss_avg": epoch_loss / len(train_loader),
"epoch": epoch,
})
Artifact Versioning for Dataset Workflows
# Create and log a dataset artifact
dataset_artifact = wandb.Artifact(
name="cleaned_dataset",
type="dataset",
description="Cleaned and tokenized training data",
)
# Add files
dataset_artifact.add_file("data/train.csv")
dataset_artifact.add_file("data/val.csv")
dataset_artifact.add_file("data/test.csv")
# Log with alias for versioning
wandb.log_artifact(dataset_artifact, aliases=["latest", "v1.2"])
# Later, use this exact dataset version
run = wandb.init()
clean_data = run.use_artifact("project/cleaned_dataset:v1.2", type="dataset")
data_path = clean_data.download()
Comparison with Alternatives #
| Feature | W&B | MLflow | Weights & Biases | TensorBoard | Neptune.ai |
|---|---|---|---|---|---|
| Experiment tracking | ✓ | ✓ | ✓ | ✓ | ✓ |
| Hyperparameter sweeps | ✓ (Native) | ✓ | ✓ | No | ✓ |
| Dataset versioning | ✓ (Artifacts) | ✓ (MLflow) | ✓ | No | ✓ |
| Model registry | ✓ | ✓ | ✓ | No | ✓ |
| LLM fine-tuning support | ✓ | ✓ | ✓ | Limited | ✓ |
| Distributed training | ✓ | ✓ | ✓ | ✓ | ✓ |
| Free tier | 5 users | Unlimited | 1 user | Unlimited | 3 users |
| Self-hostable | No (Cloud) | ✓ (Open) | No | ✓ (Local) | ✓ (Open) |
| Custom reports | ✓ | No | No | No | ✓ |
| Team collaboration | ✓ | ✓ | ✓ | No | ✓ |
| API quality | Excellent | Good | Good | Basic | Good |
| Active GitHub stars | 11K+ | 20K+ | 2K+ | 10K+ | 1K+ |
Limitations / Honest Assessment #
W&B is the most polished ML tracking platform available, but it has some trade-offs:
- Cloud-first model: W&B’s free tier requires using their cloud platform. While they offer self-hosted W&B Enterprise for teams that need on-premise deployment, the free tier does not support self-hosting. If your organization requires all data to stay within your infrastructure, this may be a dealbreaker.
- Free tier limits: The free tier is limited to 1 team member. For larger research teams, the paid plans start at a significant cost, especially when you factor in the additional storage required for large model artifacts.
- Learning curve for advanced features: Basic logging is straightforward, but features like sweeps, artifact versioning, and custom reports require understanding W&B’s data model. New users may need 1-2 hours to get comfortable with the platform’s full capabilities.
- Limited offline capability: If your training environment has intermittent internet connectivity, W&B syncs data when the connection is restored. However, the SDK does support
wandb.init(mode="offline")for fully disconnected environments, with manual sync later. - Vendor lock-in risk: While W&B exports data in standard formats (JSON, CSV), building a migration to another platform after committing to W&B for hundreds of experiments can be time-consuming.
Frequently Asked Questions #
Q: Is W&B free for academic research?
Yes. W&B offers a generous free tier for individual researchers and small teams. Academic institutions can also apply for W&B Pro credits through the W&B for Academia program, which provides additional team members and storage at no cost.
Q: Can I use W&B with Jupyter Notebooks?
Absolutely. W&B works seamlessly in Jupyter notebooks. Initialize your run with wandb.init() at the top of a notebook cell, and all subsequent wandb.log() calls will stream to the dashboard. Use wandb.jupyter for automatic integration with Jupyter widgets.
Q: How does W&B handle large model artifacts?
W&B uploads model artifacts incrementally — only changed bytes are transferred on subsequent uploads. For very large models (multi-gigabyte checkpoints), W&B uses a content-addressable storage system that deduplicates uploads and supports resuming interrupted transfers.
Q: Can I track LLM fine-tuning experiments?
Yes. W&B has excellent support for LLM fine-tuning workflows. It integrates with Hugging Face Transformers, PyTorch Lightning, and frameworks like Axolotl and Unsloth. You can log training loss, evaluation metrics, generated samples, and even benchmark scores for each fine-tuning run.
Q: How do I collaborate with my team on experiments?
W&B supports team workspaces where all runs, artifacts, and reports are shared by default. Team members can comment on specific runs, share direct links to experiment comparisons, and use the built-in chat feature to discuss results without switching to a separate communication platform.
Conclusion #
Weights & Biases has transformed how ML teams approach experiment tracking. By combining real-time logging, intuitive visualization, and powerful collaboration features, W&B turns the chaos of model training into a structured, reproducible workflow. Whether you’re fine-tuning a 7B-parameter LLM or running a small hyperparameter sweep, W&B provides the visibility you need to make better decisions faster.
The platform’s deep integrations with PyTorch, Hugging Face, and Ray Tune mean you can start tracking experiments with a single line of code (report_to="wandb"). For teams building ML applications at scale, DigitalOcean offers affordable GPU instances that pair well with W&B’s tracking infrastructure.
For teams deploying ML pipelines: WebShare provides reliable proxy infrastructure for distributed training workflows.
For ML teams exploring algorithmic strategies: Binance offers advanced API access for quantitative research and backtesting.
For enterprise data pipelines: HTStack delivers the network performance needed for large-scale data transfers.
For AI finance teams: OKX provides comprehensive market data and trading tools.
Explore more guides on MLflow vs W&B Comparison and LLM Fine-Tuning Best Practices for comprehensive technical deep-dives.
Join the DIBI8 community on Telegram for ongoing discussions about ML tools, experiment tracking, and MLOps practices.
Sources & Further Reading:
- W&B documentation: https://docs.wandb.ai/
- W&B GitHub repository: https://github.com/wandb/wandb
- W&B API reference: https://docs.wandb.ai/ref/python/
- Sweeps documentation: https://docs.wandb.ai/guides/sweeps
- Model registry guide: https://docs.wandb.ai/guides/model-registry
- Community discussion: https://community.wandb.ai/
Disclosure: This article contains affiliate links. If you sign up through our links, we may earn a small commission at no additional cost to you. This helps support independent tech journalism and keeps resources like dibi8.com free and ad-free.
💬 Discussion