nanochat: Karpathy's $100 ChatGPT — Build Your Own AI Chat App on a Single GPU

nanochat: the $100 ChatGPT you train yourself

Introduction

Crawl4AI jumped from 12,000 to 63,000 GitHub stars in 90 days. nanochat, on the other hand, grew from 0 to 54,800 in just under 8 months — and it has no server, no API key dependency, no $20/month subscription. It’s a single Python script by Andrej Karpathy that lets you train a ChatGPT-like chat application from scratch on a single consumer GPU, starting with approximately $100 of compute. Not a fine-tuning tutorial. Not a LoRA adapter. A full chat app built from a single file of Python code, complete with streaming, conversation history, and a web UI. If you’ve ever wanted to understand what happens under the hood of an AI chat interface, nanochat is the hands-on laboratory.

What Is nanochat?

nanochat is an open-source, minimal chat application written by Andrej Karpathy that demonstrates how to build a ChatGPT-like experience using models you train yourself on a single GPU. It is not a framework or a library. It is a single app.py file (~400 lines) that implements:

Tokenizer-based text generation with streaming
Conversation history management (multi-turn)
A web UI rendered via Streamlit
Two modes: SGLang (train from scratch with real data) and vLLM (serve pre-trained models locally)
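
To make the multi-turn history item concrete, here is a minimal sketch of how a chat app might keep a running message list and trim it to a context budget. The `ChatHistory` class, the whitespace-based token estimate, and the budget value are illustrative assumptions, not code taken from nanochat.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class ChatHistory:
    """Hypothetical multi-turn history with a fixed token budget."""
    max_tokens: int = 50
    system_prompt: str = "You are a helpful assistant."
    turns: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})

    def window(self) -> list[dict[str, str]]:
        """Return the system prompt plus the newest turns that fit the budget."""
        budget = self.max_tokens - estimate_tokens(self.system_prompt)
        kept: list[dict[str, str]] = []
        # Walk backwards so the most recent turns are kept first.
        for turn in reversed(self.turns):
            cost = estimate_tokens(turn["content"])
            if cost > budget:
                break
            kept.append(turn)
            budget -= cost
        kept.reverse()
        return [{"role": "system", "content": self.system_prompt}] + kept


history = ChatHistory(max_tokens=12)
history.add("user", "one two three four five")
history.add("assistant", "six seven")
history.add("user", "eight nine ten")
messages = history.window()  # the oldest user turn no longer fits and is dropped
```

Dropping whole turns from the oldest end keeps the prompt well-formed; a real app would count tokens with the model's own tokenizer rather than by splitting on whitespace.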

The philosophy is “build it to understand it.” Karpathy has a track record of making complex AI concepts accessible through minimal code — from nanoGPT to karpathy/llm.c — and nanochat continues this tradition by showing you exactly how a chat app works, end to end.

How nanochat Works

nanochat operates in two distinct modes, each with a different training/inference pipeline:

SGLang Mode: Train from Scratch

Raw text corpus → Tokenizer training → Model training → Chat UI

Data collection — Download and parse a text corpus (e.g., Wikipedia, books, code)
Tokenizer training — Train a BytePair Encoding (BPE) tokenizer on the corpus
Model training — Train a GPT-style transformer using SGLang’s distributed training
Chat serving — The trained model is served through nanochat’s web interface
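
The tokenizer step in the pipeline above can be illustrated with a toy Byte Pair Encoding trainer. This is a character-level sketch for intuition only: the function names `train_bpe` and `encode` are made up for this example, and a production tokenizer would work on bytes, handle pre-tokenization, and be far faster.

```python
from collections import Counter


def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules from `text` (toy, character-level)."""
    # Each word is a tuple of symbols; start from single characters.
    words = Counter(tuple(word) for word in text.split())
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        merged_words: Counter = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in training order, to a single word."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


merges = train_bpe("low low low lower lowest", num_merges=2)
tokens = encode("lowly", merges)
```

On this tiny corpus the two learned merges build the symbol `low`, so an unseen word like `lowly` is split into a known prefix plus leftover characters.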

vLLM Mode: Serve Pre-Trained Models

Pre-trained model (HuggingFace) → vLLM serving → Chat UI

Model download — Pull a pre-trained model from HuggingFace (e.g., Qwen, Llama, Mistral)
vLLM serving — Use vLLM’s PagedAttention for high-throughput inference
Chat serving — Nanochat wraps the vLLM endpoint with a streaming chat UI
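
For the streaming step, an OpenAI-compatible server sends the reply as server-sent events, one JSON chunk per `data:` line, ending with `data: [DONE]`. The sketch below shows one way a chat UI could turn those lines into text fragments. The helper name `iter_deltas` is hypothetical, and the sample lines are canned stand-ins for a live response body.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style streaming chat response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Canned lines standing in for a live HTTP response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_deltas(sample))
```

Because the function is a generator, the UI can append each fragment to the page as it arrives instead of waiting for the full reply.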

┌───────────────────────────────────────────────────────────────────┐
│                          nanochat Web UI                          │
│                      (Streamlit + WebSocket)                      │
├───────────────────────────────────────────────────────────────────┤
│                      SGLang / vLLM Inference                      │
├─────────────────────────────────┬─────────────────────────────────┤
│ SGLang Mode: Train from scratch │ vLLM Mode: Serve HF models      │
└─────────────────────────────────┴─────────────────────────────────┘

nanochat architecture: two modes, one web UI

The key insight: both modes share the same chat interface. The only difference is whether you’re generating tokens from a model you trained yourself (SGLang) or a model you downloaded (vLLM).
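
One way to picture the shared interface is a small backend protocol that the UI code depends on. Everything in this sketch is hypothetical: the `ChatBackend` protocol and both stand-in classes are invented for illustration and load no real model.

```python
from typing import Iterator, Protocol


class ChatBackend(Protocol):
    """Hypothetical interface that any token source must satisfy."""

    def stream(self, messages: list[dict[str, str]]) -> Iterator[str]: ...


class LocalModelBackend:
    """Stand-in for a model you trained yourself (no real model is loaded)."""

    def stream(self, messages: list[dict[str, str]]) -> Iterator[str]:
        yield from ["local", " ", "reply"]


class RemoteServerBackend:
    """Stand-in for a served pre-trained model (no network call is made)."""

    def stream(self, messages: list[dict[str, str]]) -> Iterator[str]:
        yield from ["remote", " ", "reply"]


def render_reply(backend: ChatBackend, messages: list[dict[str, str]]) -> str:
    """UI-side code: consumes tokens without knowing which backend produced them."""
    return "".join(backend.stream(messages))


prompt = [{"role": "user", "content": "hi"}]
local_text = render_reply(LocalModelBackend(), prompt)
remote_text = render_reply(RemoteServerBackend(), prompt)
```

The UI function never branches on the mode, which is the property the paragraph above describes.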

Installation & Setup

Install Dependencies with uv

nanochat uses uv for dependency management, with all dependencies declared in pyproject.toml. Install uv first, then run uv sync --extra gpu for CUDA GPUs (such as A100 or H100) or uv sync --extra cpu for CPU-only machines and Apple Silicon (MPS).

# Clone the repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# GPU mode (CUDA/A100/H100)
uv sync --extra gpu

# CPU mode (CPU-only machines or Apple Silicon/MPS)
uv sync --extra cpu

SGLang Mode: Train from Scratch

# Train a tokenizer on your corpus
python train_tokenizer.py --input data/wikipedia.txt --output tokenizer.json --vocab_size 50000

# Train the model (example: 1B parameter GPT)
python train_model.py --tokenizer tokenizer.json --epochs 3 --batch_size 32

# Launch the chat app
python app.py --mode sglang --model_path checkpoints/latest.pth

vLLM Mode: Serve Pre-Trained Models

# Launch vLLM server with a HuggingFace model
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2.5-1.5B-Instruct \
  --port 8000 \
  --max-model-len 4096

# Launch the chat app (points to vLLM)
python app.py --mode vllm --api_url http://localhost:8000/v1/chat/completions

Access the web UI at http://localhost:8501.

Integration with SGLang, vLLM, HuggingFace Models

nanochat plugs into SGLang, vLLM, and HuggingFace models. The settings below show how each integration is configured:

SGLang Integration

SGLang (Structured Generation Language) is the training backend. It provides distributed training capabilities optimized for transformer models:

# sglang_config.py — SGLang-specific settings
config = {
    "model_type": "gpt",
    "vocab_size": 50000,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 4096,
    "learning_rate": 3e-4,
    "warmup_ratio": 0.05,
    "weight_decay": 0.01,
    "bf16": True,
}
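
As a rough sanity check on the configuration above, the parameter count can be estimated from the layer shapes. The sketch assumes a standard GPT block (four attention projections plus a two-matrix MLP), learned position embeddings, and tied input/output embeddings, and it ignores biases and layer norms. None of these architectural details are confirmed by the source, and `estimate_params` is a name invented for this example.

```python
def estimate_params(cfg: dict) -> dict:
    """Approximate weight counts for a GPT-style config (assumptions in the lead-in)."""
    d = cfg["hidden_size"]
    attn = 4 * d * d                        # Q, K, V and output projections
    mlp = 2 * d * cfg["intermediate_size"]  # up and down projections
    blocks = cfg["num_hidden_layers"] * (attn + mlp)
    token_emb = cfg["vocab_size"] * d
    pos_emb = cfg["max_position_embeddings"] * d
    return {
        "blocks": blocks,
        "token_emb": token_emb,
        "pos_emb": pos_emb,
        "total": blocks + token_emb + pos_emb,
    }


# Shape-related values copied from the config above.
config = {
    "vocab_size": 50000,
    "num_hidden_layers": 24,
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 4096,
}
sizes = estimate_params(config)
total_millions = sizes["total"] / 1e6
```

Under these assumptions the config comes to roughly 357M parameters, so the "1B parameter" figure in the training command's comment would need a larger configuration than this one.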

vLLM Integration

vLLM provides high-throughput inference with PagedAttention, managing KV cache memory dynamically:

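# vllm_config.py: vLLM serving settings
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    tensor_parallel_size=1,       # run on a single GPU
    max_model_len=8192,           # maximum context length (prompt + generated tokens)
    enable_chunked_prefill=True,  # split long prompt prefills into smaller chunks
)

sampling_params = SamplingParams(
    temperature=0.7,       # lower values make output more deterministic
    top_p=0.9,             # nucleus sampling cutoff
    max_tokens=2048,       # cap on generated tokens per reply
    stop=["<|im_end|>"],   # ChatML end-of-turn marker used by Qwen
)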
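
To see why block-based KV cache management matters, it helps to work out how much memory one conversation can occupy. The arithmetic below is a sketch under stated assumptions: 28 layers, 4 key/value heads, a head dimension of 128, 16-bit cache entries, and a block size of 16 tokens. Check the model's config.json and your vLLM settings for the real values.

```python
import math


def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, dtype_bytes: int = 2
) -> int:
    """Bytes one token occupies: a key and a value vector per layer per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    """Number of fixed-size cache blocks a sequence of `num_tokens` occupies."""
    return math.ceil(num_tokens / block_size)


# Assumed dimensions for a 7B-class model with grouped-query attention.
per_token = kv_cache_bytes_per_token(num_layers=28, num_kv_heads=4, head_dim=128)
full_context_mib = per_token * 8192 / 2**20  # cache for one max_model_len sequence
blocks_full = blocks_needed(8192)
blocks_short = blocks_needed(100)
```

With these assumed dimensions, a full 8,192-token conversation needs about 448 MiB of cache, while a 100-token exchange occupies only 7 blocks. Allocating in blocks on demand is what lets a paged cache avoid reserving the full context length for every request up front.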

HuggingFace Model Compatibility

nanochat supports any HuggingFace model that follows the standard transformer architecture. The model list includes:
