The AI coding assistant revolution has created a paradox for developers: we have unprecedented access to world-class language models through tools like Claude Code, OpenAI Codex, Cursor, and GitHub Copilot — but managing subscriptions, quotas, and rate limits across multiple platforms is becoming increasingly expensive and frustrating. Many developers find themselves burning through their Claude Pro monthly quota within two weeks, only to stare at rate-limit walls while trying to meet sprint deadlines.
Enter 9Router — an open-source smart proxy and token management system that eliminates this pain entirely. With over 6,900 GitHub stars, 1,200+ forks, and rapid community growth, 9Router has emerged as the go-to solution for developers who want maximum AI capability without paying for unnecessary premium tiers. Built on Node.js 20+ with Next.js 16 and React 19, it provides a unified interface that routes your AI coding requests across 40+ providers using intelligent fallback logic and powerful token-saving compression.
What Is 9Router and How Does It Work?
9Router is a locally-hosted intermediary service (running on localhost:20128 by default) that sits between your AI coding tool and the underlying model provider. Instead of sending API requests directly to Claude, OpenAI, or any single provider, your tool talks to 9Router — which then intelligently decides which backend provider to route the request to.
This architecture gives you three major advantages:
- Multi-provider access from one place: Configure Claude, Gemini, GLM, MiniMax, Kiro, OpenCode, Vertex AI, and 40+ other providers in a single dashboard. Your CLI tools send requests to localhost; 9Router handles the rest.
- Automatic fallback: When your primary provider hits a quota limit or experiences downtime, 9Router seamlessly switches to the next tier — whether that’s a cheap backup provider or a completely free option. Zero interruptions to your workflow.
- Token compression before requests leave your machine: Through its integration with RTK (~40K stars), 9Router compresses tool outputs (git diffs, grep results, directory listings, log dumps) before they reach the LLM. This alone saves 20–40% of input tokens per request.
Core Features That Set 9Router Apart
🚀 RTK Token Compression Engine
Tool outputs frequently account for 30–50% of your total prompt budget. When Claude Code runs git diff, ls -R, or grep in a large codebase, it sends megabytes of text to the model — much of which is irrelevant noise.
9Router’s built-in RTK integration detects these tool outputs automatically and applies smart, lossless compression filters:
- git-diff: Reduces diff output to essential changed lines
- git-status: Compresses status into summary format
- grep / find: Prunes irrelevant matches, keeps context-rich lines
- tree / ls: Collapses directory structures meaningfully
- dedup-log: Removes duplicate consecutive log entries
- smart-truncate: Preserves head/tail while removing redundant middle sections
Crucially, if any filter fails or produces worse output than the original, RTK silently falls back to the unmodified text. Errors never break your requests. The compression runs before any format translation, so it works universally across all supported formats (OpenAI, Claude, Gemini, Cursor, Kiro, OpenAI Responses).
1Without RTK: 47K tokens sent to LLM
2With RTK: 28K tokens sent to LLM (40% saved · same quality answer)
In practice, developers report seeing token savings of 20–40% on every single request — effectively extending the lifetime of every subscription by days or even weeks.
🪨 Caveman Mode (Output Compression)
Beyond input optimization, 9Router also reduces what the LLM sends back. By injecting a “caveman-style” system prompt (inspired by Caveman with ~52K stars), 9Router instructs the model to respond tersely — preserving all technical substance while eliminating conversational filler.
This can save up to 65% of output tokens. For complex refactoring tasks or long code generation sessions, these savings compound rapidly across hundreds of API calls.
🎯 Smart Three-Tier Fallback System
This is arguably 9Router’s killer feature. You define “combos” — ordered lists of models spanning different pricing tiers — and 9Router automatically routes requests accordingly:
1Combo: "my-coding-stack"
2 1. cc/claude-opus-4-6 → Your Claude Code Pro subscription
3 2. glm/glm-4.7 → Cheap backup ($0.6 per 1M tokens)
4 3. kr/claude-sonnet-4.5 → Free emergency fallback via Kiro AI
When Opus quota runs out (or when an error occurs), 9Router instantly transitions to GLM. If GLM also exhausts, it drops to Kiro’s free unlimited tier. You never hit a wall.
The system supports five distinct pricing layers:
| Tier | Providers | Typical Cost | Reset Pattern |
|---|---|---|---|
| Subscription | Claude Code, Codex, Copilot, Cursor | $10–200/mo | 5h rolling + weekly/monthly |
| Cheap | GLM-5.1, MiniMax M2.7, Kimi K2.5 | $0.2–$0.6/1M tokens | Daily/rolling/fixed monthly |
| Free | Kiro AI, OpenCode Free, Vertex AI | $0 | Unlimited |
📊 Real-Time Quota Tracking & Analytics
The web dashboard displays live token consumption per provider, reset countdown timers (5-hour, daily, weekly, monthly), and estimated cost tracking. While the dashboard shows “costs” as a reference comparison tool — 9Router itself is free software and never charges anything — the analytics help you understand usage patterns and optimize spending.
If your dashboard shows “$290 total cost” while using Kiro’s free tier, that $290 represents what you would have paid if you used those APIs directly. Your actual payment remains $0. It’s essentially a savings tracker showing how much money you’re avoiding spending.
🔄 Format Translation Across Every Major Protocol
9Router translates between OpenAI, Claude, Gemini, Cursor, Kiro, Vertex AI, Antigravity, Ollama, and OpenAI Responses formats transparently. Your CLI tool sends a standard OpenAI-compatible payload; 9Router translates it into the native format each provider expects. This means you can use any tool supporting custom OpenAI endpoints and plug it into any backed provider.
👥 Multi-Account Support
Need load balancing or redundancy across accounts? 9Router lets you add multiple accounts per provider, with automatic round-robin distribution or priority-based routing. If one account hits its quota, requests automatically shift to the next available account. OAuth tokens refresh automatically, eliminating manual re-authentication cycles.
💾 Cloud Sync
Sync your entire configuration — providers, combos, aliases, settings — across devices via encrypted cloud storage. Set up your perfect combo on your local machine, then access the exact same configuration on your VPS, Docker deployment, or teammate’s workstation.
Supported Coding Tools and IDEs
9Router acts as a universal adapter, supporting virtually every popular AI coding tool:
- Claude Code (
~/.claude/config.jsonwith custom API base) - OpenAI Codex CLI (environment variable override)
- Cursor IDE (Custom OpenAI endpoint settings)
- GitHub Copilot
- OpenClaw (WhatsApp, Telegram, Slack messaging)
- Cline
- Continue
- Roo Code
- Antigravity
- Droid
- Kilo Code
- OpenCode
Any tool that supports a custom OpenAI-compatible API endpoint can connect to 9Router. The service exposes a standard OpenAI-compatible interface at http://localhost:20128/v1.
Getting Started: Installation and Setup
Quick Start: Localhost (Recommended for Most Users)
1# Clone and install
2git clone https://github.com/decolua/9router.git
3cd 9router
4npm install
5npm run build
6
7# Optional environment setup
8export JWT_SECRET="your-secure-secret-change-this"
9export INITIAL_PASSWORD="your-dashboard-password"
10export PORT="20128"
11export NODE_ENV="production"
12
13# Start the server
14npm run start
After startup, open http://localhost:20128 to access the web dashboard. From there, connect your first provider.
Docker Deployment
For production or multi-device setups, Docker makes deployment trivial:
1docker build -t 9router .
2
3docker run -d \
4 --name 9router \
5 -p 20128:20128 \
6 --env-file ./.env \
7 -v 9router-data:/app/data \
8 -v 9router-usage:/root/.9router \
9 9router
Connecting Your First Provider
Let’s set up a complete free-tier combo — no payment methods required:
- Connect Kiro AI in the dashboard (uses AWS Builder ID, Google, or GitHub OAuth — no API key needed)
- Connect OpenCode Free (zero auth, passthrough proxy, models auto-fetched)
- Create a combo named
free-devwith models:kr/claude-sonnet-4.5(Claude Sonnet 4.5 via Kiro — free unlimited)kr/glm-5(GLM-5 via Kiro — free unlimited)vertex/gemini-3.1-pro-preview(Google Cloud — $300 free credits)
Then configure your preferred tool to point at http://localhost:20128/v1 with your dashboard API key:
1{
2 "anthropic_api_base": "http://localhost:20128/v1",
3 "anthropic_api_key": "your-9router-api-key"
4}
Configuring Cursor IDE
In Cursor Settings → Models → Advanced:
1OpenAI API Base URL: http://localhost:20128/v1
2OpenAI API Key: [copy from 9Router dashboard]
3Model: cc/claude-opus-4-7
Now every model call from Cursor flows through 9Router’s routing intelligence.
Real-World Use Cases
Scenario A: Maximize Your Existing Subscriptions
You pay $20/month for Claude Pro. Without 9Router, once the quota expires, coding stops until the reset.
With 9Router’s “maximize-claude” combo:
- Primary:
cc/claude-opus-4-7(use full subscription) - Backup:
glm/glm-5.1($0.6/1M, resets daily at 10 AM) - Emergency:
kr/claude-sonnet-4.5(Kiro free fallback)
Result: Your $20 subscription lasts longer because RTK saves 20–40% tokens, and when it does expire, you have seamless backups. Total effective cost increases by roughly $5 for the cheap tier — far less than upgrading to Claude Max ($200/mo).
Scenario B: Complete $0 Monthly Budget
Start with 100% free models:
gc/gemini-3-flash(180K free queries/month from Google)kr/claude-sonnet-4.5(Kiro free unlimited)oc/<auto>(OpenCode Free, no authentication needed)
Combined with RTK compression, this setup delivers production-quality model responses with literally zero monthly cost.
Scenario C: Uninterrupted 24/7 Development
For teams and freelancers under deadline pressure, layer five fallback tiers:
- Claude Opus (premium quality)
- GPT-5.5 via Codex (second subscription)
- GLM-5.1 (cheap daily-reset)
- MiniMax M2.7 (cheapest at $0.2/1M, 5h rolling reset)
- Kiro Claude Sonnet 4.5 (free unlimited)
Five layers guarantee zero downtime regardless of quota exhaustion or provider outages.
Pricing Transparency: How Much Will This Actually Cost?
A critical question for anyone evaluating 9Router: does 9Router charge you? No. Ever.
Here’s how the economics actually work:
- 9Router software = FREE forever (open-source MIT license, self-hosted on your own hardware)
- Dashboard costs = display/tracking only (not real billing statements)
- You pay providers directly (subscriptions, API keys, whatever you configure)
- Free providers stay free (Kiro AI, OpenCode Free, Vertex AI credits)
9Router is purely a local proxy router running on your own computer. It doesn’t have access to your credit card, cannot generate invoices, and has no billing infrastructure. It simply forwards requests and optionally compresses tokens.
The dashboard’s cost display serves as a “savings tracker” — showing you what equivalent usage would have cost using paid APIs directly. If you configure all free providers, the displayed cost might read “$290” while your actual bank transaction is $0. That $290 is the money you’re actively saving.
9Router vs. Alternatives
How does 9Router compare to existing solutions?
| Feature | 9Router | Direct Provider Access | Other Proxy Tools |
|---|---|---|---|
| Smart fallback routing | ✅ Auto 3+ tier | ❌ Single provider | Partial |
| Token compression (RTK) | ✅ Built-in | ❌ None | Rarely |
| Multi-format translation | ✅ 8+ protocols | N/A | Limited |
| Multi-account rotation | ✅ Round-robin | ❌ Manual | Manual |
| Free provider support | ✅ Kiro, OpenCode, Vertex | ❌ Not applicable | Usually not |
| Real-time analytics | ✅ Dashboard + logs | ❌ Provider portals | Basic |
| Self-hosted | ✅ Full control | N/A | Variable |
| Cost | Free software + provider costs | Full provider prices | Often paid |
The main alternative worth noting is OmniRoute , a TypeScript fork of 9Router that adds 36+ providers, 4-tier auto-fallback, multi-modal APIs (images, embeddings, audio, TTS), circuit breaker patterns, semantic caching, LLM evaluation harnesses, and a polished dashboard with 368+ unit tests. OmniRoute is available via npm and Docker for users who want extended capabilities beyond the core 9Router feature set.
Why 9Router Matters Right Now
We’re living through a golden age of AI coding tools, but the economic reality hasn’t caught up. Each major provider independently restricts access behind paywalls, quota limits, and rate caps. Managing six different subscriptions across Claude, OpenAI, Google, Anthropic, DeepSeek, and xAI creates both financial burden and operational complexity.
9Router solves this by treating all these providers as interchangeable commodities routed through a single intelligence layer. You get the best model for each task, the cheapest path for routine ones, and guaranteed availability when quotas run dry — all while compressing token waste before it ever leaves your machine.
The combination of RTK token compression (~20–40% savings), Caveman mode output reduction (~65% savings), and intelligent multi-tier fallback creates a compounding effect. Developers reporting 500+ daily API calls see their effective model consumption drop by 40–60%, transforming a $200/month AI stack into something manageable at $20–30.
Technical Architecture Highlights
9Router is built on a modern JavaScript stack optimized for reliability:
- Runtime: Node.js 20+ for consistent, performant async I/O
- Framework: Next.js 16 with React 19 for the web dashboard
- Database: LowDB (JSON file-based) — simple, portable, version-controllable config
- Streaming: Server-Sent Events (SSE) for real-time progress feedback
- Auth: OAuth 2.0 with PKCE, JWT session cookies, HMAC-signed API keys
- Proxy: Full HTTP passthrough with configurable upstream proxies
Environment variables give granular control over deployment:
JWT_SECRET: Change for production securityREQUIRE_API_KEY: Enforce bearer token auth on/v1/*routesENABLE_REQUEST_LOGS: Enable debug-level request/response loggingAUTH_COOKIE_SECURE: Force Secure cookie flag behind HTTPS reverse proxyHTTP_PROXY/HTTPS_PROXY: Route upstream requests through corporate proxies
The service listens on port 20128 by default and requires no external dependencies or databases beyond the JSON files stored in ${DATA_DIR}.
Final Thoughts
9Router addresses a genuinely painful problem that more developers are feeling as AI tool subscriptions multiply. Rather than accepting escalating costs and arbitrary rate limits as the inevitable price of AI-assisted development, 9Router flips the script: use whatever providers you already have, fill gaps with cheap or free alternatives, compress everything possible, and maintain continuous coding flow regardless of quota state.
For solo developers on tight budgets, the free-first strategy can deliver fully functional AI coding assistance at exactly $0. For teams willing to invest in premium subscriptions, the token compression and smart routing maximize ROI by ensuring every dollar spent stretches further.
It’s free, open-source, and takes minutes to self-host. Given the current trajectory of AI tool pricing, adding 9Router to your development infrastructure probably isn’t just useful — it’s becoming essential.
Repository: github.com/decolua/9router Website: 9router.com
Related Articles
- oMLX: Run Local LLMs on Your Mac with Zero Config
- Chrome DevTools MCP: Browser Superpowers for AI Agents
- GenericAgent: Self-Evolving AI Agent Framework
- Anthropic’s Financial Services: AI-Powered Claude Agents
- Easy Vibe Coding: Beginner’s Guide to Vibe Programming
- Addy Osmani’s Agent Skills: Production-Grade AI Coding Agents
Have questions or ideas? Feel free to leave a comment below. Sign in with GitHub to join the discussion.