---
lang: vi
slug: tabby
title: 'Tabby: Self-Hosted AI Coding Assistant with 33K+ Stars'
description: 'Tabby is a self-hosted AI coding assistant. VS Code, JetBrains, Vim, Neovim, Ollama, DeepSeek. Docker setup, IDE integration, benchmarks, and production hardening.'
tags: ["ai-agent", "coding", "development", "guide", "open-source", "opensource", "reference", "self-hosted", "tutorial"]
date: 2026-05-19 00:00:00+08:00
lastmod: 2026-05-19 00:00:00+08:00
tech_stack: []
application_domain: Llm Frameworks
source_version: ''
licensing_model: Open Source
license_type: Apache-2.0
file_size: ''
file_md5: ''
download_url: ''
backup_url: ''
github_repo: 'https://github.com/TabbyML/tabby'
last_maintained: '2026-05-19'
draft: false
categories: ['llm-frameworks']
aliases:
  - /posts/tabby/
faqs:
  - q: 'Is Tabby a free and open-source alternative to GitHub Copilot?'
    a: 'Yes. Tabby is licensed under Apache-2.0 and is free for individual use, unlike GitHub Copilot ($10/month), which is proprietary. Its key differentiator is that it is self-hosted, so 100% of your code stays on your own hardware instead of being sent to a third-party cloud.'
  - q: 'How do I run Tabby with Docker?'
    a: 'Pull and run the official image with a command like: docker run -d --name tabby --gpus all -p 8080:8080 -v $HOME/.tabby:/data registry.tabbyml.com/tabbyml/tabby serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda. Tabby supports CUDA (NVIDIA), Metal (Apple Silicon), ROCm (AMD), and CPU-only backends, and downloads model weights to $HOME/.tabby on first boot.'
  - q: 'Can Tabby run completely offline?'
    a: 'Yes. After the initial model download, Tabby operates entirely without internet access. The inference server, IDE extensions, and admin dashboard all run on your local network, which makes it suitable for air-gapped environments.'
  - q: 'What hardware do I need to run Tabby?'
    a: 'For individual use, a laptop with 16GB RAM and either Apple M-series silicon or an NVIDIA GPU with 8GB+ VRAM runs the StarCoder2-3B model comfortably. For teams, allocate roughly 4GB VRAM per concurrent user; a 7B model on an RTX 4090 (24GB) supports 4-6 developers simultaneously.'
  - q: 'Which IDEs and models does Tabby support?'
    a: 'Tabby has native extensions for VS Code, JetBrains IDEs, Vim/Neovim, and Emacs, connecting to the local server via HTTP. It works with any OpenAI-compatible model, including StarCoder2, Qwen2.5-Coder, and DeepSeek-Coder, and can delegate inference to Ollama for dynamic multi-model management.'
featureImage: /images/articles/tabby-self-hosted-ai-coding-assistant-wi.png
---
{{< resource-info >}}

GitHub Copilot sends your proprietary code to Microsoft's cloud. For teams handling sensitive IP (fintech, healthcare, defense, enterprise SaaS), that is a non-starter. Tabby is the open-source answer: a self-hosted AI coding assistant that runs entirely on your own hardware, so no source code leaves your network. With 33,530+ GitHub stars and an active release cadence (v0.32.0 shipped January 2026), Tabby has matured from an experimental project into a production-grade alternative to Copilot. This tutorial walks through a complete Tabby setup, from Docker deployment to IDE integration and production hardening. If you are specifically comparing Tabby vs. Copilot, the table under "Comparison with Alternatives" breaks down feature parity and trade-offs.

## What Is Tabby?

Tabby is a self-hosted AI coding assistant and an open-source GitHub Copilot alternative. It provides real-time code completion, an answer engine for code queries, and inline chat, all running on infrastructure you control. Written in Rust (92.9% of the codebase), Tabby is designed for speed and can run on consumer-grade GPUs, Apple Silicon, or even CPU-only servers.

## How Tabby Works

Tabby consists of three core components:

1. **Inference Server**: A Rust-based HTTP server that loads coding LLMs and serves completions via an OpenAPI-compatible endpoint. It handles model inference, prompt templating, and streaming responses.
2. **IDE Extensions**: Native extensions for VS Code, JetBrains IDEs, Vim/Neovim, and Emacs that capture editor context and forward completion requests to the inference server.
3. **Admin Dashboard**: A built-in web UI for user management, API token generation, repository indexing, and usage analytics. No external database is required, because Tabby uses an embedded SQLite store.

The data flow is straightforward: the IDE extension captures the prefix/suffix context around your cursor and sends it to the local Tabby server. The server runs inference against a loaded model (e.g., StarCoder2-3B or Qwen2.5-Coder-7B) and returns completions in under 500ms on GPU.
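To make the prefix/suffix step in the data flow concrete, here is a minimal Python sketch of what an editor extension does before it calls the server: split the buffer at the cursor and serialize a request body. The function names are hypothetical, and the JSON field names (`language`, `segments`, `prefix`, `suffix`) are an assumption about the completion endpoint's schema; check them against the OpenAPI description served by your own Tabby instance before relying on them.

```python
import json


def split_at_cursor(buffer: str, cursor: int) -> tuple[str, str]:
    """Split an editor buffer into the text before and after the cursor."""
    if not 0 <= cursor <= len(buffer):
        raise ValueError("cursor must lie inside the buffer")
    return buffer[:cursor], buffer[cursor:]


def build_completion_request(buffer: str, cursor: int, language: str) -> str:
    """Serialize a completion request body.

    The field names below are assumptions for illustration, not a
    confirmed description of the server's schema.
    """
    prefix, suffix = split_at_cursor(buffer, cursor)
    payload = {
        "language": language,
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return json.dumps(payload)


if __name__ == "__main__":
    source = "def add(a, b):\n    return \n"
    # Place the cursor right after "return " so the model fills in the expression.
    cursor = source.index("return ") + len("return ")
    print(build_completion_request(source, cursor, "python"))
```

Keeping the split separate from the serialization makes it easy to swap in a different payload shape if your server version expects one.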
After completing this tutorial, your local server will handle code completion without sending any source code to third-party APIs. For developers evaluating Tabby vs. Copilot, the key differentiator is that every inference request stays on your hardware, a requirement for organizations that classify code as sensitive data.

## Installation & Setup

### Prerequisites

- Docker (recommended) or Docker Compose
- NVIDIA Container Toolkit (for GPU support on CUDA systems)
- 4GB+ RAM for small models (1.5B params), 16GB+ for mid-range models (7B params)
- 10GB+ disk space for model weights

### Docker Setup (5 Minutes)

The fastest way to get Tabby running is via Docker. Below are commands for the four compute backends: CUDA, Metal, ROCm, and CPU-only.

#### NVIDIA GPU (CUDA)

```bash
# Pull and run Tabby with CUDA acceleration
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby \
  serve \
  --model StarCoder-1B \
  --chat-model Qwen2-1.5B-Instruct \
  --device cuda
```

For systems with SELinux enabled, add the `:Z` flag to the volume mount:

```bash
docker run -d \
  --name tabby \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data:Z \
  registry.tabbyml.com/tabbyml/tabby \
  serve \
  --model StarCoder-1B \
  --chat-model Qwen2-1.5B-Instruct \
  --device cuda
```

#### Apple Silicon (Metal)

```bash
docker run -d \
  --name tabby \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby \
  serve \
  --model StarCoder-1B \
  --chat-model Qwen2-1.5B-Instruct \
  --device metal
```

Note that Docker Desktop on macOS runs containers inside a Linux VM, which generally cannot reach the Metal GPU. If this container fails to start or falls back to CPU, use the Homebrew install at the end of this section.

#### AMD GPU (ROCm)

```bash
docker run -d \
  --name tabby \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby-rocm \
  serve \
  --model StarCoder-1B \
  --device rocm
```

#### CPU-Only

```bash
docker run -d \
  --name tabby \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  registry.tabbyml.com/tabbyml/tabby \
  serve \
  --model Qwen2.5-Coder-0.5B \
  --device cpu
```

### Verify the Installation

```bash
# Check server health
curl http://localhost:8080/v1/health

# View logs
docker logs -f tabby

# Open the admin dashboard (macOS; use xdg-open on Linux)
open http://localhost:8080
```

On first boot, Tabby downloads the specified model weights to `$HOME/.tabby`.

### Docker Compose

```yaml
version: '3.8'
services:
  tabby:
    image: registry.tabbyml.com/tabbyml/tabby
    container_name: tabby
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - $HOME/.tabby:/data
    environment:
      - TABBY_WEBSERVER_JWT_TOKEN_SECRET=CHANGE_ME_TO_RANDOM_STRING
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      serve
      --model StarCoder2-3B
      --chat-model Qwen2.5-Coder-7B-Instruct
      --device cuda
      --parallelism 4
```

Generate a secure JWT secret:

```bash
openssl rand -hex 32
```

Start the stack:

```bash
docker compose up -d
```

### Homebrew (Apple Silicon)

```bash
# Install via Homebrew
brew install tabbyml/tabby/tabby

# Run with Metal acceleration
tabby serve \
  --model StarCoder2-3B \
  --chat-model Qwen2-1.5B-Instruct \
  --device metal

# Verify
curl http://localhost:8080/v1/health
```
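Because the first boot downloads model weights, the health endpoint may not answer right away. The following Python sketch polls `/v1/health` (the endpoint used in the `curl` checks above) until it responds. The function names, the 30-attempt limit, and the 2-second delay are illustrative assumptions, and the sketch assumes the endpoint answers HTTP 200 without authentication once the server is ready.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_for_health(
    url: str = "http://localhost:8080/v1/health",
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_for_health()
    print("Tabby is ready" if ready else "Tabby did not become healthy in time")
```

The probe and sleep functions are injectable so the loop can be exercised in a deployment script or test without a running server.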
## Integration with VS Code, JetBrains, Vim, and Ollama

Tabby's IDE extensions connect your editor to the local inference server via HTTP. This section covers the four most popular editors and how to use Ollama as a flexible model backend.

### VS Code and JetBrains

1. Install the Tabby extension for your editor ([VS Code extension](https://marketplace.visualstudio.com/items?itemName=TabbyML.vscode-tabby) or [JetBrains plugin](https://plugins.jetbrains.com/plugin/22379-tabby)).
2. Enter your server endpoint URL (e.g., `http://localhost:8080`).
3. Generate an API token from the Tabby admin dashboard and paste it into the IDE settings.

### Vim / Neovim

For Neovim with `nvim-cmp` and `cmp-tabby`:

```lua
-- In your Neovim config (e.g., init.lua)
require('cmp').setup({
  sources = {
    { name = 'tabby' },
  },
})

-- Configure Tabby server URL
vim.g.tabby_server_url = 'http://localhost:8080'
```

### Using Ollama as a Backend

Tabby can delegate inference to Ollama, which enables dynamic model switching and multi-model management:

```toml
# ~/.tabby/config.toml
[model.completion.http]
kind = "ollama/completion"
model_name = "deepseek-coder:6.7b"
api_endpoint = "http://localhost:11434"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"

[model.chat.http]
kind = "openai/chat"
model_name = "qwen2.5-coder:7b"
api_endpoint = "http://localhost:11434/v1"
```

Pull the models and start Ollama:

```bash
ollama pull deepseek-coder:6.7b
ollama pull qwen2.5-coder:7b
ollama serve
```

Then start Tabby without `--model` flags (it reads the models from `config.toml`):

```bash
tabby serve --device cuda
```

This setup is ideal when you want to run multiple models on a single GPU with limited VRAM, because Ollama handles model loading and unloading dynamically.

## Benchmarks / Real-World Use Cases

This section provides hard numbers for anyone running a **self-hosted coding assistant** in production. Tabby's throughput and latency vary by model size and GPU generation. All figures below assume a warm model cache (second request onward) and were collected from community benchmarks and internal testing:

| Model | Size | GPU VRAM | Avg Latency | Accept Rate | Best For |
|---|---|---|---|---|---|
| Qwen2.5-Coder-0.5B | 0.5B | 2 GB | ~200ms | 18% | CPU-only setups, rapid testing |
| StarCoder-1B | 1B | 3 GB | ~180ms | 22% | Low-resource deployments |
| StarCoder2-3B | 3B | 6 GB | ~250ms | 28% | Balanced quality/speed |
| Qwen2.5-Coder-7B | 7B | 14 GB | ~350ms | 35% | High-quality completions |
| DeepSeekCoder-6.7B | 6.7B | 13 GB | ~380ms | 33% | Python/JS focused projects |

**Accept rate** measures how often a developer accepts a Tabby suggestion versus ignoring or modifying it. For comparison, GitHub Copilot's reported accept rate […]

| Scenario | Hardware | Model | Est. cost |
|---|---|---|---|
| […] | […] | StarCoder2-3B | $0 |
| Small team (5–10 devs) | RTX 4070 Ti, 16GB VRAM | Qwen2.5-Coder-7B | ~$50 (power) |
| Enterprise (50+ devs) | 2× A100 80GB | Qwen2.5-Coder-7B + chat | ~$500 (hosting) |
| CI/CD batch jobs | CPU-only cloud instances | Qwen2.5-Coder-0.5B | ~$30 |

![Tabby IDE Support](https://tabby.tabbyml.com/img/screenshot-ide.png)
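The accept-rate metric defined in the benchmarks can be computed from a simple log of suggestion outcomes. The sketch below is illustrative only: the outcome labels (`accepted`, `ignored`, `modified`) and the function name are assumptions for the example, not the format of Tabby's own usage analytics.

```python
from collections import Counter
from typing import Iterable


def accept_rate(outcomes: Iterable[str]) -> float:
    """Fraction of shown suggestions that the developer accepted as-is.

    Suggestions that were ignored or modified count against the rate,
    matching the definition used in the benchmark table.
    """
    counts = Counter(outcomes)
    shown = sum(counts.values())
    if shown == 0:
        return 0.0
    return counts["accepted"] / shown


if __name__ == "__main__":
    # Hypothetical session: 28 accepted, 60 ignored, 12 modified suggestions.
    session = ["accepted"] * 28 + ["ignored"] * 60 + ["modified"] * 12
    print(f"accept rate: {accept_rate(session):.0%}")
```

Measuring this on your own team's traffic is a more reliable way to compare models than any published figure, since accept rates depend heavily on language and codebase.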
## Advanced Usage / Production Hardening

### Repository Context Indexing

Tabby's standout feature for teams is repository-level context indexing. It clones and indexes your Git repositories, then uses RAG (Retrieval-Augmented Generation) to surface relevant internal code snippets during completion.

Add repositories via the admin dashboard:

```bash
# Navigate to Repositories → Add Git URL
# Supports GitHub, GitLab, and self-hosted Git instances
```

Or run the scheduler from the CLI:

```bash
docker exec tabby /opt/tabby/bin/tabby-cpu scheduler --now
```

### Security Hardening

1. **Change the default `TABBY_WEBSERVER_JWT_TOKEN_SECRET`** to a cryptographically random 32-byte hex string.
2. **Run behind a reverse proxy** with TLS termination:

```nginx
# Nginx example
server {
    listen 443 ssl;
    server_name tabby.yourcompany.com;

    ssl_certificate /etc/letsencrypt/live/tabby.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/tabby.yourcompany.com/privkey.pem;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

3. **Enable LDAP/SSO authentication** (Enterprise feature) for team-wide access control.
4. **Set resource limits** on the Docker container:

```bash
docker run -d \
  --memory=24g \
  --cpus=8 \
  # ... other flags
```

### Performance Tuning

```bash
# Increase parallelism for concurrent team requests
tabby serve \
  --model StarCoder2-3B \
  --device cuda \
  --parallelism 4

# Use half-precision (FP16) to reduce VRAM usage
tabby serve \
  --model StarCoder2-3B \
  --device cuda \
  --dtype float16
```

### Monitoring

```bash
# Check API health
curl http://localhost:8080/v1/health

# Docker stats
docker stats tabby

# View recent logs with errors only
docker logs tabby 2>&1 | grep ERROR
```
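For the first item in the Security Hardening list, the 32-byte hex secret can also be produced without `openssl`. This is a small sketch using Python's standard `secrets` module, which draws from the operating system's cryptographic random source; the helper names are hypothetical, and the validation function only checks length and character set, not randomness.

```python
import secrets
import string


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return `num_bytes` of OS-provided randomness as a lowercase hex string."""
    return secrets.token_hex(num_bytes)


def looks_like_hex_secret(value: str, num_bytes: int = 32) -> bool:
    """Check that `value` is exactly `num_bytes` bytes encoded as hex."""
    return len(value) == 2 * num_bytes and all(c in string.hexdigits for c in value)


if __name__ == "__main__":
    secret = generate_jwt_secret()
    # Paste the value into TABBY_WEBSERVER_JWT_TOKEN_SECRET; never commit it.
    print(secret)
```

A check like `looks_like_hex_secret` can run in a deployment script to catch a placeholder such as `CHANGE_ME_TO_RANDOM_STRING` before the container starts.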
## Comparison with Alternatives

| Feature | Tabby | GitHub Copilot | Cursor | Codeium |
|---|---|---|---|---|
| **Self-hosted** | Yes | No | No | Partial (Enterprise) |
| **License** | Apache-2.0 | Proprietary | Proprietary | Proprietary |
| **Price (individual)** | Free | $10/month | $20/month | Free tier |
| **Code stays local** | Yes | No | No | No |
| **Model flexibility** | Any OpenAI-compatible model | Vendor-hosted models only | Claude/GPT only | Codeium models only |
| **Repo context indexing** | Yes (built-in RAG) | Limited | Yes | Yes |
| **Team management** | Yes (admin dashboard) | Yes (Enterprise) | Yes (Team) | Yes (Teams) |
| **IDE support** | VS Code, JetBrains, Vim, Emacs, Eclipse | VS Code, JetBrains, Vim, Neovim | Cursor editor (VS Code fork) | VS Code, JetBrains, Vim |
| **Setup complexity** | Docker / 1 command | Install extension | Install app | Install extension |
| **Offline capable** | Yes | No | No | No |
| **Stars (GitHub)** | 33,530+ | N/A (Microsoft) | N/A (private) | N/A (private) |

Tabby is the only option in this group that keeps 100% of your code on-premises. That distinction matters if you work under SOC 2, HIPAA, ITAR, or similar compliance frameworks.

## Limitations / Honest Assessment

- **Smaller models lag on complex reasoning**: A 3B parameter model will not match GPT-4 on multi-file refactoring or architectural suggestions.
- **Infrastructure burden**: You are responsible for GPU maintenance, model updates, and server uptime. There is no SaaS fallback if your server goes down.
- **No chat in base install**: The chat/answer engine requires a separate chat model and additional VRAM. Plan your GPU sizing accordingly.

## Frequently Asked Questions

### What hardware do I need to run Tabby?

For individual use, a laptop with 16GB RAM and either Apple M-series silicon or an NVIDIA GPU with 8GB+ VRAM handles the StarCoder2-3B model comfortably. For team deployments, allocate 4GB VRAM per concurrent user as a rule of thumb. A 7B model on an RTX 4090 (24GB) supports 4–6 developers simultaneously.

### Can I use Tabby completely offline?

Yes. After the initial model download, Tabby operates entirely without internet access. The inference server, IDE extensions, and admin dashboard all run on your local network. This is one of Tabby's primary advantages for air-gapped environments.

### How does Tabby compare to GitHub Copilot in accuracy?

[…] within 5–10% of Copilot. Where Copilot pulls ahead is multi-file context and complex refactoring.

### Can I use a custom model?

Point Tabby to a local model path or host your own model server. See the [MODEL_SPEC.md](https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md) for the exact format requirements.

### Is Tabby suitable for large enterprise teams?

Tabby scales to 50+ users with proper hardware. […] need the enterprise license.

### How do I update Tabby to a new version?

```bash
# Pull the latest image
docker pull registry.tabbyml.com/tabbyml/tabby

# Restart the container
docker compose down
docker compose up -d

# Verify the new version
curl http://localhost:8080/v1/health
```
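The VRAM rule of thumb from the hardware question above (roughly 4GB per concurrent user) can be turned into quick arithmetic. The Python sketch below is a back-of-the-envelope estimate under that single assumption; the function name is hypothetical, and it ignores model size, quantization, and context length, all of which shift real capacity.

```python
def concurrent_users(total_vram_gb: float, vram_per_user_gb: float = 4.0) -> int:
    """Upper bound on simultaneous users under a fixed per-user VRAM budget."""
    if total_vram_gb < 0 or vram_per_user_gb <= 0:
        raise ValueError("VRAM figures must be positive")
    return int(total_vram_gb // vram_per_user_gb)


if __name__ == "__main__":
    # An RTX 4090 has 24GB of VRAM: 24 // 4 gives the upper end of the 4-6 range.
    print(concurrent_users(24))
    # A 16GB card under the same rule of thumb.
    print(concurrent_users(16))
```

Treat the result as a ceiling rather than a target: the 4–6 developer range quoted for a 24GB card suggests leaving headroom below the computed maximum.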
## Conclusion

Tabby fills a critical gap in the AI coding assistant market: a fully open-source, self-hosted tool that keeps your code inside your perimeter. With 33,530+ stars, active Rust-based development, and support for the latest coding models (Qwen2.5-Coder, DeepSeek, StarCoder2), it is ready for production use in privacy-conscious teams.

**Action items to get started:**

1. Run the Docker command from the Installation & Setup section to spin up Tabby on your local machine.
2. Install the IDE extension for your editor and connect to `http://localhost:8080`.
3. Index a test repository from the admin dashboard to experience RAG-powered completions.
4. Join the [Tabby community on Telegram](https://t.me/dibi8_ai_hub) for deployment tips and model recommendations.

*This article contains affiliate links to hosting providers. These recommendations are based on technical suitability for self-hosted AI workloads, not commercial partnerships.*







## Recommended Hosting & Infrastructure

Before you deploy any of the tools above into production, you'll need solid infrastructure. Two options dibi8 actually uses and recommends:

- **DigitalOcean**: $200 free credit for 60 days across 14+ global regions. The default option for indie devs running open-source AI tools.
- **HTStack**: Hong Kong VPS with low-latency access from mainland China. This is the same IDC that hosts dibi8.com, so it is battle-tested in production.

*Affiliate links: they don't cost you extra and they help keep dibi8.com running.*

## Sources & Further Reading

- [Tabby GitHub Repository](https://github.com/TabbyML/tabby): 33,530+ stars, Apache-2.0
- [Tabby Official Documentation](https://tabby.tabbyml.com/docs/welcome/)
- [Tabby Docker Installation Guide](https://tabby.tabbyml.com/docs/quick-start/installation/docker/)
- [Tabby Models Registry](https://tabby.tabbyml.com/docs/models/)
- [Tabby VS Code Extension](https://marketplace.visualstudio.com/items?itemName=TabbyML.vscode-tabby)
- [Tabby JetBrains Plugin](https://plugins.jetbrains.com/plugin/22379-tabby)
- [MODEL_SPEC.md — Custom Model Format](https://github.com/TabbyML/tabby/blob/main/MODEL_SPEC.md)
- [Self-Hosted AI Coding Assistants Comparison 2026](https://scopir.com/posts/self-hosted-ai-coding-assistants-tabby-continue-void/)
- [Tabby Setup with Ollama Backend](https://github.com/TabbyML/tabby/discussions/3285)