TurboVec is an open-source vector index for AI applications, written in Rust and exposed through Python bindings. It targets developers and teams who need fast, memory-efficient vector search in retrieval-augmented generation (RAG) pipelines.

Is TurboVec free to use?

Yes, TurboVec is open-source and free to use. Check the project GitHub repository for the specific license and any premium features.

How do I install TurboVec?

Install TurboVec with pip install turbovec. The Installation & Setup section below also covers framework-specific extras, building from source, and Docker.

TurboVec: Rust-Powered Vector Index

Introduction #

RAG applications can spend a large share of their request time waiting for vector search to return results. TurboVec addresses this by combining Rust-level performance with Python convenience, using Google Research’s TurboQuant, a data-oblivious quantization technique that achieves near-optimal compression. With 10,700+ GitHub stars and drop-in replacements for LangChain, LlamaIndex, Haystack, and Agno, TurboVec is becoming the default vector store for teams building high-performance RAG systems. The library is actively maintained, with regular releases that add framework integrations, new quantization modes, and performance improvements based on community feedback from production deployments.

For teams building high-performance RAG systems, the choice between vector search libraries ultimately comes down to three factors: query latency, memory efficiency, and integration depth. TurboVec excels on all three dimensions, making it a compelling default choice for new projects in 2026.

What Is TurboVec? #

TurboVec is a high-performance vector index that prioritizes two things: query speed and memory efficiency. Under the hood, it uses TurboQuant, a quantization scheme that compresses embeddings to 4-bit precision while keeping recall above 99% in the project's benchmarks (99.2% R@64 at d=1536). Written in Rust and exposed via Python bindings, it gives you native-code performance without leaving the Python ecosystem.

┌─────────────────────────────────────────────────┐
│              TurboVec Architecture               │
├─────────────────────────────────────────────────┤
│                                                  │
│  Python API Layer (pip install turbovec)         │
│    ├─ VectorStore (LangChain drop-in)           │
│    ├─ VectorStore (LlamaIndex drop-in)          │
│    ├─ VectorStore (Haystack drop-in)            │
│    └─ VectorDB (Agno drop-in)                   │
│                                                  │
│  TurboQuant Engine (Rust)                        │
│    ├─ 4-bit vector compression                   │
│    ├─ AVX2/AVX-512 optimized search             │
│    ├─ Disk-backed indexing (10M+ vectors)       │
│    └─ Multi-threaded query execution            │
│                                                  │
│  Persistence Layer                               │
│    ├─ In-memory index                            │
│    ├─ On-disk checkpoint                         │
│    └─ Incremental updates                         │
└─────────────────────────────────────────────────┘

How TurboQuant Works #

Traditional vector stores keep embeddings as 32-bit floats (4 bytes per dimension). TurboQuant compresses these to 4 bits (0.5 bytes per dimension), an 8x reduction in raw vector storage, using a combination of product quantization and residual coding.

from turbovec import TurboQuantIndex

# Create a TurboVec index with 4-bit quantization
index = TurboQuantIndex(
    dim=1536,                    # embedding dimension
    bit_width=4,                 # 4-bit TurboQuant compression
)

# Index embeddings
embeddings = generate_embeddings(documents)  # your embedding function
index.add(embeddings)

# Search — returns top-k results in milliseconds
scores, indices = index.search(query_embedding, k=10)
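
The storage arithmetic behind those per-dimension figures can be checked with a few lines of plain Python. This is a back-of-the-envelope sketch: `index_bytes` is a hypothetical helper, not part of TurboVec, and it counts only the raw codes, ignoring any per-vector metadata or index overhead the library may add.

```python
def index_bytes(num_vectors: int, dim: int, bits_per_dim: int) -> int:
    """Raw storage for the vector codes alone, in bytes (no metadata or overhead)."""
    total_bits = num_vectors * dim * bits_per_dim
    return (total_bits + 7) // 8  # round up to whole bytes

num_vectors, dim = 1_000_000, 1536
float32_size = index_bytes(num_vectors, dim, 32)
four_bit_size = index_bytes(num_vectors, dim, 4)

print(f"float32: {float32_size / 1e9:.2f} GB")
print(f"4-bit:   {four_bit_size / 1e9:.2f} GB")
print(f"ratio:   {float32_size / four_bit_size:.0f}x")
```

For one million 1536-dimensional vectors this works out to roughly 6.14 GB versus 0.77 GB of raw codes, an 8x reduction.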

The quantization pipeline works in three stages. First, the embedding space is divided into subspaces using product quantization. Second, residual vectors capture quantization error for high-frequency components. Third, runtime feature detection selects between AVX2 (2013+ CPUs) and AVX-512 (2017+ CPUs) kernels automatically.
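
To make the idea of 4-bit codes concrete, here is a deliberately simplified sketch. It is not TurboQuant and does not reproduce the three-stage pipeline described above: it applies a plain uniform scalar quantizer to each value and packs two 4-bit codes into each byte, which is only meant to show where the 0.5 bytes per dimension comes from. The function names and the fixed [-1, 1] value range are assumptions made for illustration.

```python
def quantize_4bit(values, lo=-1.0, hi=1.0):
    """Map each float to an integer code in 0..15 using uniform steps over [lo, hi]."""
    step = (hi - lo) / 15
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)          # clamp out-of-range values
        codes.append(round((clipped - lo) / step))
    return codes

def pack_nibbles(codes):
    """Pack two 4-bit codes per byte (first code in the high nibble)."""
    if len(codes) % 2:
        codes = codes + [0]                    # pad odd-length input
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.extend((byte >> 4, byte & 0x0F))
    return codes[:count]

def dequantize_4bit(codes, lo=-1.0, hi=1.0):
    """Map codes back to the representative value of each quantization step."""
    step = (hi - lo) / 15
    return [lo + c * step for c in codes]

vector = [0.12, -0.80, 0.33, 0.95, -0.05, 0.61]
codes = quantize_4bit(vector)
packed = pack_nibbles(codes)
restored = dequantize_4bit(unpack_nibbles(packed, len(vector)))

print(len(packed), "bytes for", len(vector), "dimensions")
print("max abs error:", max(abs(a - b) for a, b in zip(vector, restored)))
```

Six dimensions fit in three bytes, and the reconstruction error per value is bounded by half a quantization step. A scheme such as TurboQuant aims for much lower distortion than this naive quantizer at the same bit budget.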

Installation & Setup #

Option 1: pip install (recommended)

pip install turbovec

Option 2: Framework-specific installation

# Quote the extras so shells such as zsh do not try to expand the brackets

# LangChain integration
pip install "turbovec[langchain]"

# LlamaIndex integration
pip install "turbovec[llama-index]"

# Haystack integration
pip install "turbovec[haystack]"

# Agno integration
pip install "turbovec[agno]"

Option 3: Build from source (Rust development)

git clone https://github.com/RyanCodrai/turbovec.git
cd turbovec
pip install maturin
maturin develop --release

Option 4: Docker

docker build -t turbovec:latest .
docker run -p 8000:8000 turbovec:latest

Integration with LangChain, LlamaIndex, and Haystack #

TurboVec’s killer feature is its drop-in replacement design. You swap the import and the store constructor, and the rest of your pipeline keeps running unchanged.

LangChain Integration

from turbovec.integrations.langchain_vectorstore import TurboVecVectorStore

# `embeddings` is any LangChain Embeddings instance you already use,
# and `documents` is a list of LangChain Document objects.

# Drop-in replacement for InMemoryVectorStore
store = TurboVecVectorStore(
    embedding_function=embeddings,
    dim=1536,        # must match the embedding model's output dimension
    bit_width=4,
)

# Same API as any LangChain vector store
store.add_documents(documents)
retriever = store.as_retriever(search_kwargs={"k": 5})
results = retriever.invoke("your query")

LlamaIndex Integration

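# llama-index 0.10+; older releases use `from llama_index import VectorStoreIndex`
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores import TurboVecVectorStore

# `client` is assumed to be a TurboVec index or client object created elsewhere
vector_store = TurboVecVectorStore(
    client=client,
    dim=1536,
    metric="cosine",
)

index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")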

Haystack Integration

from haystack.document_stores import TurboVecDocumentStore

document_store = TurboVecDocumentStore(
    embedding_dim=1536,
    similarity="cosine",
)

# Use with Haystack's Retriever.
# `Retriever` is a placeholder here: import the retriever component
# that matches your Haystack version and document store.
retriever = Retriever(document_store=document_store)
documents = retriever.run(query="your query")

Benchmarks / Real-World Use Cases #

TurboVec’s performance advantage comes from TurboQuant’s 4-bit compression combined with hand-tuned SIMD kernels (NEON on ARM, AVX-512BW on x86).

Official benchmark results (100K vectors, k=64, median of 5 runs):

ARM (Apple M3 Max): TurboQuant beats FAISS IndexPQFastScan by 10–19% across all configurations.

x86 (Intel Xeon Platinum): TurboQuant wins 4-bit configs by up to ~5% and is comparable on 2-bit (within ~8% single-threaded on d=1536, within a few percent on larger dimensions).

Recall at k=64 (d=1536, 4-bit TurboQuant vs FAISS IndexPQ):

TurboQuant: R@64 = 99.2%
FAISS IndexPQ (8-bit LUT): R@64 = 97.8%
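
For readers unfamiliar with the R@64 notation: one common definition of recall at k is the fraction of the true k nearest neighbours (from an exact search) that also appear in the approximate top-k list, averaged over queries. The source does not state which recall variant the benchmark uses, so the helper below is a generic sketch of that definition rather than the project's benchmark harness; the function name and the toy ID lists are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Average fraction of the exact top-k neighbours found in the approximate top-k.

    approx_ids, exact_ids: one list of result IDs per query, best match first.
    """
    if len(approx_ids) != len(exact_ids):
        raise ValueError("need one result list per query in both inputs")
    total = 0.0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        found = truth.intersection(approx[:k])
        total += len(found) / len(truth)
    return total / len(exact_ids)

# Two toy queries with k=4: the first recovers 3 of 4 true neighbours, the second all 4.
exact = [[1, 2, 3, 4], [10, 11, 12, 13]]
approx = [[1, 2, 4, 9], [13, 12, 11, 10]]
print(recall_at_k(approx, exact, k=4))  # 0.875
```

Note that order within the top-k does not matter under this definition, which is why the second query scores 1.0 despite being reversed.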

To reproduce the benchmarks locally:

# Clone and build
git clone https://github.com/RyanCodrai/turbovec.git
cd turbovec
cargo build --release

# Run benchmarks (cargo bench already builds with optimizations,
# so it takes no --release flag)
cargo bench

In practice, TurboVec delivers the best performance when used with embeddings that are 768 dimensions or higher. Below 384 dimensions, the quantization savings diminish because the overhead of the quantization pipeline itself becomes significant relative to the small vector sizes. For embeddings in the 384-512 range, consider using 8-bit quantization for the best accuracy-speed tradeoff.

Advanced Usage / Production Hardening #

Persistent Index with Checkpointing

import turbovec

# Create a disk-backed index
index = turbovec.Index(
    dim=1536,
    metric="cosine",
    quantization="4bit",
    capacity=10_000_000,
)

# Add vectors over time
for batch in document_batches:
    embeddings = embed(batch)
    index.add(embeddings)

# Save checkpoint to disk
index.save("my_index.turbovec")

# Load checkpoint in a new process
loaded = turbovec.Index.load("my_index.turbovec")
results = loaded.search(query_emb, k=10)
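
One hardening step worth considering around the save call is making checkpoint writes atomic, so that a crash mid-write cannot leave a truncated file where the last good checkpoint used to be. The sketch below wraps any save function that takes a path; `atomic_save` is a hypothetical helper, and it assumes the index is written as a single file, which the source does not confirm.

```python
import os
import tempfile

def atomic_save(save_fn, final_path):
    """Write via save_fn to a temp file in the same directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # save_fn opens the path itself
    try:
        save_fn(tmp_path)
        os.replace(tmp_path, final_path)  # atomic on POSIX when on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)           # do not leave partial files behind
        raise

# Stand-in for index.save, so the sketch runs without TurboVec installed.
def fake_save(path):
    with open(path, "wb") as f:
        f.write(b"index-bytes")

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "my_index.turbovec")
    atomic_save(fake_save, target)
    print(os.path.getsize(target))  # 11
```

With TurboVec this would be called as `atomic_save(index.save, "my_index.turbovec")`, assuming `save` accepts an arbitrary file path as in the example above.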

Multi-threaded Query Execution

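# TurboVec uses all available CPU cores by default.
# To cap the thread count, set RAYON_NUM_THREADS before turbovec runs its first
# parallel operation (safest: before importing it), because Rayon reads the
# variable once, when its global thread pool is created.
import os
os.environ["RAYON_NUM_THREADS"] = "16"

# Run a batch of queries in parallel across threads
results = index.search_parallel(
    query_embeddings,    # multiple queries
    k=10,
    num_threads=16
)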

Monitoring Index Performance in Production

import time

# Benchmark current index throughput
n_queries = 1000
start = time.perf_counter()
for _ in range(n_queries):
    index.search(query_emb, k=10)
elapsed = time.perf_counter() - start
print(f"Throughput: {n_queries / elapsed:.0f} queries/sec")
print(f"Average latency: {elapsed / n_queries * 1000:.2f} ms per query")
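
Averages hide tail latency, so production monitoring usually tracks percentiles as well. The following sketch times each call individually and reports p50, p95, and p99 using only the standard library; `measure_latency` is a hypothetical helper that accepts any zero-argument callable, and the dummy workload stands in for a call to `index.search`.

```python
import statistics
import time

def measure_latency(fn, runs=200, warmup=20):
    """Time fn() repeatedly and return latency percentiles in milliseconds."""
    for _ in range(warmup):              # warm caches before measuring
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    # 99 cut points: cuts[i] approximates percentile i + 1
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98], "max": max(samples)}

# Dummy workload standing in for: lambda: index.search(query_emb, k=10)
stats = measure_latency(lambda: sum(i * i for i in range(10_000)))
print({name: round(value, 3) for name, value in stats.items()})
```

Timing each query separately adds a small measurement overhead, but it makes slow outliers visible, which a single wall-clock total cannot do.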

Custom Quantization Configurations

# Trade accuracy for speed: 3-bit quantization
index_3bit = turbovec.Index(
    dim=1536,
    quantization="3bit",    # even smaller, ~98.5% accuracy
)

# Conservative: 8-bit for maximum accuracy
index_8bit = turbovec.Index(
    dim=1536,
    quantization="8bit",    # 99.8% accuracy, 2x the size of a 4-bit index
)

Building a Full RAG Pipeline with TurboVec

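import torch
import turbovec
from transformers import AutoTokenizer, AutoModel

# Load embedding model (all-MiniLM-L6-v2 produces 384-dimensional embeddings)
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def embed_texts(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean-pool over real tokens only, so padding does not skew the average
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return (summed / mask.sum(dim=1).clamp(min=1e-9)).numpy()

# Build index (document_chunks is your list of text chunks).
# Note: at 384 dimensions, the benchmark notes above suggest considering 8-bit.
index = turbovec.Index(dim=384, metric="cosine", quantization="4bit", capacity=1_000_000)
index.add(embed_texts(document_chunks))

# Query pipeline
query_emb = embed_texts(["What is machine learning?"])[0]
results = index.search(query_emb, k=5)
# Assumes search() yields (index, score) pairs. The TurboQuantIndex example
# earlier unpacks (scores, indices) instead, so adapt this loop to the
# return shape of the version you have installed.
for i, (idx, score) in enumerate(results):
    print(f"  [{i}] score={score:.4f} chunk={document_chunks[idx][:100]}")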
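
When wiring up a pipeline like this, it can help to spot-check the index against an exact search on a small sample of chunks. The sketch below is a brute-force cosine-similarity baseline in plain Python; it is independent of TurboVec, and `exact_top_k` is a hypothetical helper name. It is far too slow for a full corpus and is intended only as a correctness reference.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Three toy "chunk embeddings" and one query
chunks = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.6, 0.6, 0.2]]
for idx, score in exact_top_k([1.0, 0.2, 0.0], chunks, k=2):
    print(f"chunk={idx} score={score:.4f}")
```

If the IDs returned by the quantized index for the same query differ substantially from this baseline, that usually points at a dimension mismatch, a metric mismatch, or unnormalized embeddings rather than at quantization error alone.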

Docker Compose for Production Serving

version: '3.8'
services:
  turbovec:
    image: ryan-codrai/turbovec:latest
    ports:
      - "8000:8000"
    volumes:
      - ./index:/data
    environment:
      - TURBOVEC_CAPACITY=10000000
      - TURBOVEC_DIM=1536
      - TURBOVEC_METRIC=cosine

Comparison with Alternatives #
