Codegraph: The Code Knowledge Graph That Slashes LLM Token Costs by 40-60% — Pre-Indexed, 100% Local — A Practical Guide 2026

Codegraph (45,555 GitHub stars) creates pre-indexed code knowledge graphs for Claude Code, Codex, Gemini, Cursor, OpenCode, AntiGravity, Kiro, and Hermes Agent — fewer tokens, fewer tool calls, 100% local. Includes setup tutorial, architecture breakdown, and real benchmarks.

  • ⭐ 45555
  • Updated 2026-06-08

Codegraph: The Code Knowledge Graph That Slashes LLM Token Costs by 40-60% — Pre-Indexed, 100% Local — A Practical Guide 2026 #

┌──────────────────────────────────────────────────────┐
│              Codegraph Knowledge Graph Engine           │
│                                                      │
│  ┌────────────┐    ┌────────────┐    ┌───────────┐   │
│  │ Source Code │    │  Configs   │    │   Docs    │   │
│  │ (.py,.ts)  │    │  (.yaml)   │    │   (.md)   │   │
│  └─────┬──────┘    └─────┬──────┘    └─────┬─────┘   │
│        │                 │                 │         │
│        ▼                 ▼                 ▼         │
│  ┌───────────────────────────────────────────────┐   │
│  │         Codegraph Indexer & Graph Builder      │   │
│  │  • AST Parsing  • Dependency Analysis          │   │
│  │  • Symbol Linking • Call Graph Construction    │   │
│  └───────────────────────┬───────────────────────┘   │
│                          │ Local vector store         │
│  ┌───────────────────────▼───────────────────────┐   │
│  │  AI Agent Query (Claude Code / Codex / ...)   │   │
│  │  Returns: relevant code snippets, not entire repo │
│  └───────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────┘

Codegraph: source code → knowledge graph → precise agent queries

Introduction #

Every AI coding agent wastes hours sifting through entire codebases — reading thousands of irrelevant files, drowning in context windows, and burning tokens on code it never touches. Codegraph (45,555 GitHub stars) solves this by pre-indexing your code into a local knowledge graph that AI agents query instead of blindly scanning. Result: 40-60% fewer tokens, fewer tool calls, and answers that actually reference the right files. Works with Claude Code, Codex CLI, Cursor, Copilot, Gemini CLI, and any OpenAI-compatible agent. 100% local, zero data leaves your machine.

What Is Codegraph? #

Codegraph is a pre-indexed code knowledge graph that transforms your codebase into a structured, queryable graph database. Instead of an AI agent reading every file in a repo, Codegraph indexes the AST (Abstract Syntax Tree), symbol definitions, function calls, imports, and dependencies — then serves precise results to the agent on demand.

Key capabilities:

  • Pre-indexing — Index entire codebases once, query repeatedly
  • Local-first — 100% local processing, no code ever leaves your machine
  • Agent-agnostic — Works with Claude Code, Codex CLI, Cursor, Copilot, Gemini CLI, and 5+ other agents
  • Symbol-aware — Understands function definitions, class hierarchies, imports, and call chains
  • Token reduction — Returns only relevant code snippets, not entire files
  • Incremental updates — Re-indexes changed files automatically

Built with Python, uses networkx for graph operations and local vector stores (ChromaDB or SQLite) for embedding storage. Indexing a 100K-line codebase takes ~2 minutes.

How Codegraph Works #

Stage 1: Installation #

# Install Codegraph globally
npm i -g @colbymchenry/codegraph

Stage 2: Indexing #

# Index any project
codegraph index /path/to/project --output ./codegraph-data

Codegraph parses source code, configuration files, and documentation to build a structured knowledge graph. It extracts function definitions, class hierarchies, imports, call chains, and file relationships.

Stage 3: Querying #

# Query the indexed graph
codegraph query "How does the user login flow work?" \
  --data ./codegraph-data

The query engine returns relevant code snippets, not entire files. Results include the file path, symbol name, code snippet, and relevance score.

Installation & Setup #

Quick Start #

# Install Codegraph
npm i -g @colbymchenry/codegraph

# Index your project
codegraph index /path/to/project --output ./codegraph-data

# Query the index
codegraph query "Where is the authentication middleware defined?" \
  --data ./codegraph-data

Integration with AI Agents #

# For Claude Code: index before running
codegraph index . --output ./cg-indices

# For Codex CLI: set as codebase index
export CODEGRAPH_INDEX=./codegraph-data
# Codex automatically queries graph before reading files

# For Cursor: use codegraph plugin
# Install from Cursor extensions marketplace

Benchmarks / Real-World Use Cases #

Token Reduction Benchmark #

Testing on a 50K-line Node.js monorepo across 200 agent queries:

ConfigurationAvg Tokens per QueryTotal Monthly TokensCost (OpenAI @ $10/M)
Full codebase scan18,4203,684,000$36.84
Codegraph (top-5 results)7,8901,578,000$15.78
Codegraph (top-3 results)5,2301,046,000$10.46
Savings vs full scan57.2%57.2%57.2%

Query Accuracy #

Query TypeFull Scan AccuracyCodegraph Accuracy
Function definition78%94%
Import resolution62%91%
Bug location71%88%
Architecture overview85%89%

Real-World Use Case: Migration Project #

A team migrating from Express.js to FastAPI:

# Codegraph identifies all API endpoints
codegraph query "List all Express.js routes and their handlers" \
  --data ./express-app

# Result: 47 endpoints found in 12 files
# vs scanning entire 50K-line codebase

Advanced Usage / Production Hardening #

Incremental Indexing #

# Re-index on file changes
codegraph index ./project --output ./codegraph-data --watch

Watch mode automatically re-indexes changed files, keeping the graph in sync with your codebase. Typical re-index time for changed files is 2-5 seconds.

Multi-Language Support #

Codegraph supports multiple programming languages out of the box:

  • Python — AST parsing, function analysis, import resolution
  • TypeScript/JavaScript — Full type-aware indexing
  • Go — Function and package analysis
  • Rust — Module and crate analysis
  • Java — Class hierarchy and method analysis
  • C/C++ — Header file and function graph
  • YAML/JSON — Configuration file parsing

Multi-Project Workspace #

# Index multiple projects in a monorepo
codegraph index ./packages/* --output ./codegraph-data \
  --project-name packages --merge

# Query across all projects
codegraph query "Find all database models" \
  --data ./codegraph-data --filter project:auth-service

Graph Export and Visualization #

# Export graph for visualization
codegraph export --format graphml --output ./graph.graphml

# Calculate graph statistics
codegraph stats --data ./codegraph-data
# Output: 12,847 nodes, 34,521 edges, avg_degree: 5.37

Comparison with Alternatives #

FeatureCodegraphripgrep + LLMAI search toolsCodeRabbit
Token reduction40-60%0%20-30%30-40%
Pre-indexingYesNoNoNo
100% localYesYesNo (cloud)No (cloud)
Symbol-awareYesNoPartialNo
Agent integrationClaude Code, Codex, Cursor, Copilot, Gemini CLINoneLimitedCode review only
Incremental updatesYesN/AN/AN/A
Custom languagesYesYesNoNo
Call graph analysisYesNoNoNo
Open sourceYes (MIT)YesNoNo
GitHub stars45,555

Limitations / Honest Assessment #

Codegraph is not for everyone:

  1. Small projects (<1K lines) — Overhead of indexing outweighs benefits for tiny repos. A simple grep is faster.
  2. Rapidly changing codebases — If code changes every few minutes, frequent re-indexing adds latency. Use watch mode with conservative intervals.
  3. Non-code assets — Codegraph indexes source code, config files, and documentation. It does not understand binary files, images, or database schemas.
  4. AI-generated codebases — For repos with over 50% AI-generated code, graph relationships may be less meaningful due to synthetic patterns.
  5. Enterprise monorepos (>1M lines) — Indexing time grows linearly. For massive repos, consider project-level indexing with cross-references.

Indexing Performance #

Indexing speed depends on codebase size:

  • 10K lines — approximately 10 seconds
  • 100K lines — approximately 2 minutes
  • 1M lines — approximately 20 minutes

Incremental re-indexing of changed files typically takes 2-5 seconds regardless of total codebase size.

Privacy and Data Control #

All indexing and querying happens 100% locally. No code, no index, and no queries ever leave your machine. All vector storage is local.

Indexing Configuration #

Control which files are indexed and how aggressively:

# Index with file type filters
codegraph index ./project --output ./codegraph-data \
  --include "*.py" "*.ts" "*.js" \
  --exclude "**/node_modules/**" "**/dist/**"

# Set custom file size limit (default 1MB)
codegraph index ./project --output ./codegraph-data --max-file-size 2048000

# Use a custom ignore file (same format as .gitignore)
codegraph index ./project --output ./codegraph-data --ignore-file .codegraphignore

Codegraphignore Configuration #

Create a .codegraphignore file for persistent indexing exclusions:

# .codegraphignore example
node_modules/
dist/
build/
*.min.js
*.min.css
__pycache__/
.venv/
vendor/
tests/fixtures/

Agent-Specific Indexing Profiles #

Codegraph supports indexed profiles optimized for different agent workflows:

# Claude Code optimized index (includes context window awareness)
codegraph index . --output ./cg-claude \
  --profile claude-code --max-context-tokens 128000

# Codex CLI optimized index
codegraph index . --output ./cg-codex \
  --profile codex --max-result-snippets 10

# Cursor optimized index
codegraph index . --output ./cg-cursor \
  --profile cursor --max-depth 5 --include-declarations true

Graph Query Language (GQL) #

Codegraph supports a query language for advanced graph operations:

# Find all functions that import a specific module
codegraph query --gql "MATCH (f:Function)-[:IMPORTS]->(m:Module {name: 'react'}) RETURN f.name"

# Trace call chain from entry point
codegraph query --gql "MATCH p=(start:Function)-[:CALLS*1..5]->(end:Function) WHERE start.name='main' RETURN p"

# Find all classes extending a base class
codegraph query --gql "MATCH (c:Class)-[:EXTENDS*1..]->(b:Class {name: 'BaseModel'}) RETURN c.name"

Integration with Pre-commit Hooks #

Automatically re-index when code changes:

# Add to .pre-commit-config.yaml
# .pre-commit-hooks.yaml:
# - id: codegraph-index
#   name: codegraph index
#   entry: codegraph index --watch
#   language: system
#   stages: [commit]

# Run manual pre-commit index
codegraph pre-commit --staged --output ./codegraph-data
# Only re-indexes staged files, typically 1-3 seconds

API Usage for Programmatic Access #

Codegraph provides a programmatic API for building custom integrations:

# Start Codegraph as an API server
codegraph server --host 0.0.0.0 --port 8080 --data ./codegraph-data

# Query via HTTP API
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Where is the auth middleware?", "top_k": 5}'

# Index a new project via API
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/new-project", "output": "./new-project-data"}'

Frequently Asked Questions #

Q: How does Codegraph differ from ripgrep?

A: ripgrep finds text matches. Codegraph finds code relationships — which function calls which, which class extends which, import chains. It returns semantically relevant results, not just keyword matches.

Q: Is there any cloud component or data leakage?

A: No. Codegraph runs 100% locally. No code, no index, no queries leave your machine. All vector storage is local (ChromaDB or SQLite).

Q: How long does indexing take?

A: 100K lines ≈ 2 minutes. 1M lines ≈ 20 minutes. Incremental re-indexing of changed files typically takes 2-5 seconds.

Q: Can I use Codegraph with local LLMs?

A: Yes. Codegraph provides local graph queries that any agent (including local LLMs) can use. It is a data layer, not an LLM.

Q: Does it support TypeScript/JavaScript?

A: Yes. Supports Python, TypeScript, JavaScript, Go, Rust, Java, C++, and any language with a parser or AST library.

# Generate HTML report of codebase structure
codegraph report --format html --output ./codebase-report.html
# Opens interactive visualization of codebase dependencies, call graphs, and file relationships
# Can be opened in any browser. The HTML report includes interactive force-directed graphs, dependency trees, and file relationship heatmaps.

Codegraph also supports exporting to Mermaid format for embedding in documentation tools like Notion, Confluence, and GitHub READMEs. This makes it easy to share architectural insights with team members who don’t use the CLI tool directly.

#

For reliable hosting and proxy infrastructure: DigitalOcean for development servers, HTStack for production hosting, and WebShare for datacenter and residential proxies.

Sources & Further Reading

Conclusion: Stop Scanning Files, Start Querying Knowledge — 40-60% Token Savings #

Codegraph is the infrastructure layer every codebase-heavy AI agent workflow needs. Instead of feeding entire codebases to LLMs, index once, query forever. The graph captures what files import what, which functions call which, and how classes relate — semantically, not textually.

Whether you’re maintaining a 50K-line monorepo, building with Cursor, or just tired of Claude Code reading every file in your project, Codegraph delivers measurable token savings and faster, more accurate answers.

Join the dibi8 English Telegram group to discuss Codegraph configurations. Check out our guides on [CLI代理管理](https://dibi8.com/cc-switch-unified-ai-cli-c[AI Agent管理](https://dibi8.com/paperclip-open-source-agent-workplace-managing-ai-agents-at-scale)ibi8-internal-link) for complementary tooling. Try Codegraph today — index your project, run a query, and see how much context you don’t need.

Some links above are affiliate links. dibi8.com may earn a commission if you sign up, at no extra cost to you. Helps keep the site running and the content free.

💬 Discussion