The Problem Every Developer Faces
You spend thirty minutes explaining your authentication architecture to Claude Code. You describe your JWT setup, your middleware choices, your testing strategy. The agent writes great code.
Then you close the session. Start a new one. And spend another thirty minutes explaining the exact same thing.
This is the “cold start problem” that every user of AI coding assistants faces. Built-in memory solutions like CLAUDE.md, .cursorrules, or Notepads are essentially sticky notes — they have line limits, can’t be searched, and degrade into unreadable walls of text after a few sessions.
agentmemory solves this problem. It’s an open-source persistent memory engine designed specifically for AI coding agents. After a single install command (npx @agentmemory/agentmemory), your coding agent remembers everything from previous sessions — automatically.
What Is agentmemory?
agentmemory is a memory engine + MCP server that runs silently in the background, captures what your coding agent does, compresses observations into structured facts, and injects relevant context when you start a new session.
Key stats at a glance:
| Metric | Value |
|---|---|
| GitHub Stars | 3,700+ (trending fast — 533 stars/day on GitHub Trending) |
| Current Version | v0.9.5 (just released May 9, 2026) |
| License | Apache 2.0 |
| Retrieval Accuracy | 95.2% R@5 (on LongMemEval-S benchmark) |
| Token Savings | ~92% fewer tokens vs. manual context pasting |
| Cost per Year | ~$10/year (with local embeddings: $0) |
| External Dependencies | None (SQLite + iii-engine only) |
| Source Files | 118 files, ~21,800 lines of code |
Why agentmemory Matters
The Token Cost Crisis
Consider these annual costs for context management:
| Approach | Tokens/Year | Annual Cost |
|---|---|---|
| Paste full context every session | 19.5M+ | Impossible (exceeds context window) |
| LLM-summarized notes | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings | ~170K | $0 |
A typical senior developer who re-explains their architecture three times per week burns through over 19 million tokens annually. With GPT-4-level models charging $10–$30 per million tokens, that’s hundreds of dollars wasted on redundant context.
agentmemory Reduces This to ~1,900 Tokens Per Session
Here’s how it works:
```
Session 1: "Add auth to the API"
  → Agent writes code, runs tests, fixes bugs
  → agentmemory silently captures every tool use via hooks
  → Session ends → observations compressed into structured memory

Session 2: "Now add rate limiting"
  → Agent already knows:
      - Auth uses JWT middleware in src/middleware/auth.ts
      - Tests in test/auth.test.ts cover token validation
      - You chose jose over jsonwebtoken for Edge compatibility
  → Zero re-explaining. Starts working immediately.
```
That’s a 92% reduction in token usage — which translates directly to money saved.
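The arithmetic behind those annual figures is easy to sanity-check yourself. The sketch below assumes a $15-per-million-token price and roughly 125K tokens of re-explained context per session; swap in your own model’s pricing:

```typescript
// Back-of-the-envelope annual cost of re-explained context.
// pricePerMTok is an ASSUMED model price in USD per million tokens.
function annualContextCost(
  tokensPerSession: number,
  sessionsPerYear: number,
  pricePerMTok: number,
): number {
  return (tokensPerSession * sessionsPerYear * pricePerMTok) / 1_000_000;
}

// ~3 re-explained sessions/week (156/year), ~125K tokens each => 19.5M tokens
const manual = annualContextCost(125_000, 156, 15);
// agentmemory injects ~1,900 tokens of stored context per session instead
const withMemory = annualContextCost(1_900, 156, 15);

console.log(`manual: $${manual.toFixed(2)}, agentmemory: $${withMemory.toFixed(2)}`);
```

At those assumed rates, manual re-explaining lands near $300/year while injected memory stays in single digits, which is where the article’s "hundreds of dollars wasted" framing comes from.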
Core Architecture: How It Actually Works
The Memory Pipeline
agentmemory doesn’t just dump everything into a file. It implements a sophisticated pipeline inspired by human memory consolidation:
```
PostToolUse hook fires
  → SHA-256 deduplication (5-minute window)
  → Privacy filter (strips secrets, API keys)
  → Store raw observation
  → LLM compression → structured facts + concepts + narratives
  → Vector embedding (6 providers available + local option)
  → Index in BM25 + vector databases

Stop / SessionEnd hook fires
  → Summarize session
  → Knowledge graph extraction (optional)
  → Slot reflection (optional)

SessionStart hook fires
  → Load project profile (top concepts, files, patterns)
  → Hybrid search (BM25 + vector + knowledge graph)
  → Token budget (default: 2,000 tokens)
  → Inject only relevant context into conversation
```
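The SHA-256 deduplication step is the kind of thing that fits in a dozen lines. Here’s a minimal sketch of the idea — hash each observation and skip it if the same hash was seen within the last five minutes. (This is an illustration of the technique, not agentmemory’s actual implementation.)

```typescript
import { createHash } from "node:crypto";

const WINDOW_MS = 5 * 60 * 1000; // 5-minute dedup window
const seen = new Map<string, number>(); // SHA-256 hex -> last-seen timestamp (ms)

// Returns true if the observation is new, or its last duplicate has aged out.
function shouldStore(observation: string, now: number): boolean {
  const hash = createHash("sha256").update(observation).digest("hex");
  const last = seen.get(hash);
  if (last !== undefined && now - last < WINDOW_MS) return false; // duplicate
  seen.set(hash, now);
  return true;
}

const t0 = Date.now();
shouldStore("Read src/middleware/auth.ts", t0);                 // true: first sighting
shouldStore("Read src/middleware/auth.ts", t0 + 1_000);         // false: inside window
shouldStore("Read src/middleware/auth.ts", t0 + WINDOW_MS + 1); // true: window expired
```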
Four-Tier Memory Consolidation
Inspired by how the human brain processes memories during sleep:
| Tier | What It Stores | Human Analogy |
|---|---|---|
| Working | Raw observations from tool calls | Short-term memory |
| Episodic | Compressed session summaries | Episodic memory (“what happened”) |
| Semantic | Extracted facts and patterns | Semantic memory (“what I know”) |
| Procedural | Workflows and decision patterns | Procedural memory (“how to do things”) |
Frequently accessed memories strengthen over time. Stale memories decay according to an Ebbinghaus-inspired curve. Contradictions between memories are detected and resolved automatically.
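An Ebbinghaus-style curve means retention falls off exponentially with time since last access, and each access slows future decay. A toy model of that behaviour might look like this (the stability unit and the doubling factor are assumptions for illustration, not agentmemory’s actual constants):

```typescript
// Ebbinghaus-style retention: R = exp(-t / S), where S ("stability")
// grows on each access, so frequently used memories decay more slowly.
interface MemoryItem {
  stability: number;  // in hours; higher = slower decay (assumed unit)
  lastAccess: number; // epoch ms
}

function retention(item: MemoryItem, now: number): number {
  const hours = (now - item.lastAccess) / 3_600_000;
  return Math.exp(-hours / item.stability);
}

function access(item: MemoryItem, now: number): void {
  item.stability *= 2; // assumed strengthening factor
  item.lastAccess = now;
}

// A memory with 24h stability retains e^-1 ≈ 0.37 after one day untouched;
// once retention drops below some threshold, the engine can auto-forget it.
const item: MemoryItem = { stability: 24, lastAccess: Date.now() };
```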
Triple-Stream Search Engine
When you start a new session, agentmemory needs to find the right context to inject. It uses three parallel retrieval streams fused together:
| Stream | Method | When Used |
|---|---|---|
| BM25 | Stemmed keyword matching with synonym expansion | Always active |
| Vector | Cosine similarity over dense embeddings | Embedding provider configured |
| Graph | Knowledge graph traversal via entity matching | Entities detected in query |
Results are merged using Reciprocal Rank Fusion (RRF) with a k-value of 60, plus session diversification (maximum 3 results per topic). This achieves 95.2% recall at rank 5 — significantly higher than pure BM25 fallbacks (86.2%) or competitor approaches.
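Reciprocal Rank Fusion itself is short enough to sketch: each stream contributes 1/(k + rank) per document, with the k = 60 the article cites. (The result diversification step is omitted here; this shows only the fusion.)

```typescript
// Merge ranked result lists with Reciprocal Rank Fusion (RRF).
// Each document scores sum(1 / (k + rank)) across streams, rank starting at 1.
function rrfFuse(streams: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranked of streams) {
    ranked.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

const bm25 = ["auth.ts", "jwt.md", "rate-limit.ts"];
const vector = ["jwt.md", "auth.ts", "session.ts"];
const graph = ["auth.ts"];
console.log(rrfFuse([bm25, vector, graph])[0]); // "auth.ts": ranked high in all three
```

The appeal of RRF is that it needs no score normalization — BM25 scores and cosine similarities live on incompatible scales, but ranks are always comparable.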
Works With Every Major AI Coding Agent
One of agentmemory’s strongest selling points is universal compatibility. A single memory server serves all your agents:
| Agent | Integration Method |
|---|---|
| Claude Code | 12 hooks + MCP + plugin (auto-wires 51 tools) |
| Cursor | MCP server config (one JSON entry) |
| Gemini CLI | gemini mcp add command |
| Codex CLI | MCP config in .codex/config.toml |
| OpenCode | MCP config in opencode.json |
| Cline | MCP server settings |
| Goose | MCP server settings |
| Kilo Code | MCP server settings |
| Roo Code | MCP server settings |
| Aider | REST API calls |
| Claude Desktop | MCP server config |
| Windsurf | MCP server settings |
| Any other MCP client | 107 REST endpoints |
The beauty: all agents share the same memory database. If you switch from Claude Code to Cursor mid-project, the agent in Cursor already knows everything the agent in Claude Code learned.
Quick Setup: 30 Seconds to Memory
Installation
```bash
# Terminal 1: Start the memory server
npx @agentmemory/agentmemory

# Terminal 2: Seed sample data and see search in action
npx @agentmemory/agentmemory demo
```
That’s literally it. The server starts on ports 3111 (API) and 3113 (real-time viewer). No PostgreSQL, no Redis, no Qdrant — zero external dependencies.
Configure Your Agent
For Cursor, add this to ~/.cursor/mcp.json:
```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"]
    }
  }
}
```
For Claude Code, paste this instruction:
Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` — the plugin registers all 12 hooks, 4 skills, AND auto-wires the `@agentmemory/mcp` stdio server.
Real-Time Viewer
Open http://localhost:3113 to watch memories build live. The viewer shows:
- Live observation stream
- Session explorer
- Memory browser
- Knowledge graph visualization
- Health dashboard
Advanced Features
Session Replay
Every recorded session is fully replayable. Open the viewer’s Replay tab and scrub through timelines showing prompts, tool calls, results, and responses — with play/pause, speed control (0.5× to 4×), and keyboard shortcuts.
Import existing transcripts too:
```bash
# Import all Claude Code JSONL transcripts
npx @agentmemory/agentmemory import-jsonl

# Or import a specific file
npx @agentmemory/agentmemory import-jsonl ~/.claude/projects/-my-project/abc123.jsonl
```
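JSONL transcripts are just one JSON object per line, so the core of any importer is a few lines of parsing. A minimal sketch (the field names in the sample are made up for illustration; real Claude Code transcript schemas will differ):

```typescript
// Parse a JSONL string: one JSON object per non-empty line.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical transcript entries, just to show the shape:
const sample = `{"type":"user","text":"Add auth to the API"}
{"type":"tool_use","name":"Edit"}`;
console.log(parseJsonl(sample).length); // 2
```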
Embedded Local Models (Free!)
Want the highest quality embeddings without any API cost?
```bash
npm install @xenova/transformers
```
This adds Transformers.js, which runs all-MiniLM-L6-v2 locally — providing +8 percentage points of recall improvement over BM25-only search. Completely offline, no API keys needed.
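Once embeddings exist, vector retrieval boils down to cosine similarity between the query vector and each stored memory’s vector. A sketch with toy 3-dimensional vectors (the real model emits 384 dimensions, and the memory labels here are invented):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy stand-ins for 384-dim all-MiniLM-L6-v2 embeddings:
const query = [0.9, 0.1, 0.0];
const memA = [0.8, 0.2, 0.1]; // e.g. a "JWT middleware" memory
const memB = [0.0, 0.3, 0.9]; // e.g. an unrelated memory
console.log(cosine(query, memA) > cosine(query, memB)); // true: memA is closer
```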
Available embedding providers:
| Provider | Model | Cost |
|---|---|---|
| Local (recommended) | all-MiniLM-L6-v2 | Free, offline |
| Gemini | text-embedding-004 | Free tier (1500 RPM) |
| OpenAI | text-embedding-3-small | $0.02/1M tokens |
| Voyage AI | voyage-code-3 | Paid (code-optimized) |
| Cohere | embed-english-v3.0 | Free trial |
Privacy & Security
Your code is never sent to third-party servers for memory storage:
- Secrets, API keys, and <private> tags are filtered before storage
- All processing happens locally on your machine
- Memory data is stored in a local SQLite database
- Optional bearer token auth for REST endpoints (AGENTMEMORY_SECRET)
- Full audit trail for every memory operation
- Git-versioned snapshots for rollback capability
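A minimal version of that pre-storage privacy filter looks like this — redact private-tagged spans and anything matching known secret shapes. (The patterns below are illustrative examples, not agentmemory’s actual rule set.)

```typescript
// Strip <private>…</private> spans and obvious secret patterns before storage.
// These regexes are ILLUSTRATIVE, not agentmemory's real filter rules.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g, // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/g,    // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

function scrub(text: string): string {
  let out = text.replace(/<private>[\s\S]*?<\/private>/g, "[REDACTED]");
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}

console.log(scrub("token sk-abcdefghijklmnopqrstu and <private>my notes</private>"));
// → "token [REDACTED] and [REDACTED]"
```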
51 MCP Tools
Beyond basic save/retrieve, agentmemory offers an extensive toolkit:
Core tools (always available):
- memory_smart_search — hybrid semantic + keyword search
- memory_recall — search past observations
- memory_save — save insights, decisions, patterns
- memory_patterns — detect recurring patterns automatically
- memory_timeline — chronological view of observations
- memory_profile — get project intelligence summary
- memory_export / memory_import — backup and restore
Extended tools (51 total with AGENTMEMORY_TOOLS=all):
- memory_graph_query — knowledge graph traversal
- memory_consolidate — force 4-tier memory consolidation
- memory_claude_bridge_sync — bidirectional sync with MEMORY.md
- memory_team_share / memory_team_feed — team memory sharing
- memory_audit / memory_governance_delete — governed deletion
- memory_action_create / memory_frontier — action item tracking
- memory_lease — exclusive multi-agent action leases
- memory_checkpoint — external condition gates
- memory_diagnose / memory_heal — self-healing infrastructure
Comparison: agentmemory vs Competitors
| Feature | agentmemory | mem0 | Letta/MemGPT | Built-in (CLAUDE.md) |
|---|---|---|---|---|
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (greps everything) |
| Auto-capture | 12 hooks (zero effort) | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector (archival) | Loads all into context |
| Multi-agent | MCP + REST + leases | API only (no coordination) | Within runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None | Qdrant/pgvector | Postgres + vector DB | None |
| Memory lifecycle | 4-tier + decay + auto-forget | Passive extraction | Agent-managed | Manual pruning |
| Real-time viewer | Yes (port 3113) | Cloud dashboard | Cloud dashboard | No |
| Self-hosted | Yes (default) | Optional | Optional | Yes |
agentmemory wins on: retrieval accuracy, zero external dependencies, cross-agent compatibility, real-time observability, and lowest annual cost.
Practical Use Cases
Use Case 1: Startup Backend Development
You’re building a SaaS product. In Session 1, you set up JWT authentication with jose middleware. By Session 5, when you need to add OAuth2 login, the agent already knows:
- Your auth system uses jose, not jsonwebtoken
- Your tests are in test/auth.test.ts
- You prefer Edge-compatible libraries
- Your deployment target is Cloudflare Workers
No re-explaining. The agent starts implementing OAuth2 immediately.
Use Case 2: Legacy Codebase Migration
You’re migrating a Rails app to Node.js. Over multiple sessions, agentmemory learns:
- Which modules depend on ActiveRecord callbacks
- Your team’s preferred migration pattern
- Known gotchas from previous debugging sessions
- The structure of your most complex business logic
Each new session becomes faster because the agent has accumulated institutional knowledge.
Use Case 3: Team Collaboration
With namespaced shared memory and the memory_team_share tool, team members can:
- Share architectural decisions across the team
- Propagate important bug patterns
- Maintain a living knowledge base of “what worked”
- Track action items and ownership via memory_action_* tools
Getting Started Today
```bash
# Install and start (requires Node.js >= 20)
npx @agentmemory/agentmemory

# See it in action
npx @agentmemory/agentmemory demo

# Open the real-time viewer
open http://localhost:3113
```
Or from source:
```bash
git clone https://github.com/rohitg00/agentmemory.git
cd agentmemory
npm install && npm run build && npm start
```
Final Verdict
agentmemory addresses one of the most expensive and frustrating problems in modern software development: the repeated cost of re-establishing context with AI coding agents. At 95.2% retrieval accuracy, 92% token savings, zero external dependencies, and support for 15+ coding agents, it’s positioned as the definitive memory solution for the AI-assisted developer workflow.
For anyone spending significant time with Claude Code, Cursor, Gemini CLI, or any MCP-enabled coding assistant, agentmemory pays for itself within days through reduced token usage alone. The fact that it’s open-source under Apache 2.0 and self-hosted makes it the clear winner over cloud-dependent alternatives.
GitHub: rohitg00/agentmemory
Website: agent-memory.dev
Version: v0.9.5
What AI coding agent tools are you currently using? Have you tried adding persistent memory to your workflow? Share your experience in the comments below.