The Problem Every Developer Faces
You spend thirty minutes explaining your authentication architecture to Claude Code. You describe your JWT setup, your middleware choices, your testing strategy. The agent writes great code.
Then you close the session. Start a new one. And spend another thirty minutes explaining the exact same thing.
This is the “cold start problem” that every user of AI coding assistants faces. Built-in memory solutions like CLAUDE.md, .cursorrules, or Notepads are essentially sticky notes — they have line limits, can’t be searched, and degrade into unreadable walls of text after a few sessions.
agentmemory solves this problem. It’s an open-source persistent memory engine designed specifically for AI coding agents. After a single install command (npx @agentmemory/agentmemory), your coding agent remembers everything from previous sessions — automatically.
What Is agentmemory?
agentmemory is a memory engine + MCP server that runs silently in the background, captures what your coding agent does, compresses observations into structured facts, and injects relevant context when you start a new session.
Key stats at a glance:
| Metric | Value |
|---|---|
| GitHub Stars | 3,700+ (trending fast — 533 stars/day on GitHub Trending) |
| Current Version | v0.9.5 (just released May 9, 2026) |
| License | Apache 2.0 |
| Retrieval Accuracy | 95.2% R@5 (on LongMemEval-S benchmark) |
| Token Savings | ~92% fewer tokens vs. manual context pasting |
| Cost per Year | ~$10/year (with local embeddings: $0) |
| External Dependencies | None (SQLite + iii-engine only) |
| Source Files | 118 files, ~21,800 lines of code |
Why agentmemory Matters
The Token Cost Crisis
Consider these annual costs for context management:
| Approach | Tokens/Year | Annual Cost |
|---|---|---|
| Paste full context every session | 19.5M+ | Impossible (exceeds context window) |
| LLM-summarized notes | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings | ~170K | $0 |
A typical senior developer who re-explains their architecture three times per week burns through over 19 million tokens annually. With GPT-4-level models charging $10–$30 per million tokens, that’s hundreds of dollars wasted on redundant context.
agentmemory Reduces This to ~1,900 Tokens Per Session
Here’s how it works:
```
Session 1: "Add auth to the API"
  → Agent writes code, runs tests, fixes bugs
  → agentmemory silently captures every tool use via hooks
  → Session ends → observations compressed into structured memory

Session 2: "Now add rate limiting"
  → Agent already knows:
      - Auth uses JWT middleware in src/middleware/auth.ts
      - Tests in test/auth.test.ts cover token validation
      - You chose jose over jsonwebtoken for Edge compatibility
  → Zero re-explaining. Starts working immediately.
```
That’s a 92% reduction in token usage — which translates directly to money saved.
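The arithmetic behind those annual figures is easy to sanity-check yourself. The sketch below assumes a $15-per-million-token price and roughly 125K tokens of re-explained context per session; swap in your own model’s pricing:

```typescript
// Back-of-the-envelope annual cost of re-explained context.
// pricePerMTok is an ASSUMED model price in USD per million tokens.
function annualContextCost(
  tokensPerSession: number,
  sessionsPerYear: number,
  pricePerMTok: number,
): number {
  return (tokensPerSession * sessionsPerYear * pricePerMTok) / 1_000_000;
}

// ~3 re-explained sessions/week (156/year), ~125K tokens each => 19.5M tokens
const manual = annualContextCost(125_000, 156, 15);
// agentmemory injects ~1,900 tokens of stored context per session instead
const withMemory = annualContextCost(1_900, 156, 15);

console.log(`manual: $${manual.toFixed(2)}, agentmemory: $${withMemory.toFixed(2)}`);
```

At those assumed rates, manual re-explaining lands near $300/year while injected memory stays in single digits, which is where the article’s "hundreds of dollars wasted" framing comes from.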
Core Architecture: How It Actually Works
The Memory Pipeline
agentmemory doesn’t just dump everything into a file. It implements a sophisticated pipeline inspired by human memory consolidation:
```
PostToolUse hook fires
  → SHA-256 deduplication (5-minute window)
  → Privacy filter (strips secrets, API keys)
  → Store raw observation
  → LLM compression → structured facts + concepts + narratives
  → Vector embedding (6 providers available + local option)
  → Index in BM25 + vector databases

Stop / SessionEnd hook fires
  → Summarize session
  → Knowledge graph extraction (optional)
  → Slot reflection (optional)

SessionStart hook fires
  → Load project profile (top concepts, files, patterns)
  → Hybrid search (BM25 + vector + knowledge graph)
  → Token budget (default: 2,000 tokens)
  → Inject only relevant context into conversation
```
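The SHA-256 deduplication step is the kind of thing that fits in a dozen lines. Here’s a minimal sketch of the idea — hash each observation and skip it if the same hash was seen within the last five minutes. (This is an illustration of the technique, not agentmemory’s actual implementation.)

```typescript
import { createHash } from "node:crypto";

const WINDOW_MS = 5 * 60 * 1000; // 5-minute dedup window
const seen = new Map<string, number>(); // SHA-256 hex -> last-seen timestamp (ms)

// Returns true if the observation is new, or its last duplicate has aged out.
function shouldStore(observation: string, now: number): boolean {
  const hash = createHash("sha256").update(observation).digest("hex");
  const last = seen.get(hash);
  if (last !== undefined && now - last < WINDOW_MS) return false; // duplicate
  seen.set(hash, now);
  return true;
}

const t0 = Date.now();
shouldStore("Read src/middleware/auth.ts", t0);                 // true: first sighting
shouldStore("Read src/middleware/auth.ts", t0 + 1_000);         // false: inside window
shouldStore("Read src/middleware/auth.ts", t0 + WINDOW_MS + 1); // true: window expired
```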
Four-Tier Memory Consolidation
Inspired by how the human brain processes memories during sleep:
| Tier | What It Stores | Human Analogy |
|---|---|---|
| Working | Raw observations from tool calls | Short-term memory |
| Episodic | Compressed session summaries | Episodic memory (“what happened”) |
| Semantic | Extracted facts and patterns | Semantic memory (“what I know”) |
| Procedural | Workflows and decision patterns | Procedural memory (“how to do things”) |
Frequently accessed memories strengthen over time. Stale memories decay according to an Ebbinghaus-inspired curve. Contradictions between memories are detected and resolved automatically.
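An Ebbinghaus-style curve means retention falls off exponentially with time since last access, and each access slows future decay. A toy model of that behaviour might look like this (the stability unit and the doubling factor are assumptions for illustration, not agentmemory’s actual constants):

```typescript
// Ebbinghaus-style retention: R = exp(-t / S), where S ("stability")
// grows on each access, so frequently used memories decay more slowly.
interface MemoryItem {
  stability: number;  // in hours; higher = slower decay (assumed unit)
  lastAccess: number; // epoch ms
}

function retention(item: MemoryItem, now: number): number {
  const hours = (now - item.lastAccess) / 3_600_000;
  return Math.exp(-hours / item.stability);
}

function access(item: MemoryItem, now: number): void {
  item.stability *= 2; // assumed strengthening factor
  item.lastAccess = now;
}

// A memory with 24h stability retains e^-1 ≈ 0.37 after one day untouched;
// once retention drops below some threshold, the engine can auto-forget it.
const item: MemoryItem = { stability: 24, lastAccess: Date.now() };
```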
Triple-Stream Search Engine
When you start a new session, agentmemory needs to find the right context to inject. It uses three parallel retrieval streams fused together:
| Stream | Method | When Used |
|---|---|---|
| BM25 | Stemmed keyword matching with synonym expansion | Always active |
| Vector | Cosine similarity over dense embeddings | Embedding provider configured |
| Graph | Knowledge graph traversal via entity matching | Entities detected in query |
Results are merged using Reciprocal Rank Fusion (RRF) with a k-value of 60, plus session diversification (maximum 3 results per topic). This achieves 95.2% recall at rank 5 — significantly higher than pure BM25 fallbacks (86.2%) or competitor approaches.
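Reciprocal Rank Fusion itself is short enough to sketch: each stream contributes 1/(k + rank) per document, with the k = 60 the article cites. (The result diversification step is omitted here; this shows only the fusion.)

```typescript
// Merge ranked result lists with Reciprocal Rank Fusion (RRF).
// Each document scores sum(1 / (k + rank)) across streams, rank starting at 1.
function rrfFuse(streams: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranked of streams) {
    ranked.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}

const bm25 = ["auth.ts", "jwt.md", "rate-limit.ts"];
const vector = ["jwt.md", "auth.ts", "session.ts"];
const graph = ["auth.ts"];
console.log(rrfFuse([bm25, vector, graph])[0]); // "auth.ts": ranked high in all three
```

The appeal of RRF is that it needs no score normalization — BM25 scores and cosine similarities live on incompatible scales, but ranks are always comparable.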
Works With Every Major AI Coding Agent
One of agentmemory’s strongest selling points is universal compatibility. A single memory server serves all your agents:
| Agent | Integration Method |
|---|---|
| Claude Code | 12 hooks + MCP + plugin (auto-wires 51 tools) |
| Cursor | MCP server config (one JSON entry) |
| Gemini CLI | gemini mcp add command |
| Codex CLI | MCP config in .codex/config.toml |
| OpenCode | MCP config in opencode.json |
| Cline | MCP server settings |
| Goose | MCP server settings |
| Kilo Code | MCP server settings |
| Roo Code | MCP server settings |
| Aider | REST API calls |
| Claude Desktop | MCP server config |
| Windsurf | MCP server settings |
| Any other MCP client | 107 REST endpoints |
The beauty: all agents share the same memory database. If you switch from Claude Code to Cursor mid-project, the agent in Cursor already knows everything the agent in Claude Code learned.
Quick Setup: 30 Seconds to Memory
Installation
```bash
# Terminal 1: Start the memory server
npx @agentmemory/agentmemory

# Terminal 2: Seed sample data and see search in action
npx @agentmemory/agentmemory demo
```
That’s literally it. The server starts on ports 3111 (API) and 3113 (real-time viewer). No PostgreSQL, no Redis, no Qdrant — zero external dependencies.
Configure Your Agent
For Cursor, add this to ~/.cursor/mcp.json:
```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"]
    }
  }
}
```
For Claude Code, paste this instruction:
Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` — the plugin registers all 12 hooks, 4 skills, AND auto-wires the `@agentmemory/mcp` stdio server.
Real-Time Viewer
Open http://localhost:3113 to watch memories build live. The viewer shows:
- Live observation stream
- Session explorer
- Memory browser
- Knowledge graph visualization
- Health dashboard
Advanced Features
Session Replay
Every recorded session is fully replayable. Open the viewer’s Replay tab and scrub through timelines showing prompts, tool calls, results, and responses — with play/pause, speed control (0.5× to 4×), and keyboard shortcuts.
Import existing transcripts too:
```bash
# Import all Claude Code JSONL transcripts
npx @agentmemory/agentmemory import-jsonl

# Or import a specific file
npx @agentmemory/agentmemory import-jsonl ~/.claude/projects/-my-project/abc123.jsonl
```
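JSONL transcripts are just one JSON object per line, so the core of any importer is a few lines of parsing. A minimal sketch (the field names in the sample are made up for illustration; real Claude Code transcript schemas will differ):

```typescript
// Parse a JSONL string: one JSON object per non-empty line.
function parseJsonl(text: string): Record<string, unknown>[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical transcript entries, just to show the shape:
const sample = `{"type":"user","text":"Add auth to the API"}
{"type":"tool_use","name":"Edit"}`;
console.log(parseJsonl(sample).length); // 2
```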
Embedded Local Models (Free!)
Want the highest quality embeddings without any API cost?
```bash
npm install @xenova/transformers
```
This adds Transformers.js, which runs all-MiniLM-L6-v2 locally — providing +8 percentage points of recall improvement over BM25-only search. Completely offline, no API keys needed.
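Once embeddings exist, vector retrieval boils down to cosine similarity between the query vector and each stored memory’s vector. A sketch with toy 3-dimensional vectors (the real model emits 384 dimensions, and the memory labels here are invented):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Toy stand-ins for 384-dim all-MiniLM-L6-v2 embeddings:
const query = [0.9, 0.1, 0.0];
const memA = [0.8, 0.2, 0.1]; // e.g. a "JWT middleware" memory
const memB = [0.0, 0.3, 0.9]; // e.g. an unrelated memory
console.log(cosine(query, memA) > cosine(query, memB)); // true: memA is closer
```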
Available embedding providers:
| Provider | Model | Cost |
|---|---|---|
| Local (recommended) | all-MiniLM-L6-v2 | Free, offline |
| Gemini | text-embedding-004 | Free tier (1500 RPM) |
| OpenAI | text-embedding-3-small | $0.02/1M tokens |
| Voyage AI | voyage-code-3 | Paid (code-optimized) |
| Cohere | embed-english-v3.0 | Free trial |
Privacy & Security
Your code is never sent to third-party servers for memory storage:
- Secrets, API keys, and <private> tags are filtered before storage
- All processing happens locally on your machine
- Memory data is stored in a local SQLite database
- Optional bearer token auth for REST endpoints (AGENTMEMORY_SECRET)
- Full audit trail for every memory operation
- Git-versioned snapshots for rollback capability
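A minimal version of that pre-storage privacy filter looks like this — redact private-tagged spans and anything matching known secret shapes. (The patterns below are illustrative examples, not agentmemory’s actual rule set.)

```typescript
// Strip <private>…</private> spans and obvious secret patterns before storage.
// These regexes are ILLUSTRATIVE, not agentmemory's real filter rules.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g, // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/g,    // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
];

function scrub(text: string): string {
  let out = text.replace(/<private>[\s\S]*?<\/private>/g, "[REDACTED]");
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, "[REDACTED]");
  }
  return out;
}

console.log(scrub("token sk-abcdefghijklmnopqrstu and <private>my notes</private>"));
// → "token [REDACTED] and [REDACTED]"
```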
51 MCP Tools
Beyond basic save/retrieve, agentmemory offers an extensive toolkit:
Core tools (always available):
- memory_smart_search — hybrid semantic + keyword search
- memory_recall — search past observations
- memory_save — save insights, decisions, patterns
- memory_patterns — detect recurring patterns automatically
- memory_timeline — chronological view of observations
- memory_profile — get project intelligence summary
- memory_export / memory_import — backup and restore
Extended tools (51 total with AGENTMEMORY_TOOLS=all):
- memory_graph_query — knowledge graph traversal
- memory_consolidate — force 4-tier memory consolidation
- memory_claude_bridge_sync — bidirectional sync with MEMORY.md
- memory_team_share / memory_team_feed — team memory sharing
- memory_audit / memory_governance_delete — governed deletion
- memory_action_create / memory_frontier — action item tracking
- memory_lease — exclusive multi-agent action leases
- memory_checkpoint — external condition gates
- memory_diagnose / memory_heal — self-healing infrastructure
Comparison: agentmemory vs Competitors
| Feature | agentmemory | mem0 | Letta/MemGPT | Built-in (CLAUDE.md) |
|---|---|---|---|---|
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (greps everything) |
| Auto-capture | 12 hooks (zero effort) | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector (archival) | Loads all into context |
| Multi-agent | MCP + REST + leases | API only (no coordination) | Within runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None | Qdrant/pgvector | Postgres + vector DB | None |
| Memory lifecycle | 4-tier + decay + auto-forget | Passive extraction | Agent-managed | Manual pruning |
| Real-time viewer | Yes (port 3113) | Cloud dashboard | Cloud dashboard | No |
| Self-hosted | Yes (default) | Optional | Optional | Yes |
agentmemory wins on: retrieval accuracy, zero external dependencies, cross-agent compatibility, real-time observability, and lowest annual cost.
Practical Use Cases
Use Case 1: Startup Backend Development
You’re building a SaaS product. In Session 1, you set up JWT authentication with jose middleware. By Session 5, when you need to add OAuth2 login, the agent already knows:
- Your auth system uses jose, not jsonwebtoken
- Your tests are in test/auth.test.ts
- You prefer Edge-compatible libraries
- Your deployment target is Cloudflare Workers
No re-explaining. The agent starts implementing OAuth2 immediately.
Use Case 2: Legacy Codebase Migration
You’re migrating a Rails app to Node.js. Over multiple sessions, agentmemory learns:
- Which modules depend on ActiveRecord callbacks
- Your team’s preferred migration pattern
- Known gotchas from previous debugging sessions
- The structure of your most complex business logic
Each new session becomes faster because the agent has accumulated institutional knowledge.
Use Case 3: Team Collaboration
With namespaced shared memory and the memory_team_share tool, team members can:
- Share architectural decisions across the team
- Propagate important bug patterns
- Maintain a living knowledge base of “what worked”
- Track action items and ownership via memory_action_* tools
Getting Started Today
```bash
# Install and start (requires Node.js >= 20)
npx @agentmemory/agentmemory

# See it in action
npx @agentmemory/agentmemory demo

# Open the real-time viewer
open http://localhost:3113
```
Or from source:
```bash
git clone https://github.com/rohitg00/agentmemory.git
cd agentmemory
npm install && npm run build && npm start
```
Final Verdict
agentmemory addresses one of the most expensive and frustrating problems in modern software development: the repeated cost of re-establishing context with AI coding agents. At 95.2% retrieval accuracy, 92% token savings, zero external dependencies, and support for 15+ coding agents, it’s positioned as the definitive memory solution for the AI-assisted developer workflow.
For anyone spending significant time with Claude Code, Cursor, Gemini CLI, or any MCP-enabled coding assistant, agentmemory pays for itself within days through reduced token usage alone. The fact that it’s open-source under Apache 2.0 and self-hosted makes it the clear winner over cloud-dependent alternatives.
GitHub: rohitg00/agentmemory
Website: agent-memory.dev
Version: v0.9.5
What AI coding agent tools are you currently using? Have you tried adding persistent memory to your workflow? Share your experience in the comments below.