The Problem Every Developer Faces

You spend thirty minutes explaining your authentication architecture to Claude Code. You describe your JWT setup, your middleware choices, your testing strategy. The agent writes great code.

Then you close the session. Start a new one. And spend another thirty minutes explaining the exact same thing.

This is the “cold start problem” that every user of AI coding assistants faces. Built-in memory solutions like CLAUDE.md, .cursorrules, or Notepads are essentially sticky notes — they have line limits, can’t be searched, and degrade into unreadable walls of text after a few sessions.

agentmemory solves this problem. It’s an open-source persistent memory engine designed specifically for AI coding agents. One command (npx @agentmemory/agentmemory) later, your coding agent remembers everything from previous sessions, automatically.

What Is agentmemory?

agentmemory is a memory engine + MCP server that runs silently in the background, captures what your coding agent does, compresses observations into structured facts, and injects relevant context when you start a new session.

Key stats at a glance:

| Metric | Value |
| --- | --- |
| GitHub Stars | 3,700+ (533 stars/day on GitHub Trending) |
| Current Version | v0.9.5 (released May 9, 2026) |
| License | Apache 2.0 |
| Retrieval Accuracy | 95.2% R@5 (on LongMemEval-S benchmark) |
| Token Savings | ~92% fewer tokens vs. manual context pasting |
| Cost per Year | ~$10 (with local embeddings: $0) |
| External Dependencies | None (SQLite + iii-engine only) |
| Source Files | 118 files, ~21,800 lines of code |

Why agentmemory Matters

The Token Cost Crisis

Consider these annual costs for context management:

| Approach | Tokens/Year | Annual Cost |
| --- | --- | --- |
| Paste full context every session | 19.5M+ | Impossible (exceeds context window) |
| LLM-summarized notes | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings | ~170K | $0 |

A typical senior developer who re-explains their architecture three times per week burns through over 19 million tokens annually. With GPT-4-level models charging $10–$30 per million tokens, that’s hundreds of dollars wasted on redundant context.
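
Those figures are easy to sanity-check with simple arithmetic. The per-session context size below (~125K tokens) is an assumption chosen to match the article's 19.5M annual total, not a published number:

```typescript
// Sanity-checking the token math. Three full-context re-explanations
// a week, at an assumed ~125K tokens per paste:
const sessionsPerYear = 3 * 52;
const tokensPerFullPaste = 125_000;         // assumed, not a published figure
const annualTokens = sessionsPerYear * tokensPerFullPaste; // 19.5M tokens

// At $10-$30 per million tokens:
const lowEstimate = (annualTokens / 1e6) * 10;   // $195
const highEstimate = (annualTokens / 1e6) * 30;  // $585
```

Even the low end lands in the "hundreds of dollars" range the article describes.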

agentmemory Reduces This to ~1,900 Tokens Per Session

Here’s how it works:

Session 1: "Add auth to the API"
  → Agent writes code, runs tests, fixes bugs
  → agentmemory silently captures every tool use via hooks
  → Session ends → observations compressed into structured memory

Session 2: "Now add rate limiting"
  → Agent already knows:
    - Auth uses JWT middleware in src/middleware/auth.ts
    - Tests in test/auth.test.ts cover token validation
    - You chose jose over jsonwebtoken for Edge compatibility
  → Zero re-explaining. Starts working immediately.

That’s a 92% reduction in token usage — which translates directly to money saved.

Core Architecture: How It Actually Works

The Memory Pipeline

agentmemory doesn’t just dump everything into a file. It implements a sophisticated pipeline inspired by human memory consolidation:

PostToolUse hook fires
  → SHA-256 deduplication (5-minute window)
  → Privacy filter (strips secrets, API keys)
  → Store raw observation
  → LLM compression → structured facts + concepts + narratives
  → Vector embedding (6 providers available + local option)
  → Index in BM25 + vector databases

Stop / SessionEnd hook fires
  → Summarize session
  → Knowledge graph extraction (optional)
  → Slot reflection (optional)

SessionStart hook fires
  → Load project profile (top concepts, files, patterns)
  → Hybrid search (BM25 + vector + knowledge graph)
  → Token budget (default: 2,000 tokens)
  → Inject only relevant context into conversation
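
The first gate in that pipeline, content-hash deduplication over a rolling 5-minute window, can be sketched as follows. This is an illustration of the technique, not agentmemory's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Drop observations whose SHA-256 content hash was already seen
// within the last 5 minutes.
const WINDOW_MS = 5 * 60 * 1000;
const seen = new Map<string, number>(); // hash -> last-seen timestamp

function isDuplicate(observation: string, now = Date.now()): boolean {
  const hash = createHash("sha256").update(observation).digest("hex");
  const last = seen.get(hash);
  seen.set(hash, now); // refresh the window on every sighting
  return last !== undefined && now - last < WINDOW_MS;
}
```

An identical observation re-fired inside the window is dropped; once the window has elapsed, it is stored again.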

Four-Tier Memory Consolidation

Inspired by how the human brain processes memories during sleep:

| Tier | What It Stores | Human Analogy |
| --- | --- | --- |
| Working | Raw observations from tool calls | Short-term memory |
| Episodic | Compressed session summaries | Episodic memory (“what happened”) |
| Semantic | Extracted facts and patterns | Semantic memory (“what I know”) |
| Procedural | Workflows and decision patterns | Procedural memory (“how to do things”) |

Frequently accessed memories strengthen over time. Stale memories decay according to an Ebbinghaus-inspired curve. Contradictions between memories are detected and resolved automatically.
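
The strengthen-or-decay behavior can be modeled with the standard Ebbinghaus forgetting curve, R = e^(−t/S), where each access increases the stability S. The constants below are assumptions for illustration, not agentmemory's actual parameters:

```typescript
// Ebbinghaus-style retention: R = e^(-t / S). Stability S grows with
// each access, so frequently used memories decay more slowly.
// The 7-day base stability is an assumed constant for illustration.
function retention(daysSinceAccess: number, accessCount: number): number {
  const stabilityDays = 7 * Math.max(1, accessCount);
  return Math.exp(-daysSinceAccess / stabilityDays);
}
```

Under these constants, a memory last touched 5 days ago and accessed once has fallen to roughly 49% strength, while the same memory accessed four times stays above 80%.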

Triple-Stream Search Engine

When you start a new session, agentmemory needs to find the right context to inject. It uses three parallel retrieval streams fused together:

| Stream | Method | When Used |
| --- | --- | --- |
| BM25 | Stemmed keyword matching with synonym expansion | Always active |
| Vector | Cosine similarity over dense embeddings | When an embedding provider is configured |
| Graph | Knowledge graph traversal via entity matching | When entities are detected in the query |

Results are merged using Reciprocal Rank Fusion (RRF) with a k-value of 60, plus session diversification (maximum 3 results per topic). This achieves 95.2% recall at rank 5 — significantly higher than pure BM25 fallbacks (86.2%) or competitor approaches.
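
RRF itself is a simple formula: each stream contributes 1/(k + rank) for every document it returns, and per-document scores are summed across streams. Here is a generic sketch with the k = 60 stated above (session diversification omitted; this illustrates the fusion math, not agentmemory's code):

```typescript
// Reciprocal Rank Fusion: score(d) = sum over streams of 1 / (k + rank),
// with 1-based ranks and k = 60. Documents ranked by several streams
// accumulate score from each.
function rrfFuse(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// A document ranked well by all three streams outscores one that only
// a single stream ranked first:
const fused = rrfFuse([
  ["a", "b", "c"], // BM25 stream
  ["b", "c", "d"], // vector stream
  ["c", "b", "e"], // graph stream
]);
```

Here "b" and "c" win over "a" despite "a" topping the BM25 list, because agreement across streams compounds.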

Works With Every Major AI Coding Agent

One of agentmemory’s strongest selling points is universal compatibility. A single memory server serves all your agents:

| Agent | Integration Method |
| --- | --- |
| Claude Code | 12 hooks + MCP + plugin (auto-wires 51 tools) |
| Cursor | MCP server config (one JSON entry) |
| Gemini CLI | `gemini mcp add` command |
| Codex CLI | MCP config in `.codex/config.toml` |
| OpenCode | MCP config in `opencode.json` |
| Cline | MCP server settings |
| Goose | MCP server settings |
| Kilo Code | MCP server settings |
| Roo Code | MCP server settings |
| Aider | REST API calls |
| Claude Desktop | MCP server config |
| Windsurf | MCP server settings |
| Any other MCP client | 107 REST endpoints |

The beauty: all agents share the same memory database. If you switch from Claude Code to Cursor mid-project, the agent in Cursor already knows everything the agent in Claude Code learned.

Quick Setup: 30 Seconds to Memory

Installation

# Terminal 1: Start the memory server
npx @agentmemory/agentmemory

# Terminal 2: Seed sample data and see search in action
npx @agentmemory/agentmemory demo

That’s literally it. The server starts on ports 3111 (API) and 3113 (real-time viewer). No PostgreSQL, no Redis, no Qdrant: zero external dependencies.

Configure Your Agent

For Cursor, add this to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"]
    }
  }
}

For Claude Code, paste this instruction:

Install agentmemory: run `npx @agentmemory/agentmemory` in a separate terminal to start the memory server. Then run `/plugin marketplace add rohitg00/agentmemory` and `/plugin install agentmemory` — the plugin registers all 12 hooks, 4 skills, AND auto-wires the `@agentmemory/mcp` stdio server.

Real-Time Viewer

Open http://localhost:3113 to watch memories build live. The viewer shows:

  • Live observation stream
  • Session explorer
  • Memory browser
  • Knowledge graph visualization
  • Health dashboard

Advanced Features

Session Replay

Every recorded session is fully replayable. Open the viewer’s Replay tab and scrub through timelines showing prompts, tool calls, results, and responses — with play/pause, speed control (0.5× to 4×), and keyboard shortcuts.

Import existing transcripts too:

# Import all Claude Code JSONL transcripts
npx @agentmemory/agentmemory import-jsonl

# Or import a specific file
npx @agentmemory/agentmemory import-jsonl ~/.claude/projects/-my-project/abc123.jsonl

Embedded Local Models (Free!)

Want the highest quality embeddings without any API cost?

npm install @xenova/transformers

This installs all-MiniLM-L6-v2 locally — providing +8 percentage points of recall improvement over BM25-only searches. Completely offline, no API keys needed.
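
Once local embeddings exist, the vector stream ranks memories by cosine similarity between the query embedding and each stored embedding (for all-MiniLM-L6-v2, 384-dimensional vectors). The scoring itself is plain math, shown here as a generic sketch rather than agentmemory's internal code:

```typescript
// Cosine similarity between two embedding vectors, e.g. the
// 384-dimensional output of all-MiniLM-L6-v2: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, and memories are returned in descending similarity order.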

Available embedding providers:

| Provider | Model | Cost |
| --- | --- | --- |
| Local (recommended) | all-MiniLM-L6-v2 | Free, offline |
| Gemini | text-embedding-004 | Free tier (1,500 RPM) |
| OpenAI | text-embedding-3-small | $0.02/1M tokens |
| Voyage AI | voyage-code-3 | Paid (code-optimized) |
| Cohere | embed-english-v3.0 | Free trial |

Privacy & Security

Your code is never sent to third-party servers for memory storage:

  • Secrets, API keys, and <private> tags are filtered before storage
  • All processing happens locally on your machine
  • Memory data is stored in a local SQLite database
  • Optional bearer token auth for REST endpoints (AGENTMEMORY_SECRET)
  • Full audit trail for every memory operation
  • Git-versioned snapshots for rollback capability
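
A pre-storage privacy filter of this kind boils down to redacting known secret patterns before an observation is persisted. The three patterns below are illustrative examples only, not agentmemory's actual rule set:

```typescript
// Sketch of a pre-storage privacy filter: redact obvious secret
// patterns before persisting. Patterns are examples, not the
// project's real (and broader) rule set.
const SECRET_PATTERNS: Array<[RegExp, string]> = [
  [/sk-[A-Za-z0-9]{20,}/g, "[REDACTED_KEY]"],                  // provider-style API keys
  [/(api[_-]?key\s*[:=]\s*)\S+/gi, "$1[REDACTED]"],            // api_key=... assignments
  [/<private>[\s\S]*?<\/private>/g, "[REDACTED_PRIVATE]"],     // <private> tag contents
];

function scrub(observation: string): string {
  return SECRET_PATTERNS.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    observation,
  );
}
```

Running every observation through such a scrubber before the "Store raw observation" step keeps secrets out of the database entirely, rather than relying on deletion after the fact.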

51 MCP Tools

Beyond basic save/retrieve, agentmemory offers an extensive toolkit:

Core tools (always available):

  • memory_smart_search — hybrid semantic + keyword search
  • memory_recall — search past observations
  • memory_save — save insights, decisions, patterns
  • memory_patterns — detect recurring patterns automatically
  • memory_timeline — chronological view of observations
  • memory_profile — get project intelligence summary
  • memory_export / memory_import — backup and restore

Extended tools (51 total with AGENTMEMORY_TOOLS=all):

  • memory_graph_query — knowledge graph traversal
  • memory_consolidate — force 4-tier memory consolidation
  • memory_claude_bridge_sync — bidirectional sync with MEMORY.md
  • memory_team_share / memory_team_feed — team memory sharing
  • memory_audit / memory_governance_delete — governed deletion
  • memory_action_create / memory_frontier — action item tracking
  • memory_lease — exclusive multi-agent action leases
  • memory_checkpoint — external condition gates
  • memory_diagnose / memory_heal — self-healing infrastructure

Comparison: agentmemory vs Competitors

| Feature | agentmemory | mem0 | Letta/MemGPT | Built-in (CLAUDE.md) |
| --- | --- | --- | --- | --- |
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (greps everything) |
| Auto-capture | 12 hooks (zero effort) | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector (archival) | Loads all into context |
| Multi-agent | MCP + REST + leases | API only (no coordination) | Within runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None | Qdrant/pgvector | Postgres + vector DB | None |
| Memory lifecycle | 4-tier + decay + auto-forget | Passive extraction | Agent-managed | Manual pruning |
| Real-time viewer | Yes (port 3113) | Cloud dashboard | Cloud dashboard | No |
| Self-hosted | Yes (default) | Optional | Optional | Yes |

agentmemory wins on: retrieval accuracy, zero external dependencies, cross-agent compatibility, real-time observability, and lowest annual cost.

Practical Use Cases

Use Case 1: Startup Backend Development

You’re building a SaaS product. In Session 1, you set up JWT authentication with jose middleware. By Session 5, when you need to add OAuth2 login, the agent already knows:

  • Your auth system uses jose, not jsonwebtoken
  • Your tests are in test/auth.test.ts
  • You prefer Edge-compatible libraries
  • Your deployment target is Cloudflare Workers

No re-explaining. The agent starts implementing OAuth2 immediately.

Use Case 2: Legacy Codebase Migration

You’re migrating a Rails app to Node.js. Over multiple sessions, agentmemory learns:

  • Which modules depend on ActiveRecord callbacks
  • Your team’s preferred migration pattern
  • Known gotchas from previous debugging sessions
  • The structure of your most complex business logic

Each new session becomes faster because the agent has accumulated institutional knowledge.

Use Case 3: Team Collaboration

With namespaced shared memory and the memory_team_share tool, team members can:

  • Share architectural decisions across the team
  • Propagate important bug patterns
  • Maintain a living knowledge base of “what worked”
  • Track action items and ownership via memory_action_* tools

Getting Started Today

# Install and start (requires Node.js >= 20)
npx @agentmemory/agentmemory

# See it in action
npx @agentmemory/agentmemory demo

# Open the real-time viewer
open http://localhost:3113

Or from source:

git clone https://github.com/rohitg00/agentmemory.git
cd agentmemory
npm install && npm run build && npm start

Final Verdict

agentmemory addresses one of the most expensive and frustrating problems in modern software development: the repeated cost of re-establishing context with AI coding agents. At 95.2% retrieval accuracy, 92% token savings, zero external dependencies, and support for 15+ coding agents, it’s positioned as the definitive memory solution for the AI-assisted developer workflow.

For anyone spending significant time with Claude Code, Cursor, Gemini CLI, or any MCP-enabled coding assistant, agentmemory pays for itself within days through reduced token usage alone. The fact that it’s open-source under Apache 2.0 and self-hosted makes it the clear winner over cloud-dependent alternatives.

GitHub: rohitg00/agentmemory
Website: agent-memory.dev
Version: v0.9.5


What AI coding agent tools are you currently using? Have you tried adding persistent memory to your workflow? Share your experience in the comments below.