12-Factor Agents Explained: The 12 Principles for Production-Grade LLM Software (2026 Guide)

Humanlayer's 12-Factor Agents (22K+ GitHub stars) defines the design patterns that separate demo-grade LLM prototypes from production agents real customers depend on. Full breakdown of all 12 factors — own your prompts, own your context window, stateless reducer model, control flow ownership, human-in-the-loop via tool calls, compact errors, focused agents, and more. With practical application guidance for Claude Code, Codex, OpenCode, MCP-based agent stacks.

  • ⭐ 22000
  • Apache-2.0
  • Updated 2026-05-23

Why “Just Use LangChain” Stopped Working #

Every engineer who has shipped an LLM-powered feature to real users hits the same wall: the prototype works beautifully in a notebook, then collapses the moment a paying customer hits it from a different angle. The agent hallucinates a tool call, the context window blows up halfway through a session, errors silently swallow themselves, retries spin forever, and the postmortem reveals that nobody — not even the engineer who built it — actually understands what the agent was doing when it failed.

The agentic AI ecosystem in 2025–2026 produced dozens of frameworks promising to “make agents production-ready” — LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Pydantic AI, the list goes on. Each one solves the demo problem (compose tool calls, route between sub-agents). Almost none of them solve the production problem (predictable behavior under unexpected inputs, observable failure modes, recoverable sessions).

12-Factor Agents (GitHub: humanlayer/12-factor-agents, 22,000+ stars as of May 2026) is Dex Horthy’s and HumanLayer’s answer to the gap. Modeled on Heroku’s 12-Factor App methodology from 2011, it is a methodology — not a framework, not a runtime, not a SaaS — for thinking about LLM-powered software that real customers will use.

Apache 2.0 for the code samples, CC BY-SA 4.0 for the prose. 273 commits and counting, mostly TypeScript with Python and Jupyter examples for accessibility.


The Core Insight #

Frameworks abstract away the four things that actually matter most when an agent breaks:

  1. The prompt that was sent.
  2. The context window that was active.
  3. The control flow that decided what to do next.
  4. The execution state that needs to survive a crash.

12-Factor Agents argues, line by line, that every one of these should be code you own — not magic the framework hides. The result is more code in your repo and fewer prayers when an incident hits at 3am.

Here are all twelve factors, with what each one actually means in practice.


The 12 Factors #

1. Natural Language to Tool Calls #

The LLM’s job is to translate user intent into a structured tool call — nothing more. Don’t ask the LLM to “do the thing.” Ask it to emit JSON that describes doing the thing, then have deterministic code execute it. This single constraint eliminates an entire category of hallucination-driven incidents.

2. Own Your Prompts #

Prompts are code. They belong in your repo, in version control, behind code review. Templates buried inside a framework’s prompt library are tech debt waiting to silently change behavior when you upgrade. If your agent’s behavior depends on a string, that string belongs to you.

3. Own Your Context Window #

The set of messages currently in the model’s context is the single biggest determinant of behavior. Frameworks that auto-summarize, auto-prune, or auto-inject memory are great until they aren’t. Build your own context assembly logic. You should be able to print the exact array of messages going into every LLM call.

4. Tools Are Just Structured Outputs #

A “tool” is not a magic Function object — it’s a JSON Schema that constrains the LLM’s output. Once you internalize this, you can build “tools” that the LLM doesn’t actually invoke: state transitions, decision branches, escalation requests. Anything that needs the LLM to commit to a shape can be a tool.

5. Unify Execution State and Business State #

Your agent has two state machines: one for “where am I in the conversation” and one for “what is the user’s order/ticket/project doing.” 12-Factor Agents argues these should be the same state machine. Keeping them separate is the most common source of “the agent thinks it finished but the order is still pending” bugs.

6. Launch / Pause / Resume with Simple APIs #

Your agent must be able to be suspended mid-run and resumed later — possibly on a different machine, possibly after a human approval. This means the entire session state must be serializable. No closures-over-local-variables magic. No “the LLM client object is keeping the conversation alive in memory.” Plain data, written somewhere durable.

7. Contact Humans with Tool Calls #

When the agent needs human input — approval, missing info, escalation — it should emit a tool call, not stop dead. The tool call goes into the same queue/UI/inbox that humans monitor. Same pattern as factor 4, applied to the human-in-the-loop case. HumanLayer’s product is the productized version of this principle.

8. Own Your Control Flow #

The for-loop that decides “call LLM → run tool → call LLM again → check if we’re done → call LLM again” is the heart of every agent. Frameworks that hide it (“just yield and we’ll handle the loop”) rob you of the ability to add custom logic — rate limiting, budget caps, human checkpoints, retries — exactly where you need them. Write the loop. It’s twenty lines.

9. Compact Errors into the Context Window #

When a tool call fails, the right thing to do is feed a short, structured error message back into the LLM’s next turn and let it react. NOT crash. NOT silently retry. NOT log-and-pray. A 200-character “TOOL_FAILED: HTTP 503 from /api/orders, payload too large” gives the LLM enough to make a reasonable next move — back off, try a smaller payload, escalate to human.

10. Small, Focused Agents #

One mega-agent that “does everything” is a debugging nightmare. Twelve small agents, each with three tools and one job, are testable and recoverable. The pattern matches microservices, with the same trade-offs: more coordination overhead, vastly better fault isolation.

11. Trigger from Anywhere, Meet Users Where They Are #

An agent’s input shouldn’t be coupled to a single channel. Slack message, email, web form, GitHub issue, Telegram bot, cron job — all the same agent. This requires factor 2 (own your prompts) and factor 5 (unified state) to be solid first, but the payoff is being able to add a new input source without rewriting the agent.

12. Make Your Agent a Stateless Reducer #

The agent is a pure function: state, event → new state, output. No hidden mutation. No “the agent remembers because it has a self.history attribute.” Everything that affects the output is in the input. This is the factor that makes everything else possible — without it, factors 5, 6, and 10 are aspirational.


Applying This to Real Stacks #

To Claude Code Workflows #

Claude Code already implements factors 1, 4, and 7 by design — tool calls are structured outputs, MCP servers add human-in-the-loop, and the tool catalog is owned by you (your project’s MCP config). The factors that need your attention are 2 (the system prompt is yours to customize via CLAUDE.md), 3 (the context window assembly is partially Claude’s, but pagefind/CodeGraph integration lets you shape it), and 9 (when a tool returns an error, ensure it’s compact and structured).

To MCP-Based Agent Stacks #

MCP nails factor 4 (tools as structured outputs over a standardized protocol) and helps with factor 11 (any MCP-aware client can drive any MCP server). MCP itself is silent on factors 5, 6, 8, and 12 — those you have to build above MCP.

To Hermes Agent / OpenCode / Custom Stacks #

These get you a head start on factors 1, 4, and 8 (built-in agent loop, structured tool calls). You still have to bring your own factor 2 (prompts), factor 3 (context shaping), factor 5–6 (state durability), and factor 12 (statelessness).


Where 12-Factor Agents Disagrees With Mainstream Framework Marketing #

A few principles in the manifesto push back hard against the “use our framework and forget the details” pitch:

  • Factor 2 vs. prompt libraries: Most agent frameworks ship a prompt library. 12-Factor says: copy the prompts into your repo, then they’re yours.
  • Factor 3 vs. auto-memory: Frameworks love offering automatic memory (“RAG out of the box”). 12-Factor says: that’s the single biggest source of “why is the agent doing this?” mysteries. Build the assembly yourself.
  • Factor 8 vs. agent runtimes: Hosted runtimes that hide the loop are convenient until you need to inject custom logic. 12-Factor says: write your own loop, it’s small.

This is not a fight against frameworks — it’s a fight against frameworks that hide too much. Use a framework as a library, not a black box.


What 12-Factor Agents Is NOT #

Set expectations:

  • Not a runtime. There’s no pip install twelve-factor-agents. It’s prose, examples, and patterns.
  • Not a single-language thing. Examples are TypeScript and Python, but the principles are language-agnostic.
  • Not a religion. Some factors (especially 10 — small focused agents) involve real trade-offs. The manifesto is honest about that.
  • Not finished. 273 commits and growing. Open issues and discussions actively iterate on the wording.

Who Should Read This #

Yes, read it cover to cover, if you:

  • Are shipping (or about to ship) an LLM-powered feature to paying customers.
  • Have debugged a “but it worked in the demo” agent incident in the last 60 days.
  • Are choosing between hand-rolling an agent loop and adopting LangGraph/CrewAI/Agents SDK.
  • Are doing a vendor comparison like Cursor vs Claude Code and want a checklist of what “production-ready” actually means.

Probably skim if you:

  • Are still in the “first agent” demo stage. (Read factor 1 and factor 2, come back later.)
  • Use only hosted no-code platforms (n8n, Zapier) without writing the agent loop yourself.

Verdict #

12-Factor Agents is the most-cited 2026 manifesto for LLM software for a reason: it puts words to the patterns that engineers shipping production agents have been arriving at independently for the past two years. The Heroku 12-factor parallel is not pretentious — both documents codify what stops being optional once real users depend on it.

The single biggest mindset shift the manifesto pushes: an agent is software, and software needs to be understood by the team that runs it. Frameworks that obscure that contract are technical debt, not productivity.

Combine the 12 factors with a token-efficient symbol layer like CodeGraph, a cost-aware LLM proxy like rtk, and a unified CLI control plane like CC Switch, and you have the architectural backbone of a 2026 production AI stack.


GitHub: humanlayer/12-factor-agents · License: Apache 2.0 (code) / CC BY-SA 4.0 (content) · Stars: 22K+ · Author: Dex Horthy / HumanLayer

💬 Discussion