If you use Claude Code daily, you have probably noticed the token counter climbing fast. Long explanations, polite filler sentences, and repeated context all add up. More tokens mean slower responses and a higher bill at the end of the month.

Caveman is a Claude Code skill that tackles this by making Claude speak in compressed, terse language, like a caveman. The result: 65% fewer output tokens on average in the project's benchmarks, with no loss of technical content. It has already earned 57,003 stars on GitHub and is one of the most popular productivity skills in the Claude Code ecosystem.

In this post, I will explain why token consumption matters, how Caveman works under the hood, how to install and use it, and what real benchmarks look like.


Why Token Consumption Matters for Claude Code Users

Claude Code is an agentic coding assistant. Unlike a simple chatbot, it reads your codebase, runs commands, and produces multi-step plans. Every word it outputs costs tokens. Every token costs money.

Here is why controlling token usage is critical:

  1. Cost scales linearly with output length. If Claude generates 2,000 tokens instead of 500, you pay four times as much for that single interaction.
  2. Long outputs slow you down. Reading a wall of text to find the one code block you need wastes time.
  3. Context windows are finite. When Claude chatters, it leaves less room for your actual code and instructions in the conversation history.
  4. Verbose does not mean better. A March 2026 paper, “Brevity Constraints Reverse Performance Hierarchies in Language Models” (arXiv:2604.00025), found that forcing models to be brief improved accuracy by 26 percentage points on some benchmarks. Less fluff can actually mean more correctness.

The takeaway: fewer tokens is not just cheaper — it is often better.
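To make the cost arithmetic in point 1 concrete, here is a minimal sketch. The per-token price below is a placeholder assumption for illustration, not Anthropic's actual pricing:

```python
# Assumed price for illustration only; real per-token pricing varies by model.
PRICE_PER_OUTPUT_TOKEN = 15 / 1_000_000  # $15 per million output tokens (hypothetical)

def output_cost(tokens: int) -> float:
    """USD cost of a response with the given output-token count."""
    return tokens * PRICE_PER_OUTPUT_TOKEN

verbose_cost = output_cost(2000)  # ~$0.030 at the assumed rate
terse_cost = output_cost(500)     # ~$0.0075 at the assumed rate
print(verbose_cost / terse_cost)  # 4.0 -- cost scales linearly with length
```

Because cost is linear in output tokens, any percentage cut in output length is the same percentage cut in output spend.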


How Caveman Works

Caveman is a Claude Code skill (also available for Cline, Cursor, Windsurf, and Codex). It injects a lightweight prompt constraint that tells Claude to drop filler, remove articles, use sentence fragments, and get straight to the point.

The key insight is that thinking/reasoning tokens are untouched. Caveman does not make Claude “dumber.” It only compresses the output — the final text that reaches your terminal. The model still reasons at full capability; it just speaks more efficiently afterward.
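The skill's exact prompt text is not reproduced here, but a constraint of this kind might look roughly like the following hypothetical sketch (illustrative only, not Caveman's actual rules file):

```
Respond in compressed style:
- Cut filler ("Certainly!", "I'd be happy to...") and pleasantries.
- Drop articles; use sentence fragments.
- Keep all code, identifiers, numbers, and technical facts intact.
- Compress only the final answer; never shorten internal reasoning.
```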

Caveman offers three intensity levels:

| Level | Trigger | Behavior |
|-------|---------|----------|
| Lite | /caveman lite | Removes filler, keeps grammar. Professional but concise. |
| Full | /caveman full | Default mode. Drops articles, uses fragments, full caveman style. |
| Ultra | /caveman ultra | Maximum compression. Telegraphic. Abbreviates everything. |

There is also a Wenyan mode that uses classical Chinese literary compression for the absolute minimum token count, but Full mode is what most developers use.


Installation and Setup

Caveman supports multiple AI coding tools. Choose the one you use:

Claude Code

Clone the skill into your global skills directory:

```shell
git clone https://github.com/JuliusBrussee/caveman.git \
  ~/.claude/skills/caveman
```

Restart Claude Code. The skill auto-loads.

Cursor

Copy the rules file into your Cursor project:

```shell
cp caveman/.cursor/.cursorrules /path/to/your/project/
```

Cline / Roo Code

Install via the Cline marketplace or copy the .clinerules file to your project root.

Windsurf

Copy the .windsurf rules into your project configuration.

Codex (OpenAI)

Use the $caveman trigger after loading the skill.


How to Use Caveman

Once installed, trigger it with any of these phrases:

  • /caveman
  • talk like caveman
  • caveman mode
  • less tokens please

To return to normal:

  • stop caveman
  • normal mode

Example: Asking for a Code Review

Without Caveman:

```
User: Review this auth middleware for security issues.
Claude: Certainly! I would be happy to review your authentication middleware for potential security vulnerabilities. Let me examine the code carefully...
[700+ tokens of polite preamble and explanation]
```

With Caveman:

```
User: /caveman
User: Review this auth middleware for security issues.
Claude: L42: 🔴 bug — user null. Add guard.
L58: 🟡 weak token expiry. Use 15min max.
L71: 🔴 secret in env var? Move to vault.
```

Same technical content. No throat-clearing. 41% fewer tokens in this real benchmark case.

Built-in Skills

Caveman ships with several sub-commands:

| Command | Purpose |
|---------|---------|
| /caveman-commit | Terse commit messages (Conventional Commits, ≤50 chars) |
| /caveman-review | One-line PR comments. No fluff. |
| /caveman-help | Quick-reference card for all modes. |
| /caveman-stats | Real session token usage + estimated savings in USD. |
| /caveman:compress <file> | Rewrites a memory file (e.g., CLAUDE.md) into caveman-speak. Cuts ~46% of input tokens. |

Real Benchmarks and Results

The Caveman repository includes reproducible benchmarks using the real Claude API. Here are the numbers:

| Task | Normal (tokens) | Caveman (tokens) | Saved |
|------|-----------------|------------------|-------|
| Explain React re-render bug | 1,180 | 159 | 87% |
| Fix auth middleware token expiry | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2,347 | 380 | 84% |
| Explain git rebase vs merge | 702 | 292 | 58% |
| Refactor callback to async/await | 387 | 301 | 22% |
| Architecture: microservices vs monolith | 446 | 310 | 30% |
| Review PR for security issues | 678 | 398 | 41% |
| Docker multi-stage build | 1,042 | 290 | 72% |
| Debug PostgreSQL race condition | 1,200 | 232 | 81% |
| Implement React error boundary | 3,454 | 456 | 87% |
| Average | 1,214 | 294 | 65% |

Range: 22% to 87% savings, depending on how verbose the baseline response is.
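The headline figures can be recomputed directly from the table. A quick sketch (token counts copied from the table above; note the 65% average is the unweighted mean of the per-task percentages, not the ratio of the token totals):

```python
# (normal_tokens, caveman_tokens) per benchmark task, from the table above.
benchmarks = {
    "Explain React re-render bug": (1180, 159),
    "Fix auth middleware token expiry": (704, 121),
    "Set up PostgreSQL connection pool": (2347, 380),
    "Explain git rebase vs merge": (702, 292),
    "Refactor callback to async/await": (387, 301),
    "Architecture: microservices vs monolith": (446, 310),
    "Review PR for security issues": (678, 398),
    "Docker multi-stage build": (1042, 290),
    "Debug PostgreSQL race condition": (1200, 232),
    "Implement React error boundary": (3454, 456),
}

savings_pct = {
    task: 100 * (1 - caveman / normal)
    for task, (normal, caveman) in benchmarks.items()
}
average_saving = sum(savings_pct.values()) / len(savings_pct)  # ~64.5, the 65% headline
```

For what it's worth, the ratio of the token totals (294 / 1,214) would give a larger saving of about 76%; the reported 65% averages the per-task percentages instead, which weights short and long tasks equally.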

What You Gain

  • Speed: Less text to generate means faster responses.
  • Readability: No walls of text. Just answers.
  • Accuracy: All technical info is preserved; only fluff is removed.
  • Cost: 65% fewer output tokens directly translates to a lower API bill.

Practical Code Example

Here is a side-by-side comparison for a common developer task: asking Claude to explain a Docker multi-stage build.

Normal Mode (1,042 tokens):

```
Certainly! I would be happy to explain how a multi-stage Docker build works and why it is beneficial for your Node.js application.

A multi-stage build is a feature in Docker that allows you to use multiple FROM statements in a single Dockerfile. Each FROM instruction can use a different base image, and each stage can copy artifacts from previous stages. This approach is particularly useful because it helps you create smaller, more secure final images by separating the build environment from the runtime environment.

Let me walk you through a typical example for a Node.js application...
```

Caveman Mode (290 tokens):

```
Multi-stage build = smaller image.

Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

Stage 2: Run
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/main.js"]

Why: builder has gcc, python, devdeps. Runtime no need. Image go from 1.2GB to 180MB.
```

Same technical depth. Same code. 72% fewer tokens.


Summary

Caveman is not a gimmick. It is a practical, scientifically grounded optimization for anyone who uses Claude Code heavily. By compressing Claude’s output into terse, caveman-style language, it delivers:

  • 65% average token reduction
  • Faster responses
  • Easier-to-read output
  • Lower API costs
  • Unchanged reasoning quality

If your monthly Claude Code bill is climbing or you are tired of scrolling through polite preamble to find the actual answer, Caveman is worth installing today.

Get it here: github.com/JuliusBrussee/caveman

Star count: 57,003 and growing.


Have you tried Caveman? Drop your token savings in the comments below.