
Caveman

by JuliusBrussee

productivity · beginner

Claude, asked to refactor a loop: 'I'd be happy to help you refactor that function. Let me analyze the code carefully and walk you through what I'm doing step by step...'

Claude with Caveman installed: 'Refactor: extract loop to function. Done.'

Caveman is a Claude Code skill that cuts token costs by up to 75% by forcing the model to respond in stripped-down, direct language. No articles (a, an, the). No filler words. No pleasantries. No hedging. Created by Julius Brussee. 5,000+ GitHub stars. Hit #1 trending on GitHub. Ranked #1 on Hacker News with 883 points in early April 2026. MIT licensed.

The Core Insight

Phrases like 'I'd be happy to help you with that' and 'Let me summarize what I just did' contribute nothing to output quality. They burn tokens. They slow responses. They push users into Claude Pro usage limits faster. They are not polite; they are padding.

Caveman strips the padding. It injects a compact system prompt rule at the start of every Claude conversation that enforces the caveman style. Output becomes terse, direct, and measurably cheaper to generate.

The Four Grunt Levels

Caveman ships four intensity levels, from light to extreme brevity:

  • Grunt 1 (Light): Removes obvious filler ('I'd be happy to...', 'Let me walk you through...'). Keeps articles and complete sentences. Typical savings: 22%.
  • Grunt 2 (Medium): Strips articles where grammatically non-essential and replaces multi-word phrases with single words: 'in order to' becomes 'to'. Typical savings: 45%.
  • Grunt 3 (Heavy): Eliminates most prose. Responses become bullet-like: 'The function refactors the loop by extracting...' becomes 'Refactor: extract loop.' Typical savings: 65%.
  • Grunt 4 (Extreme): Near-telegraphic output: 'Fixed: bug line 47' instead of full sentences. Typical savings: 75-87%. Not for customer-facing outputs; designed for developer workflows where you just want the answer.

Built-In Utilities

Caveman includes tools that apply the same philosophy to specific workflows: terse commit messages (no 'chore:', no 'feat:', just the action), one-line PR reviews that identify the issue without restating the code, and input compression that rewrites your prompts to remove your own filler before sending.

The input compression is a second win. Even if you do not shorten Claude's output, shortening your own prompts cuts input token costs. Combined with Grunt 3 output constraints, you routinely hit the 65-75% savings range.

Backed by Research

A March 2026 academic paper titled 'Brevity Constraints Reverse Performance Hierarchies in Language Models' found that brevity-constrained outputs improved accuracy by 26 percentage points on certain reasoning benchmarks. The theory: verbose outputs encourage the model to generate filler that introduces factual drift, while forced brevity keeps the model on the essential reasoning path.

Caveman operationalizes this research. The grunt levels are calibrated against the paper's findings; Grunt 2 tracks the paper's 'optimal brevity' level for accuracy gains.

Installation and Configuration

One line for Claude Code: /skill install juliusbrussee/caveman. Manual install: download the skill file from the GitHub repo and place it in ~/.claude/skills/. Cursor, Windsurf, and Copilot have equivalent install paths documented in the README.

Configuration is minimal. Set your default grunt level in the skill config and override it per conversation with a /grunt 3 command. That is it.

Token Savings Example

A typical coding task: 'Refactor this 40-line function to use a reducer pattern.' Claude by default produces about 850 output tokens explaining the refactor, showing the diff, noting edge cases, and suggesting tests. Claude with Grunt 3 produces about 220 output tokens: a pure diff plus a one-line summary. Same correctness. Same usefulness for a developer who knows what they are doing. 74% fewer tokens.

At Claude Opus 4.7 pricing ($5/$25 per million input/output tokens), that is a meaningful reduction at monthly scale: a developer generating 5M output tokens per month drops from $125 to about $32 in Claude API costs.

When Not to Use Caveman

Skip Caveman for customer-facing documentation, where Claude's verbose style serves users who want explanation, not just answers. Skip it for code walkthroughs aimed at learners, since teaching someone new to programming requires exactly the scaffolding Caveman removes. Skip it for legal or compliance outputs where completeness matters more than brevity.

Caveman is a developer tool. Turn it on for your coding sessions, turn it off when you need Claude to write human-facing text.

Limitations

Grunt 4 output is hard to read for anyone but the developer who wrote the prompt. Sharing Grunt 4 outputs with a team without context causes confusion.

The skill's tuning is Claude-specific. It works in Cursor and Copilot, but the token savings are less dramatic because those editors have their own system prompts that partially override the skill.

Caveman reduces output tokens, not input processing. Responses finish sooner because there is less to generate, but the model still reads your full prompt. For input token savings, use the bundled input compression utility.

For more Claude Code skills, browse repos.skila.ai/skills. For AI coding tools that pair well with Caveman, check tools.skila.ai. For articles on token economics and LLM cost optimization, visit news.skila.ai.
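The cost arithmetic in the Token Savings Example above can be checked directly. A minimal sketch using only the article's own numbers ($25 per million output tokens, 5M tokens per month, and the 850-to-220 token reduction at Grunt 3); the function name is this example's invention:

```python
# Worked example of the article's token-cost arithmetic.
# Inputs come from the text: $25 per 1M output tokens (Opus 4.7),
# 5M output tokens/month, and 850 -> 220 tokens per task at Grunt 3.

OUTPUT_PRICE_PER_M = 25.0   # USD per 1M output tokens
MONTHLY_TOKENS = 5_000_000  # output tokens generated per month

def monthly_cost(tokens: int, reduction: float = 0.0) -> float:
    """Monthly output-token cost after applying a fractional reduction."""
    return tokens * (1 - reduction) / 1_000_000 * OUTPUT_PRICE_PER_M

reduction = 1 - 220 / 850                         # ~74% fewer output tokens
baseline = monthly_cost(MONTHLY_TOKENS)           # $125.00
grunt3 = monthly_cost(MONTHLY_TOKENS, reduction)  # ~$32.35
print(f"default: ${baseline:.2f}, Grunt 3: ${grunt3:.2f}")
```

The Grunt 3 figure lands at roughly $32, matching the article's claim.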

Installation

/skill install juliusbrussee/caveman

Key Features

  • Four grunt levels from light filler removal to extreme telegraphic output
  • Token savings range from 22% (Grunt 1) to 87% (Grunt 4) depending on intensity
  • Built-in terse commit message generator and one-line PR review utility
  • Input compression utility that strips filler from your prompts before sending
  • Backed by March 2026 research paper on brevity constraints improving accuracy by 26pp
  • One-line install for Claude Code, Cursor, Windsurf, and Copilot
  • Per-conversation grunt level override via /grunt command
  • MIT licensed with active maintenance from creator Julius Brussee
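To make the input-compression feature concrete, here is a toy sketch of the idea: strip common filler phrases and non-essential articles from a prompt before sending it. This is not Caveman's actual implementation (the skill works through a system-prompt rule, and its real phrase list is not published here); the patterns below are invented for illustration:

```python
import re

# Toy illustration of input compression: remove filler phrases and
# articles from a prompt, then collapse leftover whitespace.
# The phrase list is an assumption made for this example only.
FILLER = [
    r"\bI'd be happy to help( you)?( with that)?\b",
    r"\bLet me walk you through\b",
    r"\bin order to\b",
]
ARTICLES = r"\b(a|an|the)\s+"

def compress(prompt: str) -> str:
    for pattern in FILLER:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    prompt = re.sub(ARTICLES, "", prompt, flags=re.IGNORECASE)
    return " ".join(prompt.split())  # normalize spacing after deletions

before = "Please refactor the loop in order to improve the performance."
after = compress(before)  # "Please refactor loop improve performance."
```

Ten words in, five words out; every input token removed this way is billed neither on this request nor on any follow-up turn that re-sends the conversation.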

Use Cases

  • Developer workflows where Claude's verbose explanations waste time and tokens
  • High-volume coding agent sessions hitting Claude Pro usage limits prematurely
  • Batch code review pipelines where one-line reviews scale better than paragraph feedback
  • Terminal-based agent loops where terse output is easier to parse programmatically
  • Cost-sensitive production deployments running at Grunt 3 by default to cut API bills 65%
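The agent-loop use case above rests on terse output being machine-parseable. A sketch of that idea, using the article's Grunt 4 example 'Fixed: bug line 47' as the input; the regex and field names are this example's assumptions, not a format Caveman guarantees:

```python
import re
from typing import Optional

# A Grunt 4 reply like "Fixed: bug line 47" can be parsed with one regex,
# which is what makes terse output convenient inside agent loops.
TERSE = re.compile(r"^(?P<action>\w+): (?P<detail>.+?)(?: line (?P<line>\d+))?$")

def parse_reply(reply: str) -> Optional[dict]:
    """Return structured fields from a terse reply, or None if unstructured."""
    m = TERSE.match(reply.strip())
    if not m:
        return None  # verbose reply: would need heuristics or another LLM call
    return {k: v for k, v in m.groupdict().items() if v is not None}

parse_reply("Fixed: bug line 47")      # {'action': 'Fixed', 'detail': 'bug', 'line': '47'}
parse_reply("Refactor: extract loop")  # {'action': 'Refactor', 'detail': 'extract loop'}
```

A default-verbosity reply ("I'd be happy to help you refactor that...") fails the match and falls through to the None branch, which is exactly the failure mode terse output avoids.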
