# Headroom
**Compress everything your AI agent reads. Same answers, fraction of the tokens.**
[](https://github.com/chopratejas/headroom/actions/workflows/ci.yml)
[](https://app.codecov.io/gh/chopratejas/headroom)
[](https://pypi.org/project/headroom-ai/)
[](https://www.npmjs.com/package/headroom-ai)
[](https://huggingface.co/chopratejas/kompress-base)
[](https://headroomlabs.ai/dashboard)
[](LICENSE)
[](https://headroom-docs.vercel.app/docs)
---
Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal — **losslessly, locally, and without touching accuracy.**
> **100 logs. One FATAL error buried at position 67. Both runs found it.**
> Baseline **10,144 tokens** → Headroom **1,260 tokens** — **87% fewer, identical answer.**
> `python examples/needle_in_haystack_test.py`
---
## Quick start
Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM.
**Wrap your coding agent — one command:**
```bash
pip install "headroom-ai[all]"
headroom wrap claude # Claude Code
headroom wrap codex # Codex
headroom wrap cursor # Cursor
headroom wrap aider # Aider
headroom wrap copilot # GitHub Copilot CLI
```
**Prefer a one-time durable install instead of wrapping every launch:**
```bash
headroom init -g # Detect installed user-scoped agents and wire them to Headroom
headroom init claude # Install repo-local Claude hooks for just this project
headroom init copilot -g # Install user-scoped Copilot hooks and provider routing
```
**Drop it into your own code — Python or TypeScript:**
```python
from headroom import compress
result = compress(messages, model="claude-sonnet-4-5")
response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```
```typescript
import { compress } from 'headroom-ai';
const result = await compress(messages, { model: 'gpt-4o' });
```
**Or run it as a proxy — zero code changes, any language:**
```bash
headroom proxy --port 8787
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```
---
## Why Headroom
- **Accuracy-preserving.** GSM8K **0.870 → 0.870** (±0.000). TruthfulQA **+0.030**. SQuAD v2 and BFCL both **97%** accuracy after compression. Validated on public OSS benchmarks you can rerun yourself.
- **Runs on your machine.** No cloud API, no data egress. Compression latency is milliseconds — faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip.
- **[Kompress-base](https://huggingface.co/chopratejas/kompress-base) on HuggingFace.** Our open-source text compressor, fine-tuned on real agentic traces — tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`.
- **Cross-agent memory and learning.** Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` — reliability compounds over time.
- **Reversible (CCR).** Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away.
Bundles the [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting — full [attribution below](#compared-to).
---
## How it fits
```
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
│ prompts · tool outputs · logs · RAG results · files
▼
┌────────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here) │
│ ─────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST) │
│ └─ Kompress-base (text, HF) │
│ │
│ Cross-agent memory · headroom learn · MCP │
└────────────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
```
→ [Architecture](https://headroom-docs.vercel.app/docs/architecture) · [CCR reversible compression](https://headroom-docs.vercel.app/docs/ccr) · [Kompress-base model card](https://huggingface.co/chopratejas/kompress-base)
---
## Proof
**Savings on real agent workloads:**
| Workload | Before | After | Savings |
|-------------------------------|-------:|-------:|--------:|
| Code search (100 results) | 17,765 | 1,408 | **92%** |
| SRE incident debugging | 65,694 | 5,118 | **92%** |
| GitHub issue triage | 54,174 | 14,761 | **73%** |
| Codebase exploration | 78,502 | 41,254 | **47%** |
**Accuracy preserved on standard benchmarks:**
| Benchmark | Category | N | Baseline | Headroom | Delta |
|------------|----------|----:|---------:|---------:|----------:|
| GSM8K | Math | 100 | 0.870 | 0.870 | **±0.000**|
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | **+0.030**|
| SQuAD v2 | QA | 100 | — | **97%** | 19% compression |
| BFCL | Tools | 100 | — | **97%** | 32% compression |
Reproduce:
```bash
python -m headroom.evals suite --tier 1
```
**Community, live:**