--- title: Headroom emoji: 🧠 colorFrom: blue colorTo: indigo sdk: docker app_port: 7860 pinned: false ---
# Headroom **Compress everything your AI agent reads. Same answers, fraction of the tokens.** [![CI](https://github.com/chopratejas/headroom/actions/workflows/ci.yml/badge.svg)](https://github.com/chopratejas/headroom/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/chopratejas/headroom/graph/badge.svg)](https://app.codecov.io/gh/chopratejas/headroom) [![PyPI](https://img.shields.io/pypi/v/headroom-ai.svg)](https://pypi.org/project/headroom-ai/) [![npm](https://img.shields.io/npm/v/headroom-ai.svg)](https://www.npmjs.com/package/headroom-ai) [![Model: Kompress-base](https://img.shields.io/badge/model-Kompress--base-yellow.svg)](https://huggingface.co/chopratejas/kompress-base) [![Tokens saved: 60B+](https://img.shields.io/badge/tokens%20saved-60B%2B-2ea44f)](https://headroomlabs.ai/dashboard) [![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE) [![Docs](https://img.shields.io/badge/docs-online-blue.svg)](https://headroom-docs.vercel.app/docs)

Open the live dashboard to see Headroom in action.

--- Every tool call, log line, DB read, RAG chunk, and file your agent injects into a prompt is mostly boilerplate. Headroom strips the noise and keeps the signal β€” **losslessly, locally, and without touching accuracy.** > **100 logs. One FATAL error buried at position 67. Both runs found it.** > Baseline **10,144 tokens** β†’ Headroom **1,260 tokens** β€” **87% fewer, identical answer.** > `python examples/needle_in_haystack_test.py` --- ## Quick start Works with Anthropic, OpenAI, Google, Bedrock, Vertex, Azure, OpenRouter, and 100+ models via LiteLLM. **Wrap your coding agent β€” one command:** ```bash pip install "headroom-ai[all]" headroom wrap claude # Claude Code headroom wrap codex # Codex headroom wrap cursor # Cursor headroom wrap aider # Aider headroom wrap copilot # GitHub Copilot CLI ``` **Drop it into your own code β€” Python or TypeScript:** ```python from headroom import compress result = compress(messages, model="claude-sonnet-4-5") response = client.messages.create(model="claude-sonnet-4-5", messages=result.messages) print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})") ``` ```typescript import { compress } from 'headroom-ai'; const result = await compress(messages, { model: 'gpt-4o' }); ``` **Or run it as a proxy β€” zero code changes, any language:** ```bash headroom proxy --port 8787 ANTHROPIC_BASE_URL=http://localhost:8787 your-app OPENAI_BASE_URL=http://localhost:8787/v1 your-app ``` --- ## Why Headroom - **Accuracy-preserving.** GSM8K **0.870 β†’ 0.870** (Β±0.000). TruthfulQA **+0.030**. SQuAD v2 and BFCL both **97%** accuracy after compression. Validated on public OSS benchmarks you can rerun yourself. - **Runs on your machine.** No cloud API, no data egress. Compression latency is milliseconds β€” faster end-to-end for Sonnet / Opus / GPT-4 class models than a hosted service round-trip. - **[Kompress-base](https://huggingface.co/chopratejas/kompress-base) on HuggingFace.** Our open-source text compressor, fine-tuned on real agentic traces β€” tool outputs, logs, RAG chunks, code. Install with `pip install "headroom-ai[ml]"`. - **Cross-agent memory and learning.** Claude Code saves a fact, Codex reads it back. `headroom learn` mines failed sessions and writes corrections straight to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md` β€” reliability compounds over time. - **Reversible (CCR).** Compression is not deletion. The model can always call `headroom_retrieve` to pull the original bytes. Nothing is thrown away. Bundles the [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting β€” full [attribution below](#compared-to). --- ## How it fits ``` Your agent / app (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…) β”‚ prompts Β· tool outputs Β· logs Β· RAG results Β· files β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Headroom (runs locally β€” your data stays here) β”‚ β”‚ ─────────────────────────────────────────────── β”‚ β”‚ CacheAligner β†’ ContentRouter β†’ CCR β”‚ β”‚ β”œβ”€ SmartCrusher (JSON) β”‚ β”‚ β”œβ”€ CodeCompressor (AST) β”‚ β”‚ └─ Kompress-base (text, HF) β”‚ β”‚ β”‚ β”‚ Cross-agent memory Β· headroom learn Β· MCP β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ compressed prompt + retrieval tool β–Ό LLM provider (Anthropic Β· OpenAI Β· Bedrock Β· …) ``` β†’ [Architecture](https://headroom-docs.vercel.app/docs/architecture) Β· [CCR reversible compression](https://headroom-docs.vercel.app/docs/ccr) Β· [Kompress-base model card](https://huggingface.co/chopratejas/kompress-base) ### Canonical pipeline lifecycle Headroom now exposes one stable request lifecycle across `compress()`, the SDK, and the proxy: `Setup` β†’ `Pre-Start` β†’ `Post-Start` β†’ `Input Received` β†’ `Input Cached` β†’ `Input Routed` β†’ `Input Compressed` β†’ `Input Remembered` β†’ `Pre-Send` β†’ `Post-Send` β†’ `Response Received` - **Transforms** still do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow. - **Pipeline extensions** observe or customize those lifecycle stages via `on_pipeline_event(...)`. - **Compression hooks** still work and now sit alongside the canonical lifecycle instead of being the only extension seam. - **Proxy extensions** remain the server/app integration seam for ASGI middleware, routes, and startup policy. ### Provider slices Provider and tool-specific behavior is being moved behind dedicated modules under `headroom/providers/` so core orchestration stays focused on lifecycle, sequencing, and policy. - **CLI/tool slices**: `headroom/providers/claude`, `copilot`, `codex`, `openclaw` - **Provider runtime slices**: `headroom/providers/claude`, `gemini`, plus shared backend/runtime dispatch in `headroom/providers/registry.py` - **Core files stay orchestration-first**: `wrap.py`, `client.py`, `cli/proxy.py`, and `proxy/server.py` now delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch instead of inlining those rules. --- ## Proof **Savings on real agent workloads:** | Workload | Before | After | Savings | |-------------------------------|-------:|-------:|--------:| | Code search (100 results) | 17,765 | 1,408 | **92%** | | SRE incident debugging | 65,694 | 5,118 | **92%** | | GitHub issue triage | 54,174 | 14,761 | **73%** | | Codebase exploration | 78,502 | 41,254 | **47%** | **Accuracy preserved on standard benchmarks:** | Benchmark | Category | N | Baseline | Headroom | Delta | |------------|----------|----:|---------:|---------:|----------:| | GSM8K | Math | 100 | 0.870 | 0.870 | **Β±0.000**| | TruthfulQA | Factual | 100 | 0.530 | 0.560 | **+0.030**| | SQuAD v2 | QA | 100 | β€” | **97%** | 19% compression | | BFCL | Tools | 100 | β€” | **97%** | 32% compression | Reproduce: ```bash python -m headroom.evals suite --tier 1 ``` **Community, live:**

60B+ tokens saved by the community in the last 20 days β€” live leaderboard β†’

β†’ [Full benchmarks & methodology](https://headroom-docs.vercel.app/docs/benchmarks) --- ## Built for coding agents | Agent | One-command wrap | Notes | |--------------------|------------------------------------|------------------------------------------------------------------| | **Claude Code** | `headroom wrap claude` | `--memory` for cross-agent memory, `--code-graph` for codebase intel | | **Codex** | `headroom wrap codex --memory` | Shares the same memory store as Claude | | **Cursor** | `headroom wrap cursor` | Prints Cursor config β€” paste once, done | | **Aider** | `headroom wrap aider` | Starts proxy, launches Aider | | **Copilot CLI** | `headroom wrap copilot` | Starts proxy, launches Copilot | | **OpenClaw** | `headroom wrap openclaw` | Installs Headroom as ContextEngine plugin | MCP-native too β€” `headroom mcp install` exposes `headroom_compress`, `headroom_retrieve`, and `headroom_stats` to any MCP client.
headroom learn in action
--- ## Integrations
Drop Headroom into any stack | Your setup | Hook in with | |-------------------------|------------------------------------------------------------------| | Any Python app | `compress(messages, model=…)` | | Any TypeScript app | `await compress(messages, { model })` | | Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` Β· `withHeadroom(new OpenAI())` | | Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` | | LiteLLM | `litellm.callbacks = [HeadroomCallback()]` | | LangChain | `HeadroomChatModel(your_llm)` | | Agno | `HeadroomAgnoModel(your_model)` | | Strands | [Strands guide](https://headroom-docs.vercel.app/docs/strands) | | ASGI apps | `app.add_middleware(CompressionMiddleware)` | | Multi-agent | `SharedContext().put / .get` | | MCP clients | `headroom mcp install` |
What's inside - **SmartCrusher** β€” universal JSON: arrays of dicts, nested objects, mixed types. - **CodeCompressor** β€” AST-aware for Python, JS, Go, Rust, Java, C++. - **Kompress-base** β€” our HuggingFace model, trained on agentic traces. - **Image compression** β€” 40–90% reduction via trained ML router. - **CacheAligner** β€” stabilizes prefixes so Anthropic/OpenAI KV caches actually hit. - **IntelligentContext** β€” score-based context fitting with learned importance. - **CCR** β€” reversible compression; LLM retrieves originals on demand. - **Cross-agent memory** β€” shared store, agent provenance, auto-dedup. - **SharedContext** β€” compressed context passing across multi-agent workflows. - **`headroom learn`** β€” plugin-based failure mining for Claude, Codex, Gemini.
--- ## Install ```bash pip install "headroom-ai[all]" # Python, everything npm install headroom-ai # TypeScript / Node docker pull ghcr.io/chopratejas/headroom:latest ``` Granular extras: `[proxy]`, `[mcp]`, `[ml]` (Kompress-base), `[agno]`, `[langchain]`, `[evals]`. Requires **Python 3.10+**. β†’ [Installation guide](https://headroom-docs.vercel.app/docs/installation) β€” Docker tags, persistent service, PowerShell, devcontainers. --- ## Documentation | Start here | Go deeper | |-------------------------------------------------------------------------|------------------------------------------------------------------------| | [Quickstart](https://headroom-docs.vercel.app/docs/quickstart) | [Architecture](https://headroom-docs.vercel.app/docs/architecture) | | [Proxy](https://headroom-docs.vercel.app/docs/proxy) | [How compression works](https://headroom-docs.vercel.app/docs/how-compression-works) | | [MCP tools](https://headroom-docs.vercel.app/docs/mcp) | [CCR β€” reversible compression](https://headroom-docs.vercel.app/docs/ccr) | | [Memory](https://headroom-docs.vercel.app/docs/memory) | [Cache optimization](https://headroom-docs.vercel.app/docs/cache-optimization) | | [Failure learning](https://headroom-docs.vercel.app/docs/failure-learning) | [Benchmarks](https://headroom-docs.vercel.app/docs/benchmarks) | | [Configuration](https://headroom-docs.vercel.app/docs/configuration) | [Limitations](https://headroom-docs.vercel.app/docs/limitations) | --- ## Compared to Headroom runs **locally**, covers **every** content type (not just CLI or text), works with every major framework, and is **reversible**. | | Scope | Deploy | Local | Reversible | |----------------------------------|-------------------------------------------------|-------------------------------------|:-----:|:----------:| | **Headroom** | All context β€” tools, RAG, logs, files, history | Proxy Β· library Β· middleware Β· MCP | Yes | Yes | | [RTK](https://github.com/rtk-ai/rtk) | CLI command outputs | CLI wrapper | Yes | No | | [Compresr](https://compresr.ai), [Token Co.](https://thetokencompany.ai) | Text sent to their API | Hosted API call | No | No | | OpenAI Compaction | Conversation history | Provider-native | No | No | > **Attribution.** Headroom ships with the excellent [RTK](https://github.com/rtk-ai/rtk) binary for shell-output rewriting β€” `git show` β†’ `git show --short`, noisy `ls` β†’ scoped, chatty installers β†’ summarized. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. --- ## Contributing ```bash git clone https://github.com/chopratejas/headroom.git && cd headroom pip install -e ".[dev]" && pytest ``` Devcontainers in `.devcontainer/` (default + `memory-stack` with Qdrant & Neo4j). See [CONTRIBUTING.md](CONTRIBUTING.md). --- ## Community - **[Live leaderboard](https://headroomlabs.ai/dashboard)** β€” 60B+ tokens saved and counting. - **[Discord](https://discord.gg/yRmaUNpsPJ)** β€” questions, feedback, war stories. - **[Kompress-base on HuggingFace](https://huggingface.co/chopratejas/kompress-base)** β€” the model behind our text compression. ## License Apache 2.0 β€” see [LICENSE](LICENSE).