Spaces:
Build error
Build error
Headroom
The Context Optimization Layer for LLM Applications
Cut your LLM costs by 50-90% without losing accuracy
Why Headroom?
- Zero code changes - works as a transparent proxy
- 50-90% cost savings - verified on real workloads
- Reversible compression - LLM retrieves original data via CCR
- Content-aware - code, logs, JSON each handled optimally
- Provider caching - automatic prefix optimization for cache hits
- Persistent memory - remember across conversations with zero-latency extraction
- Framework native - LangChain, Agno, MCP, agents supported
Headroom vs Alternatives
| Approach | Token Reduction | Accuracy | Reversible | Latency |
|---|---|---|---|---|
| Headroom | 50-90% | No loss | Yes (CCR) | ~1-5ms |
| Truncation | Variable | Data loss | No | ~0ms |
| Summarization | 60-80% | Lossy | No | ~500ms+ |
| No optimization | 0% | Full | N/A | 0ms |
Headroom wins because it intelligently selects relevant content while keeping a retrieval path to the original data.
30-Second Quickstart
Option 1: Proxy (Zero Code Changes)
pip install "headroom-ai[proxy]"
headroom proxy --port 8787
Point your tools at the proxy:
# Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude
# Any OpenAI-compatible client
OPENAI_BASE_URL=http://localhost:8787/v1 cursor
Option 2: LangChain Integration
pip install "headroom-ai[langchain]"
from langchain_openai import ChatOpenAI
from headroom.integrations import HeadroomChatModel
# Wrap your model - that's it!
llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
# Use exactly like before
response = llm.invoke("Hello!")
See the full LangChain Integration Guide for memory, retrievers, agents, and more.
Option 3: Agno Integration
pip install "headroom-ai[agno]"
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from headroom.integrations.agno import HeadroomAgnoModel
# Wrap your model - that's it!
model = HeadroomAgnoModel(OpenAIChat(id="gpt-4o"))
agent = Agent(model=model)
# Use exactly like before
response = agent.run("Hello!")
# Check savings
print(f"Tokens saved: {model.total_tokens_saved}")
See the full Agno Integration Guide for hooks, multi-provider support, and more.
Framework Integrations
| Framework | Integration | Docs |
|---|---|---|
| LangChain | HeadroomChatModel, memory, retrievers, agents |
Guide |
| Agno | HeadroomAgnoModel, hooks, multi-provider |
Guide |
| MCP | Tool output compression for Claude | Guide |
| Any OpenAI Client | Proxy server | Guide |
Features
| Feature | Description | Docs |
|---|---|---|
| Memory | Persistent memory across conversations (zero-latency inline extraction) | Memory |
| Universal Compression | ML-based content detection + structure-preserving compression | Compression |
| SmartCrusher | Compresses JSON tool outputs statistically | Transforms |
| CacheAligner | Stabilizes prefixes for provider caching | Transforms |
| RollingWindow | Manages context limits without breaking tools | Transforms |
| CCR | Reversible compression with automatic retrieval | CCR Guide |
| LangChain | Memory, retrievers, agents, streaming | LangChain |
| Agno | Agent framework integration with hooks | Agno |
| Text Utilities | Opt-in compression for search/logs | Text Compression |
| LLMLingua-2 | ML-based 20x compression (opt-in) | LLMLingua |
| Code-Aware | AST-based code compression (tree-sitter) | Transforms |
Performance
| Scenario | Before | After | Savings |
|---|---|---|---|
| Search results (1000 items) | 45,000 tokens | 4,500 tokens | 90% |
| Log analysis (500 entries) | 22,000 tokens | 3,300 tokens | 85% |
| Long conversation (50 turns) | 80,000 tokens | 32,000 tokens | 60% |
| Agent with tools (10 calls) | 100,000 tokens | 15,000 tokens | 85% |
Overhead: ~1-5ms per request
Providers
| Provider | Token Counting | Cache Optimization |
|---|---|---|
| OpenAI | tiktoken (exact) | Automatic prefix caching |
| Anthropic | Official API | cache_control blocks |
| Official API | Context caching | |
| Cohere | Official API | - |
| Mistral | Official tokenizer | - |
New models auto-supported via naming pattern detection.
Safety Guarantees
- Never removes human content - user/assistant messages preserved
- Never breaks tool ordering - tool calls and responses stay paired
- Parse failures are no-ops - malformed content passes through unchanged
- Compression is reversible - LLM retrieves original data via CCR
Installation
pip install headroom-ai # SDK only
pip install "headroom-ai[proxy]" # Proxy server
pip install "headroom-ai[langchain]" # LangChain integration
pip install "headroom-ai[agno]" # Agno agent framework
pip install "headroom-ai[code]" # AST-based code compression
pip install "headroom-ai[llmlingua]" # ML-based compression
pip install "headroom-ai[all]" # Everything
Requirements: Python 3.10+
Documentation
| Guide | Description |
|---|---|
| Memory Guide | Persistent memory for LLMs |
| Compression Guide | Universal compression with ML detection |
| LangChain Integration | Full LangChain support |
| Agno Integration | Full Agno agent framework support |
| SDK Guide | Fine-grained control |
| Proxy Guide | Production deployment |
| Configuration | All options |
| CCR Guide | Reversible compression |
| Metrics | Monitoring |
| Troubleshooting | Common issues |
Who's Using Headroom?
Add your project here! Open a PR or start a discussion.
Contributing
git clone https://github.com/chopratejas/headroom.git
cd headroom
pip install -e ".[dev]"
pytest
See CONTRIBUTING.md for details.
License
Apache License 2.0 - see LICENSE.
Built for the AI developer community