# Changelog All notable changes to Headroom will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ## [Unreleased] ### Added - Production-ready proxy server with caching, rate limiting, and metrics - CLI command `headroom proxy` to start the proxy server - **IntelligentContextManager** (semantic-aware context management) - Multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, forward references, token density - No hardcoded patterns - all importance signals learned from TOIN or computed from metrics - TOIN integration for retrieval_rate and field_semantics-based scoring - Strategy selection: NONE, COMPRESS_FIRST, DROP_BY_SCORE based on budget overage - Atomic tool unit handling (call + response dropped together) - Configurable scoring weights via `ScoringWeights` dataclass - `IntelligentContextConfig` for full configuration control - Backwards compatible with `RollingWindowConfig` - **LLMLingua-2 Integration** (opt-in ML-based compression) - `LLMLinguaCompressor` transform using Microsoft's LLMLingua-2 model - Content-aware compression rates (code: 0.4, JSON: 0.35, text: 0.3) - Memory management utilities: `unload_llmlingua_model()`, `is_llmlingua_model_loaded()` - Proxy integration via `--llmlingua` flag - Device selection: `--llmlingua-device` (auto/cuda/cpu/mps) - Custom compression rate: `--llmlingua-rate` - Helpful startup hints when llmlingua is available but not enabled - Install with: `pip install headroom-ai[llmlingua]` - **Code-Aware Compression** (AST-based, syntax-preserving) - `CodeAwareCompressor` transform using tree-sitter for AST parsing - Supports Python, JavaScript, TypeScript, Go, Rust, Java, C, C++ - Preserves imports, function signatures, type annotations, error handlers - Compresses function bodies while maintaining structural integrity - Guarantees syntactically valid output (no broken code) - Automatic language detection from code patterns - Memory management: `is_tree_sitter_available()`, `unload_tree_sitter()` - Uses `tree-sitter-language-pack` for broad language support - Install with: `pip install headroom-ai[code]` - **ContentRouter** (intelligent compression orchestrator) - Auto-routes content to optimal compressor based on type detection - Source hint support for high-confidence routing (file paths, tool names) - Handles mixed content (e.g., markdown with code blocks) - Strategies: CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, TEXT, LLMLINGUA - Configurable strategy preferences and fallbacks - Routing decision log for transparency and debugging - **Custom Model Configuration** - Support for new models: Claude 4.5 (Opus), Claude 4 (Sonnet, Haiku), o3, o3-mini - Pattern-based inference for unknown models (opus/sonnet/haiku tiers) - Custom model config via `HEADROOM_MODEL_LIMITS` environment variable - Config file support: `~/.headroom/models.json` - Graceful fallback for unknown models (no crashes) - Updated pricing data for all current models ## [0.2.0] - 2025-01-07 ### Added - **SmartCrusher**: Statistical compression for tool outputs - Keeps first/last K items, errors, anomalies, and relevance matches - Variance-based change point detection - Pattern detection (time series, logs, search results) - **Relevance Scoring Engine**: ML-powered item relevance - `BM25Scorer`: Fast keyword matching (zero dependencies) - `EmbeddingScorer`: Semantic similarity with sentence-transformers - `HybridScorer`: Adaptive combination of both methods - **CacheAligner**: Prefix stabilization for better cache hits - Dynamic date extraction - Whitespace normalization - Stable prefix hashing - **RollingWindow**: Context management within token limits - Drops oldest tool units first - Never orphans tool results - Preserves recent turns - **Multi-Provider Support**: - Anthropic with official `count_tokens` API - Google with official `countTokens` API - Cohere with official `tokenize` API - Mistral with official tokenizer - LiteLLM for unified interface - **Integrations**: - LangChain callback handler (`HeadroomOptimizer`) - MCP (Model Context Protocol) utilities - **Proxy Server** (`headroom.proxy`): - Semantic caching with LRU eviction - Token bucket rate limiting - Retry with exponential backoff - Cost tracking with budget enforcement - Prometheus metrics endpoint - Request logging (JSONL) - **Pricing Registry**: Centralized model pricing with staleness tracking - **Benchmarks**: Performance benchmarks for transforms and relevance scoring ### Changed - Improved token counting accuracy across all providers - Enhanced tool output compression with relevance-aware selection ### Fixed - Mistral tokenizer API compatibility - Google token counting for multi-turn conversations ## [0.1.0] - 2025-01-05 ### Added - Initial release - `HeadroomClient`: OpenAI-compatible client wrapper - `ToolCrusher`: Basic tool output compression - Audit mode for observation without modification - Optimize mode for applying transforms - Simulate mode for previewing changes - SQLite and JSONL storage backends - HTML report generation - Streaming support ### Safety Guarantees - Never removes human content - Never breaks tool ordering - Parse failures are no-ops - Preserves recency (last N turns) --- ## Migration Guide ### From 0.1.x to 0.2.x The 0.2.0 release is backward compatible. New features are opt-in: ```python # Old code still works from headroom import HeadroomClient, OpenAIProvider # New SmartCrusher (replaces ToolCrusher for better compression) from headroom import SmartCrusher, SmartCrusherConfig config = SmartCrusherConfig( min_tokens_to_crush=200, max_items_after_crush=50, ) crusher = SmartCrusher(config) # New relevance scoring from headroom import create_scorer scorer = create_scorer("hybrid") # or "bm25" for zero deps ``` ### Using the Proxy New in 0.2.0 - run Headroom as a proxy server: ```bash # Start the proxy python -m headroom.proxy.server --port 8787 # Use with Claude Code ANTHROPIC_BASE_URL=http://localhost:8787 claude ``` [Unreleased]: https://github.com/headroom-sdk/headroom/compare/v0.2.0...HEAD [0.2.0]: https://github.com/headroom-sdk/headroom/compare/v0.1.0...v0.2.0 [0.1.0]: https://github.com/headroom-sdk/headroom/releases/tag/v0.1.0