# Changelog

All notable changes to Headroom will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Fixed

- **`Learned: error recovery` section in MEMORY.md no longer bloats with stale or contradictory entries.** The dedup key for error-recovery patterns was the literal rendered bullet text, so near-duplicate recoveries (same intent, different `| tail -N` count, same error path guessed against different successors) each created a new row. There was also no TTL or re-validation, so wrong-today entries lingered. Fixed by: (1) normalizing the hash on recovery intent — Read recoveries key on `(basename(error_path), basename(success_path))`; Bash recoveries strip volatile suffixes and hash only the primary command before the first `|`/`&&`; (2) stamping `first_seen_at` / `last_seen_at` on every pattern and bumping them in `_bump_persisted_evidence` via `json_set`; (3) refining at render time — drop rows not re-observed in 21 days, re-validate Read success paths against the filesystem, collapse same-error_path-with-multiple-targets into one "use Glob/Grep first" bullet, rank by `evidence_count * 0.5 ** (days/5)`, cap the section at 15. Other `Learned: …` categories (environment, preference, architecture) are untouched.
- **`headroom unwrap codex` now actually undoes `headroom wrap codex`** — previously there was no `unwrap codex` subcommand at all, so the injected `model_provider = "headroom"` / `[model_providers.headroom]` block stayed in `~/.codex/config.toml` forever and Codex continued routing through the (potentially stopped) proxy, surfacing as `Missing environment variable: OPENAI_API_KEY`.
`wrap codex` now snapshots the pre-wrap `config.toml` to `config.toml.headroom-backup` before its first injection, and `unwrap codex` restores that snapshot byte-for-byte (or, if the backup is missing, strips only the Headroom-managed block and leaves surrounding user content intact). Safe no-op when run without a prior wrap. Reported by @raenaryl in Discord.
- **`headroom learn` no longer clobbers prior recommendations on re-run** — the marker block in `CLAUDE.md` / `MEMORY.md` is now merged with the prior block instead of wholesale-replaced. Sections re-surfaced by the new run win; sections not re-surfaced are carried forward so learnings accumulate across runs instead of disappearing. To fully rebuild the block, delete it manually and re-run. (#231)
- **`headroom learn` no longer emits dangling cross-references when a section is re-surfaced** — the analyzer now includes the project's current `` block (from `CLAUDE.md` and `MEMORY.md`) in the LLM digest as a "Prior Learned Patterns" section, and the system prompt instructs the LLM that re-emitting a section replaces the prior one wholesale. Prevents bullets like "`X` is *also* large — same rule as `Y`, `Z`" from appearing after `Y` and `Z` got dropped during per-section replacement. The writer's section-level carry-forward from #231 remains in place as a safety net for sections the LLM omits entirely. New helper `extract_marker_block` added to `headroom.learn.writer`.

### Added

- **`turn_id` linking agent-loop API calls to a single user prompt** — a new `compute_turn_id(model, system, messages)` helper in `headroom/proxy/helpers.py` hashes the message prefix up to and including the last user-text message, yielding an id that is stable across every agent-loop iteration of one prompt but rolls over when the user sends a new prompt (or runs `/compact`, `/clear`).
`RequestLog` gained a `turn_id: str | None` field, which is stamped at every log site (anthropic handler bedrock + direct branches, and the streaming handler) and surfaced as `turn_id` in `/transformations/feed`. Lets downstream consumers (e.g. the Headroom Desktop Activity tab) aggregate savings per user prompt rather than per API call.
- **Live flush of traffic-learned patterns to CLAUDE.md / MEMORY.md** — the `TrafficLearner` now writes to agent-native context files continuously during proxy operation, not just at shutdown. A new dirty-flag debounced `_flush_worker` (10s window, `FLUSH_DEBOUNCE_SECONDS`) calls `flush_to_file()` whenever `_accumulate()` marks the learner dirty, so patterns surface in `CLAUDE.md` / `MEMORY.md` near real-time. Flushes read both persisted rows (via `_load_persisted_patterns_from_sqlite`) and the in-memory accumulator, bucket patterns by project via the learn plugin registry (`plugin.discover_projects()` + longest-path anchoring in `_project_for_pattern`), and route by `PatternCategory` to the correct file (`_patterns_to_recommendations` + `_CATEGORY_TO_TARGET`). Live flushes require `evidence_count >= 2`; the shutdown flush accepts single-evidence rows.

### Fixed

- **Traffic-learner evidence count stuck at 1; duplicate DB rows across restarts.** `_accumulate` queued patterns with the default `ExtractedPattern.evidence_count = 1` regardless of how many times the pattern was actually seen, so every persisted row landed at `1` and never crossed the live-flush gate (`evidence_count >= 2`). Worse, once a pattern was in `_saved_hashes` it was early-returned on every re-sighting, and `_saved_hashes` reset on process restart — so a second sighting in a later session inserted a duplicate row rather than bumping the existing one.
Now: `_accumulate` writes the real accumulated count at save time, `start()` hydrates `_saved_hashes` + a new `_persisted_ids` map from the DB, and re-sightings bump the persisted row's `metadata.evidence_count` via an atomic `json_set` `UPDATE` (`_bump_persisted_evidence`). `_load_persisted_patterns_from_sqlite` now filters via `json_extract(metadata, '$.source')` instead of a LIKE on the raw JSON string, so rows survive metadata rewrites.

### Added

- **`HEADROOM_QDRANT_*` environment variables for memory Qdrant configuration** (#31) — `Memory(backend="qdrant-neo4j")`, `Mem0Config`, `MemoryConfig`, and `ProxyConfig` now resolve their Qdrant connection from `HEADROOM_QDRANT_URL`, `HEADROOM_QDRANT_HOST`, `HEADROOM_QDRANT_PORT`, `HEADROOM_QDRANT_API_KEY`, `HEADROOM_QDRANT_HTTPS`, `HEADROOM_QDRANT_PREFER_GRPC`, and `HEADROOM_QDRANT_GRPC_PORT`. Explicit constructor arguments still win; unset env keeps the existing `localhost:6333` defaults. Adds matching `--memory-qdrant-{url,host,port,api-key}` CLI flags. Enables hosted Qdrant (Qdrant Cloud) and shared/remote Qdrant stacks without code changes. New helper: [`headroom/memory/qdrant_env.py`](headroom/memory/qdrant_env.py).
- **Telemetry stack & install-mode identity fields** — anonymous beacon now reports `headroom_stack` (how Headroom is invoked: `proxy`, `wrap_claude`, `adapter_ts_openai`, ...) and `install_mode` (`wrapped` / `persistent` / `on_demand`), plus `requests_by_stack` for proxies that serve multiple integrations. Proxy exposes a `by_stack` bucket alongside `by_provider` / `by_model` on `/stats`, a matching `headroom_requests_by_stack` Prometheus counter, and an `X-Headroom-Stack` header honored by the FastAPI middleware. `headroom wrap ` sets `HEADROOM_STACK=wrap_`; the TS SDK and all four adapters (`openai`, `anthropic`, `gemini`, `vercel-ai`) tag their compress calls. Schema migration: [`sql/upgrade_telemetry_stack_context.sql`](sql/upgrade_telemetry_stack_context.sql).
- **Canonical filesystem contract** (issue #175) — new `HEADROOM_CONFIG_DIR` (default `~/.headroom/config`, read-mostly) and `HEADROOM_WORKSPACE_DIR` (default `~/.headroom`, read-write state) env vars recognized by the Python proxy/CLI and the npm SDK. Additive; all existing per-resource env vars (`HEADROOM_SAVINGS_PATH`, `HEADROOM_TOIN_PATH`, `HEADROOM_SUBSCRIPTION_STATE_PATH`, `HEADROOM_MODEL_LIMITS`) continue to work with identical semantics. Docker install scripts and `docker-compose.native.yml` forward the new vars into containers so savings, logs, and telemetry resolve to the bind-mounted `.headroom` path. See [`wiki/filesystem-contract.md`](wiki/filesystem-contract.md).

### Changed

- **`/stats-history` now returns compact checkpoint history by default** — the JSON response keeps recent checkpoints dense while evenly sampling older checkpoints so long-running installs do not return ever-growing payloads. Add `history_mode=full` to fetch the full retained checkpoint list, or `history_mode=none` to skip it entirely while still receiving the derived hourly/daily/weekly/monthly rollups. Responses now include a `history_summary` block describing stored versus returned points.

### Fixed

- **Streaming Anthropic requests are now visible to `/stats.recent_requests` and `/transformations/feed`** — `_finalize_stream_response` did not call `self.logger.log(...)`, so the entire streaming Anthropic code path (the one Claude Code uses) silently bypassed the request logger. Only the non-streaming Anthropic path and the Bedrock streaming path were logged. As a consequence, `--log-messages` had no observable effect on the live transformations feed for typical traffic. The streaming finalizer now emits the same `RequestLog` shape the other paths do, including `request_messages` when `log_full_messages` is enabled.

## [0.5.22] - 2026-04-11

### Added

- **Cross-agent memory** — Claude saves a fact, Codex reads it back. All agents sharing one proxy share one memory store.
Project-scoped DB at `.headroom/memory.db`, auto user_id from `$USER`.
- **Agent provenance tracking** — every memory records which agent saved it (`source_agent`, `source_provider`, `created_via`), with edit history on updates.
- **LLM-mediated dedup** — on `memory_save`, enriched response hints similar existing memories to the LLM. Background async dedup auto-removes >92% cosine duplicates. Zero extra LLM calls.
- **Memory for OpenAI and Gemini handlers** — context injection + tool handling wired into all three provider handlers (Anthropic, OpenAI, Gemini).
- **Plugin architecture for `headroom learn`** — each agent (Claude, Codex, Gemini) is a self-contained plugin. External plugins register via `headroom.learn_plugin` entry points. `--agent` flag for CLI.
- **GeminiScanner** for `headroom learn` — reads `~/.gemini/tmp/*/chats/session-*.json` and `.jsonl`.
- **Code graph integration** — `headroom wrap claude --code-graph` auto-indexes the project via [codebase-memory-mcp](https://github.com/DeusData/codebase-memory-mcp) for call-chain traversal, impact analysis, and architectural queries. Opt-in, ~200 token overhead with Claude Code's MCP Tool Search.
- **OpenAI embedder auto-detection** — memory backend uses OpenAI embeddings when `sentence-transformers` is unavailable (no torch/2GB dependency needed).
- **Live traffic learning flush** — `headroom wrap --learn` flushes learned patterns to the correct agent-native file (MEMORY.md / AGENTS.md / GEMINI.md) at proxy shutdown.

### Changed

- **CodeCompressor disabled by default** — AST-based code compression produced invalid syntax on 40% of real files. Code now passes through uncompressed. Use `--code-graph` for code intelligence instead, or re-enable with `--code-aware`.
- **Shared tool name map** — consolidated tool normalization across all learn plugins into `_shared.py`.
- **Dynamic CLI agent detection** — `headroom learn` discovers agents via plugin registry, no hardcoded choices.
### Fixed

- **CodeCompressor statement-based truncation** — body truncation now walks AST statements (not lines), never cuts mid-expression. Fixes syntax errors on multi-line dict literals and function calls.
- **Docstring FIRST_LINE mode** — uses source lines directly instead of reconstructing from byte offsets. Properly handles all quote styles.
- **Memory shutdown queue drain** — patterns in the save queue were lost on proxy shutdown. Now drained before exit.

## [Unreleased]

### Added

- **Codex-proxy resilience hardening** — reduces event-loop starvation under cold-start reconnect storms
- **Stage-timing instrumentation** — per-stage durations for both Codex WS accept and Anthropic `/v1/messages` pre-upstream phases emitted as a single `STAGE_TIMINGS` structured log line per request plus Prometheus histograms
- **Per-pipeline shared warmup** — Anthropic + OpenAI pipelines eagerly load compressors/parsers once at startup; status merged into `WarmupRegistry` for `/debug/warmup` and `/readyz`
- **WS session registry** — first-class tracking of active Codex WS sessions with deterministic relay-task cancellation and termination-cause classification (`client_disconnect`, `upstream_error`, `client_timeout`, etc.)
- **Bounded pre-upstream Anthropic concurrency** — `--anthropic-pre-upstream-concurrency` / `HEADROOM_ANTHROPIC_PRE_UPSTREAM_CONCURRENCY` caps simultaneous `/v1/messages` pre-upstream work (body read, deep copy, first compression stage, memory-context lookup, upstream connect) so replay storms cannot starve `/livez`, `/readyz`, and new Codex WS opens.
Default: auto `max(2, min(8, cpu_count))`; `0` or negative disables (unbounded)
- **Loopback-only debug endpoints** — `/debug/tasks`, `/debug/ws-sessions`, `/debug/warmup` return `404` (not `403`) to non-loopback callers so external scanners cannot enumerate them
- **Reconnect-storm repro harness** — `scripts/repro_codex_replay.py` drives concurrent WS + HTTP replay traffic against a local proxy and asserts `/livez` p99 under threshold; `--json` output routes JSON to stdout and the human summary to stderr
- **Proxy liveness and readiness health checks**
  - Adds `GET /livez` for process liveness and `GET /readyz` for traffic readiness
  - Keeps `GET /health` backward compatible while expanding it with readiness details and subsystem checks
  - Eagerly initializes configured memory backends during proxy startup so readiness reflects real serving capability
  - Wires `/readyz` into the Docker image `HEALTHCHECK` and the example `docker-compose.yml`
- **Durable proxy savings history**
  - Persists proxy compression savings history locally at `~/.headroom/proxy_savings.json`
  - Supports `HEADROOM_SAVINGS_PATH` to override the storage location
  - Adds `/stats-history` with lifetime totals plus hourly/daily/weekly/monthly rollups
  - Supports JSON and CSV export from `/stats-history`
  - Extends `/stats` with a `persistent_savings` block while keeping `savings_history` backward compatible
  - Adds a historical mode to `/dashboard` backed by `/stats-history`, including export actions
- **Proxy telemetry SDK override** via `HEADROOM_SDK`
  - Downstream apps can override the anonymous telemetry `sdk` field without patching installed files
  - Blank values fall back to the default `proxy` label
- **`headroom learn`** — Offline failure learning for coding agents
  - Analyzes past conversation history (Claude Code, extensible to Cursor/Codex)
  - **Success correlation**: for each failure, finds what succeeded after and extracts the specific correction
  - 5 analyzers: Environment, Structure, Command
Patterns, Retry Prevention, Cross-Session
  - Writes specific learnings to CLAUDE.md (stable project facts) and MEMORY.md (session patterns)
  - Generic architecture: tool-agnostic `ToolCall` model, pluggable Scanner/Writer adapters
  - Dry-run by default, `--apply` to write, `--all` for all projects
  - Example output: "FirstClassEntity.java is not at axion-formats/ — actually at axion-scala-common/"
- **Read Lifecycle Management** — Event-driven compression of stale/superseded Read outputs
  - Detects when a Read output becomes stale (file was edited after) or superseded (file was re-read)
  - Replaces stale/superseded content with compact CCR markers, stores originals for retrieval
  - 75% of Read output bytes are provably stale or redundant (from real-world analysis of 66K tool calls)
  - Fresh Reads (latest read, no subsequent edit) are never touched — Edit safety preserved
  - Opt-in via `ReadLifecycleConfig(enabled=True)`, disabled by default
  - Handles both OpenAI and Anthropic message formats
- **any-llm backend**
  - Route requests through 38+ LLM providers (OpenAI, Mistral, Groq, Ollama, etc.)
via [any-llm](https://mozilla-ai.github.io/any-llm/providers/)
  - Enable with `--backend anyllm --anyllm-provider `
  - Install with: `pip install 'headroom-ai[anyllm]'`
  - Production-ready proxy server with caching, rate limiting, and metrics
  - CLI command `headroom proxy` to start the proxy server
- **IntelligentContextManager** (semantic-aware context management)
  - Multi-factor importance scoring: recency, semantic similarity, TOIN importance, error indicators, forward references, token density
  - No hardcoded patterns - all importance signals learned from TOIN or computed from metrics
  - TOIN integration for retrieval_rate and field_semantics-based scoring
  - Strategy selection: NONE, COMPRESS_FIRST, DROP_BY_SCORE based on budget overage
  - Atomic tool unit handling (call + response dropped together)
  - Configurable scoring weights via `ScoringWeights` dataclass
  - `IntelligentContextConfig` for full configuration control
  - Backwards compatible with `RollingWindowConfig`
- **LLMLingua-2 Integration** (opt-in ML-based compression)
  - `LLMLinguaCompressor` transform using Microsoft's LLMLingua-2 model
  - Content-aware compression rates (code: 0.4, JSON: 0.35, text: 0.3)
  - Memory management utilities: `unload_llmlingua_model()`, `is_llmlingua_model_loaded()`
  - Proxy integration via `--llmlingua` flag
  - Device selection: `--llmlingua-device` (auto/cuda/cpu/mps)
  - Custom compression rate: `--llmlingua-rate`
  - Helpful startup hints when llmlingua is available but not enabled
  - Install with: `pip install headroom-ai[llmlingua]`
- **Code-Aware Compression** (AST-based, syntax-preserving)
  - `CodeAwareCompressor` transform using tree-sitter for AST parsing
  - Supports Python, JavaScript, TypeScript, Go, Rust, Java, C, C++
  - Preserves imports, function signatures, type annotations, error handlers
  - Compresses function bodies while maintaining structural integrity
  - Guarantees syntactically valid output (no broken code)
  - Automatic language detection from code patterns
  - Memory
management: `is_tree_sitter_available()`, `unload_tree_sitter()`
  - Uses `tree-sitter-language-pack` for broad language support
  - Install with: `pip install headroom-ai[code]`
- **ContentRouter** (intelligent compression orchestrator)
  - Auto-routes content to optimal compressor based on type detection
  - Source hint support for high-confidence routing (file paths, tool names)
  - Handles mixed content (e.g., markdown with code blocks)
  - Strategies: CODE_AWARE, SMART_CRUSHER, SEARCH, LOG, TEXT, LLMLINGUA
  - Configurable strategy preferences and fallbacks
  - Routing decision log for transparency and debugging
- **Custom Model Configuration**
  - Support for new models: Claude 4.5 (Opus), Claude 4 (Sonnet, Haiku), o3, o3-mini
  - Pattern-based inference for unknown models (opus/sonnet/haiku tiers)
  - Custom model config via `HEADROOM_MODEL_LIMITS` environment variable
  - Config file support: `~/.headroom/models.json`
  - Graceful fallback for unknown models (no crashes)
  - Updated pricing data for all current models

### Fixed

- **Event.wait task leak in subscription trackers** — `asyncio.shield` pattern prevents cancellation of the outer `wait_for` from leaking the inner `Event.wait` task
- **Python 3.10 compatibility for memory-context fail-open** — catches `asyncio.TimeoutError` (the 3.10-compatible alias) rather than `TimeoutError` to preserve behaviour on older runtimes
- **uvicorn `proxy_headers=False`** — refuses `Forwarded` / `X-Forwarded-For` rewrites so the loopback guard on `/debug/*` cannot be spoofed by a misconfigured reverse proxy
- **First-frame timeout for Codex WS accepts** — guards against a client that opens a handshake and never sends the first frame; relays cancel deterministically with `client_timeout`
- **Semaphore leak on unexpected exception in Anthropic pre-upstream path** — the finalizer now releases the pre-upstream semaphore on every exit path (early 4xx, cache hit, upstream error, streaming handoff)
- **`active_relay_tasks` gauge double-decrement** —
`deregister_and_count` returns `(handle, released_task_count)` atomically so the handler decrements the Prometheus gauge by the exact number it registered, eliminating drift

### Internal

- **IPv6-mapped loopback recognition** — the loopback guard parses `::ffff:127.0.0.1` and other dual-stack literals through `ipaddress.ip_address(...).is_loopback`
- **Lock-free stage-timing accumulators** — `record_stage_timings` writes to per-path counters that do not contend with `/metrics` export or `record_request`
- **Narrow `contextlib.suppress` in relay classification** — only `CancelledError` is suppressed where we reclassify it; other exceptions propagate so termination cause stays truthful
- **`jitter_delay_ms` helper** — shared exponential-backoff + 50-150% jitter formula in `headroom/proxy/helpers.py`; used by three proxy retry sites and mirrored inline in the repro harness

## [0.2.0] - 2025-01-07

### Added

- **SmartCrusher**: Statistical compression for tool outputs
  - Keeps first/last K items, errors, anomalies, and relevance matches
  - Variance-based change point detection
  - Pattern detection (time series, logs, search results)
- **Relevance Scoring Engine**: ML-powered item relevance
  - `BM25Scorer`: Fast keyword matching (zero dependencies)
  - `EmbeddingScorer`: Semantic similarity with sentence-transformers
  - `HybridScorer`: Adaptive combination of both methods
- **CacheAligner**: Prefix stabilization for better cache hits
  - Dynamic date extraction
  - Whitespace normalization
  - Stable prefix hashing
- **RollingWindow**: Context management within token limits
  - Drops oldest tool units first
  - Never orphans tool results
  - Preserves recent turns
- **Multi-Provider Support**:
  - Anthropic with official `count_tokens` API
  - Google with official `countTokens` API
  - Cohere with official `tokenize` API
  - Mistral with official tokenizer
  - LiteLLM for unified interface
- **Integrations**:
  - LangChain callback handler (`HeadroomOptimizer`)
  - MCP (Model Context Protocol) utilities
- **Proxy Server** (`headroom.proxy`):
  - Semantic caching with LRU eviction
  - Token bucket rate limiting
  - Retry with exponential backoff
  - Cost tracking with budget enforcement
  - Prometheus metrics endpoint
  - Request logging (JSONL)
- **Pricing Registry**: Centralized model pricing with staleness tracking
- **Benchmarks**: Performance benchmarks for transforms and relevance scoring

### Changed

- Improved token counting accuracy across all providers
- Enhanced tool output compression with relevance-aware selection

### Fixed

- Mistral tokenizer API compatibility
- Google token counting for multi-turn conversations

## [0.1.0] - 2025-01-05

### Added

- Initial release
- `HeadroomClient`: OpenAI-compatible client wrapper
- `ToolCrusher`: Basic tool output compression
- Audit mode for observation without modification
- Optimize mode for applying transforms
- Simulate mode for previewing changes
- SQLite and JSONL storage backends
- HTML report generation
- Streaming support

### Safety Guarantees

- Never removes human content
- Never breaks tool ordering
- Parse failures are no-ops
- Preserves recency (last N turns)

---

## Migration Guide

### From 0.1.x to 0.2.x

The 0.2.0 release is backward compatible.
New features are opt-in:

```python
# Old code still works
from headroom import HeadroomClient, OpenAIProvider

# New SmartCrusher (replaces ToolCrusher for better compression)
from headroom import SmartCrusher, SmartCrusherConfig

config = SmartCrusherConfig(
    min_tokens_to_crush=200,
    max_items_after_crush=50,
)
crusher = SmartCrusher(config)

# New relevance scoring
from headroom import create_scorer

scorer = create_scorer("hybrid")  # or "bm25" for zero deps
```

### Using the Proxy

New in 0.2.0 - run Headroom as a proxy server:

```bash
# Start the proxy
python -m headroom.proxy.server --port 8787

# Use with Claude Code
ANTHROPIC_BASE_URL=http://localhost:8787 claude
```

[Unreleased]: https://github.com/headroom-sdk/headroom/compare/v0.2.0...HEAD
[0.2.0]: https://github.com/headroom-sdk/headroom/compare/v0.1.0...v0.2.0
[0.1.0]: https://github.com/headroom-sdk/headroom/releases/tag/v0.1.0