chopratejas committed bb04104
1 parent: 905c229

Add seamless LangChain integration

- Add HeadroomChatModel wrapper with auto provider detection (OpenAI, Anthropic, Google)
- Add HeadroomChatMessageHistory for automatic conversation compression
- Add HeadroomDocumentCompressor for retriever integration
- Add wrap_tools_with_headroom() for agent tool output compression
- Add async support (ainvoke, astream)
- Add LangSmith integration for observability
- Restructure integrations package into nested langchain/ and mcp/ subpackages
- Fix Pydantic v2 deprecation warning
- Add comprehensive docs/langchain.md guide with real-world examples
- Update README with LangChain quickstart and framework integrations

Bump version to 0.2.3

README.md CHANGED
@@ -27,45 +27,89 @@
  ## What It Does

- Headroom is a **smart compression proxy** for LLM applications:

  - **Compresses tool outputs** — 1000 search results → 15 items (keeps errors, anomalies, relevant items)
  - **Enables provider caching** — Stabilizes prefixes so cache hits actually happen
  - **Manages context windows** — Prevents token limit failures without breaking tool calls
  - **Reversible compression** — LLM can retrieve original data if needed ([CCR architecture](docs/ccr.md))

- **Zero code changes required** point your existing tools at the proxy.

  ---

  ## 30-Second Quickstart

  ```bash
- # Install
  pip install "headroom-ai[proxy]"
-
- # Start proxy
  headroom proxy --port 8787
-
- # Verify
- curl http://localhost:8787/health
  ```

- **Use with your tools:**

  ```bash
  # Claude Code
  ANTHROPIC_BASE_URL=http://localhost:8787 claude

- # Cursor / Continue / any OpenAI client
  OPENAI_BASE_URL=http://localhost:8787/v1 cursor

- # Python scripts
- export OPENAI_BASE_URL=http://localhost:8787/v1
- python your_script.py
  ```

- That's it. You're saving tokens.

  ---

@@ -82,13 +126,21 @@ curl http://localhost:8787/stats
  }
  ```

  ---

  ## Installation

  ```bash
- pip install "headroom-ai[proxy]" # Proxy server (recommended)
  pip install headroom-ai # SDK only
  pip install "headroom-ai[code]" # AST-based code compression
  pip install "headroom-ai[llmlingua]" # ML-based compression
  pip install "headroom-ai[all]" # Everything
@@ -106,10 +158,10 @@ pip install "headroom-ai[all]" # Everything
  | **CacheAligner** | Stabilizes prefixes for provider caching | [Transforms](docs/transforms.md) |
  | **RollingWindow** | Manages context limits without breaking tools | [Transforms](docs/transforms.md) |
  | **CCR** | Reversible compression with automatic retrieval | [CCR Guide](docs/ccr.md) |
  | **Text Utilities** | Opt-in compression for search/logs | [Text Compression](docs/text-compression.md) |
  | **LLMLingua-2** | ML-based 20x compression (opt-in) | [LLMLingua](docs/llmlingua.md) |
  | **Code-Aware** | AST-based code compression (tree-sitter) | [Transforms](docs/transforms.md) |
- | **ContentRouter** | Auto-routes content to optimal compressor | [Transforms](docs/transforms.md) |

  ---

@@ -123,7 +175,7 @@ pip install "headroom-ai[all]" # Everything
  | Cohere | Official API | - |
  | Mistral | Official tokenizer | - |

- **New models auto-supported** — Unknown models get sensible defaults based on naming patterns (e.g., `claude-opus-*` gets Opus pricing). Custom limits via `~/.headroom/models.json` or `HEADROOM_MODEL_LIMITS` env var.

  ---

@@ -134,6 +186,7 @@ pip install "headroom-ai[all]" # Everything
  | Search results (1000 items) | 45,000 tokens | 4,500 tokens | 90% |
  | Log analysis (500 entries) | 22,000 tokens | 3,300 tokens | 85% |
  | Long conversation (50 turns) | 80,000 tokens | 32,000 tokens | 60% |

  Overhead: ~1-5ms per request.

@@ -152,13 +205,13 @@ Overhead: ~1-5ms per request.

  | Guide | Description |
  |-------|-------------|
  | [SDK Guide](docs/sdk.md) | Wrap your client for fine-grained control |
  | [Proxy Guide](docs/proxy.md) | Production deployment |
  | [Configuration](docs/configuration.md) | All configuration options |
  | [CCR Guide](docs/ccr.md) | Reversible compression architecture |
  | [Metrics](docs/metrics.md) | Monitoring and observability |
  | [Troubleshooting](docs/troubleshooting.md) | Common issues |
- | [Architecture](docs/ARCHITECTURE.md) | How it works internally |

  ---

@@ -168,6 +221,8 @@ See [`examples/`](examples/) for runnable code:

  - `basic_usage.py` — Simple SDK usage
  - `proxy_integration.py` — Using with different clients
  - `ccr_demo.py` — CCR architecture demonstration

  ---
 
@@ -27,45 +27,89 @@
  ## What It Does

+ Headroom is a **smart compression layer** for LLM applications:

  - **Compresses tool outputs** — 1000 search results → 15 items (keeps errors, anomalies, relevant items)
  - **Enables provider caching** — Stabilizes prefixes so cache hits actually happen
  - **Manages context windows** — Prevents token limit failures without breaking tool calls
  - **Reversible compression** — LLM can retrieve original data if needed ([CCR architecture](docs/ccr.md))

+ Works as a **proxy** (zero code changes) or **SDK** (fine-grained control).

  ---
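The "Reversible compression" bullet above can be illustrated with a small Python sketch. It is not Headroom's CCR implementation: the `ReversibleStore` class, the `[compressed:<key>]` marker, and the keep-the-first-N-items policy are hypothetical choices made for illustration. What it shows is the general idea: the full payload is kept under a content hash, the model only sees a short view plus a key, and the key can later be exchanged for the original data.

```python
import hashlib
import json


class ReversibleStore:
    """Hypothetical store: keeps full tool outputs so a compressed view can be expanded later."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def compress(self, items: list[dict], keep: int = 3) -> str:
        """Return a truncated JSON view plus a marker the model can use to ask for the rest."""
        original = json.dumps(items)
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._originals[key] = original  # full data stays retrievable
        view = json.dumps(items[:keep])
        omitted = max(len(items) - keep, 0)
        return f"{view}\n[compressed:{key}] {omitted} more items available"

    def retrieve(self, key: str) -> list[dict]:
        """Expand a marker key back into the original, uncompressed items."""
        return json.loads(self._originals[key])


store = ReversibleStore()
results = [{"id": i, "title": f"Result {i}"} for i in range(1000)]
compressed = store.compress(results)
key = compressed.split("[compressed:")[1].split("]")[0]
restored = store.retrieve(key)
```

Keying by a hash of the content means identical tool outputs map to the same entry, so repeated calls with the same result do not grow the store.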
  ## 30-Second Quickstart

+ ### Option 1: Proxy (Zero Code Changes)
+
  ```bash
  pip install "headroom-ai[proxy]"
  headroom proxy --port 8787
  ```

+ Point your tools at the proxy:

  ```bash
  # Claude Code
  ANTHROPIC_BASE_URL=http://localhost:8787 claude

+ # Any OpenAI-compatible client
  OPENAI_BASE_URL=http://localhost:8787/v1 cursor
+ ```
+
+ ### Option 2: LangChain Integration

+ ```bash
+ pip install "headroom-ai[langchain]"
  ```

+ ```python
+ from langchain_openai import ChatOpenAI
+ from headroom.integrations import HeadroomChatModel
+
+ # Wrap your model - that's it!
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Use exactly like before
+ response = llm.invoke("Hello!")
+ ```
+
+ See the full [LangChain Integration Guide](docs/langchain.md) for memory, retrievers, agents, and more.
+
+ ---
+
+ ## Framework Integrations
+
+ | Framework | Integration | Docs |
+ |-----------|-------------|------|
+ | **LangChain** | `HeadroomChatModel`, memory, retrievers, agents | [Guide](docs/langchain.md) |
+ | **MCP** | Tool output compression for Claude | [Guide](docs/ccr.md) |
+ | **Any OpenAI Client** | Proxy server | [Guide](docs/proxy.md) |
+
+ ### LangChain Highlights
+
+ ```python
+ from headroom.integrations import (
+     HeadroomChatModel,  # Wrap any chat model
+     HeadroomChatMessageHistory,  # Auto-compress conversation history
+     HeadroomDocumentCompressor,  # Filter retrieved documents
+     wrap_tools_with_headroom,  # Compress agent tool outputs
+ )
+
+ # Memory that auto-compresses when over 4K tokens
+ memory = ConversationBufferMemory(
+     chat_memory=HeadroomChatMessageHistory(base_history)
+ )
+
+ # Retriever that keeps only relevant docs
+ retriever = ContextualCompressionRetriever(
+     base_compressor=HeadroomDocumentCompressor(max_documents=10),
+     base_retriever=vectorstore.as_retriever(search_kwargs={"k": 50}),
+ )
+
+ # Agent tools with automatic output compression
+ tools = wrap_tools_with_headroom([search_tool, database_tool])
+ ```

  ---
 
@@ -82,13 +126,21 @@ curl http://localhost:8787/stats
  }
  ```

+ Or in Python:
+
+ ```python
+ print(llm.get_metrics())
+ # {'tokens_saved': 12500, 'savings_percent': 45.2}
+ ```
+
  ---

  ## Installation

  ```bash
  pip install headroom-ai # SDK only
+ pip install "headroom-ai[proxy]" # Proxy server
+ pip install "headroom-ai[langchain]" # LangChain integration
  pip install "headroom-ai[code]" # AST-based code compression
  pip install "headroom-ai[llmlingua]" # ML-based compression
  pip install "headroom-ai[all]" # Everything
@@ -106,10 +158,10 @@ pip install "headroom-ai[all]" # Everything
  | **CacheAligner** | Stabilizes prefixes for provider caching | [Transforms](docs/transforms.md) |
  | **RollingWindow** | Manages context limits without breaking tools | [Transforms](docs/transforms.md) |
  | **CCR** | Reversible compression with automatic retrieval | [CCR Guide](docs/ccr.md) |
+ | **LangChain** | Memory, retrievers, agents, streaming | [LangChain](docs/langchain.md) |
  | **Text Utilities** | Opt-in compression for search/logs | [Text Compression](docs/text-compression.md) |
  | **LLMLingua-2** | ML-based 20x compression (opt-in) | [LLMLingua](docs/llmlingua.md) |
  | **Code-Aware** | AST-based code compression (tree-sitter) | [Transforms](docs/transforms.md) |

  ---
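The RollingWindow row says context limits are managed "without breaking tools". One plausible reading, sketched below under assumptions, is that the oldest turns are dropped first, but an assistant tool call and the tool results that answer it are always dropped together, so no result is left without the call that produced it. The message dict shape, the `role == "tool"` convention, the word-count `count_tokens` stand-in, and the `fit_window` name are all hypothetical and not Headroom's API.

```python
def count_tokens(message: dict) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(str(message.get("content", "")).split())


def group_turns(messages: list[dict]) -> list[list[dict]]:
    """Group each assistant tool call with the tool results that follow it."""
    groups: list[list[dict]] = []
    for msg in messages:
        if msg["role"] == "tool" and groups:
            groups[-1].append(msg)  # attach the result to the preceding call's group
        else:
            groups.append([msg])
    return groups


def fit_window(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system groups until the conversation fits max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    groups = group_turns([m for m in messages if m["role"] != "system"])
    budget = max_tokens - sum(count_tokens(m) for m in system)
    while groups and sum(count_tokens(m) for g in groups for m in g) > budget:
        groups.pop(0)  # a whole group goes, so no orphaned tool results remain
    return system + [m for g in groups for m in g]


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "find logs " * 50},
    {"role": "assistant", "content": "", "tool_call_id": "c1"},
    {"role": "tool", "content": "log line " * 200, "tool_call_id": "c1"},
    {"role": "user", "content": "summarize please"},
]
trimmed = fit_window(history, max_tokens=100)
```

Here the budget forces out both the first user message and the call/result pair, leaving only the system prompt and the latest question; the tool result is never kept without its call.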
@@ -123,7 +175,7 @@ pip install "headroom-ai[all]" # Everything
  | Cohere | Official API | - |
  | Mistral | Official tokenizer | - |

+ **New models auto-supported** — Unknown models get sensible defaults based on naming patterns.

  ---
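The "New models auto-supported" line can be illustrated with a small lookup that matches model names against glob patterns, in the spirit of the `claude-opus-*` example from the removed README line. This is a hypothetical sketch: the pattern table, the context-window numbers, and the `limits_for` function are placeholders chosen for illustration, not Headroom's actual defaults or pricing data.

```python
from fnmatch import fnmatchcase

# Hypothetical table: the first matching pattern wins, so specific patterns come first.
# The numbers are placeholders for illustration, not Headroom's real limits or pricing.
PATTERN_DEFAULTS = [
    ("claude-opus-*", {"family": "opus", "context_window": 200_000}),
    ("claude-*", {"family": "claude", "context_window": 200_000}),
    ("gpt-4o*", {"family": "gpt-4o", "context_window": 128_000}),
]
FALLBACK = {"family": "unknown", "context_window": 8_000}


def limits_for(model: str) -> dict:
    """Return defaults for the first pattern matching the model name, else a fallback."""
    for pattern, defaults in PATTERN_DEFAULTS:
        if fnmatchcase(model, pattern):  # case-sensitive on every platform
            return defaults
    return FALLBACK
```

With this table, `limits_for("claude-opus-4-20250514")` resolves to the `opus` entry even though that exact name is never listed; ordering the patterns from specific to broad is what keeps `claude-*` from shadowing it.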
@@ -134,6 +186,7 @@ pip install "headroom-ai[all]" # Everything
  | Search results (1000 items) | 45,000 tokens | 4,500 tokens | 90% |
  | Log analysis (500 entries) | 22,000 tokens | 3,300 tokens | 85% |
  | Long conversation (50 turns) | 80,000 tokens | 32,000 tokens | 60% |
+ | Agent with tools (10 calls) | 100,000 tokens | 15,000 tokens | 85% |

  Overhead: ~1-5ms per request.
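The Savings column in the table above is the share of input tokens removed, `1 - after / before`. A quick check of every row using the table's own numbers:

```python
def savings_percent(before: int, after: int) -> float:
    """Percentage of tokens removed by compression, rounded to one decimal place."""
    return round(100 * (1 - after / before), 1)


rows = {
    "Search results (1000 items)": (45_000, 4_500),
    "Log analysis (500 entries)": (22_000, 3_300),
    "Long conversation (50 turns)": (80_000, 32_000),
    "Agent with tools (10 calls)": (100_000, 15_000),
}
for name, (before, after) in rows.items():
    print(f"{name}: {savings_percent(before, after)}%")
```

Each row reproduces the percentage listed in the table (90%, 85%, 60%, 85%).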
@@ -152,13 +205,13 @@ Overhead: ~1-5ms per request.

  | Guide | Description |
  |-------|-------------|
+ | [LangChain Integration](docs/langchain.md) | Full LangChain support |
  | [SDK Guide](docs/sdk.md) | Wrap your client for fine-grained control |
  | [Proxy Guide](docs/proxy.md) | Production deployment |
  | [Configuration](docs/configuration.md) | All configuration options |
  | [CCR Guide](docs/ccr.md) | Reversible compression architecture |
  | [Metrics](docs/metrics.md) | Monitoring and observability |
  | [Troubleshooting](docs/troubleshooting.md) | Common issues |

  ---
@@ -168,6 +221,8 @@ See [`examples/`](examples/) for runnable code:

  - `basic_usage.py` — Simple SDK usage
  - `proxy_integration.py` — Using with different clients
+ - `langchain_agent.py` — LangChain ReAct agent with Headroom
+ - `rag_pipeline.py` — RAG with document compression
  - `ccr_demo.py` — CCR architecture demonstration

  ---
docs/README.md CHANGED
@@ -10,6 +10,13 @@ Welcome to the Headroom documentation.
  | [SDK Guide](sdk.md) | Python SDK usage |
  | [Proxy Guide](proxy.md) | Proxy server deployment |

  ## Core Concepts

  | Topic | Description |
 
@@ -10,6 +10,13 @@ Welcome to the Headroom documentation.
  | [SDK Guide](sdk.md) | Python SDK usage |
  | [Proxy Guide](proxy.md) | Proxy server deployment |

+ ## Framework Integrations
+
+ | Framework | Description |
+ |-----------|-------------|
+ | [LangChain](langchain.md) | Chat models, memory, retrievers, agents, streaming |
+ | MCP | See [CCR Guide](ccr.md) for tool compression |
+
  ## Core Concepts

  | Topic | Description |
docs/langchain.md ADDED
@@ -0,0 +1,622 @@
+ # LangChain Integration
+
+ Headroom integrates with LangChain to optimize context automatically across the main LangChain patterns: chat models, memory, retrievers, agents, and observability.
+
+ ## Installation
+
+ ```bash
+ pip install "headroom-ai[langchain]"
+ ```
+
+ This installs Headroom with its LangChain dependency (`langchain-core`). The examples below also import packages such as `langchain-openai`, `langchain-anthropic`, `langchain-community`, `langchain`, and `langgraph`, which may need to be installed separately.
+
+ ## Quick Start
+
+ ### Wrap Any Chat Model (1 Line)
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from headroom.integrations import HeadroomChatModel
+
+ # Wrap your model - that's it!
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Use exactly like before
+ response = llm.invoke("Hello!")
+ ```
+
+ Headroom automatically:
+ - Detects the provider (OpenAI, Anthropic, Google)
+ - Compresses tool outputs in conversation history
+ - Optimizes for provider caching
+ - Tracks token savings
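Provider detection for a wrapped model can be approximated by inspecting the wrapped object's class and module name. The sketch below is hypothetical: `guess_provider` and its substring rules are illustrative and are not Headroom's `detect_provider`, which may inspect other attributes. Stand-in classes replace the real LangChain models so it runs without LangChain installed.

```python
def guess_provider(model: object) -> str:
    """Guess a provider name from the wrapped model's module and class (hypothetical rules)."""
    cls = type(model)
    name = f"{cls.__module__}.{cls.__name__}".lower()
    if "anthropic" in name:
        return "anthropic"
    if "google" in name or "gemini" in name:
        return "google"
    if "openai" in name:
        return "openai"
    return "unknown"


# Stand-ins that mimic where the real LangChain classes live.
class ChatAnthropic:
    __module__ = "langchain_anthropic.chat_models"


class ChatOpenAI:
    __module__ = "langchain_openai.chat_models.base"
```

Matching on the module path rather than only the class name lets subclasses and aliases from the same integration package resolve to the same provider.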
+
+ ### Check Your Savings
+
+ ```python
+ # After some usage
+ print(llm.get_metrics())
+ # {'tokens_saved': 12500, 'savings_percent': 45.2, 'requests': 50}
+ ```
+
+ ---
+
+ ## Integration Patterns
+
+ ### 1. Chat Model Wrapper
+
+ `HeadroomChatModel` wraps any LangChain `BaseChatModel`:
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain_anthropic import ChatAnthropic
+ from headroom import HeadroomConfig, HeadroomMode
+ from headroom.integrations import HeadroomChatModel
+
+ # OpenAI
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Anthropic (auto-detected)
+ llm = HeadroomChatModel(ChatAnthropic(model="claude-3-5-sonnet-20241022"))
+
+ # Custom configuration
+ config = HeadroomConfig(
+     default_mode=HeadroomMode.OPTIMIZE,
+     smart_crusher_target_ratio=0.3,  # Keep ~30% of the original (~70% compression)
+ )
+ llm = HeadroomChatModel(
+     ChatOpenAI(model="gpt-4o"),
+     headroom_config=config,
+ )
+ ```
+
+ #### Async Support
+
+ Full async support for `ainvoke` and `astream`:
+
+ ```python
+ import asyncio
+
+ async def main():
+     # Async invoke
+     response = await llm.ainvoke("Hello!")
+
+     # Async streaming
+     async for chunk in llm.astream("Tell me a story"):
+         print(chunk.content, end="", flush=True)
+
+ asyncio.run(main())
+ ```
+
+ #### Tool Calling
+
+ Works seamlessly with LangChain tool calling:
+
+ ```python
+ import json
+
+ from langchain_core.tools import tool
+
+ @tool
+ def search(query: str) -> str:
+     """Search the web."""
+     return json.dumps({"results": [...]})  # Large JSON response, serialized to a string
+
+ llm_with_tools = llm.bind_tools([search])
+ response = llm_with_tools.invoke("Search for Python tutorials")
+ # Tool outputs are automatically compressed in subsequent turns
+ ```
+
+ ---
+
+ ### 2. Memory Integration
+
+ `HeadroomChatMessageHistory` wraps any chat history with automatic compression:
+
+ ```python
+ from langchain.chains import ConversationChain
+ from langchain.memory import ConversationBufferMemory
+ from langchain_community.chat_message_histories import ChatMessageHistory
+ from headroom.integrations import HeadroomChatMessageHistory
+
+ # Wrap any history
+ base_history = ChatMessageHistory()
+ compressed_history = HeadroomChatMessageHistory(
+     base_history,
+     compress_threshold_tokens=4000,  # Compress when over 4K tokens
+     keep_recent_turns=5,  # Always keep last 5 turns
+ )
+
+ # Use with any memory class
+ memory = ConversationBufferMemory(chat_memory=compressed_history)
+
+ # Zero changes to your chain!
+ chain = ConversationChain(llm=llm, memory=memory)
+ ```
+
+ **Why this matters**: Long conversations can easily grow past 50K tokens. `HeadroomChatMessageHistory` automatically compresses older turns while preserving recent context.
+
+ ```python
+ # Check compression stats
+ print(compressed_history.get_compression_stats())
+ # {'compression_count': 12, 'total_tokens_saved': 28000}
+ ```
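The threshold-plus-recent-turns behavior described above can be sketched without LangChain. This is an illustrative assumption of how such a history could behave, not `HeadroomChatMessageHistory`'s code: messages are `(role, text)` tuples, each message counts as one "turn" for simplicity, tokens are approximated by word count, and older messages are collapsed into a single truncated note rather than compressed by Headroom's transforms.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def compress_history(
    messages: list[tuple[str, str]],
    threshold_tokens: int = 4000,
    keep_recent: int = 5,
    summary_words: int = 50,
) -> list[tuple[str, str]]:
    """Collapse older messages into one truncated note once the history exceeds the threshold."""
    total = sum(approx_tokens(text) for _, text in messages)
    if total <= threshold_tokens or len(messages) <= keep_recent:
        return messages  # under budget, or nothing old enough to compress
    split = len(messages) - keep_recent  # works for keep_recent == 0 too
    old, recent = messages[:split], messages[split:]
    old_words = " ".join(text for _, text in old).split()
    note = " ".join(old_words[:summary_words])  # truncation, not a real summary
    summary = ("system", f"[{len(old)} earlier messages compressed] {note}")
    return [summary] + recent


history = [
    ("user" if i % 2 == 0 else "assistant", f"message {i} " + "word " * 200)
    for i in range(20)
]
compressed = compress_history(history, threshold_tokens=1000, keep_recent=5)
```

The recent messages pass through untouched, which is the property the `keep_recent_turns` setting is described as guaranteeing; a real implementation would summarize or selectively compress the older part instead of truncating it.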
+
+ ---
+
+ ### 3. Retriever Integration
+
+ `HeadroomDocumentCompressor` filters retrieved documents by relevance:
+
+ ```python
+ from langchain.retrievers import ContextualCompressionRetriever
+ from langchain_community.vectorstores import FAISS
+ from headroom.integrations import HeadroomDocumentCompressor
+
+ # Create vector store retriever (retrieve many for recall)
+ # `documents` and `embeddings` are your own document list and embedding model
+ vectorstore = FAISS.from_documents(documents, embeddings)
+ base_retriever = vectorstore.as_retriever(search_kwargs={"k": 50})
+
+ # Wrap with Headroom compression (keep best for precision)
+ compressor = HeadroomDocumentCompressor(
+     max_documents=10,  # Keep top 10
+     min_relevance=0.3,  # Minimum relevance score
+     prefer_diverse=True,  # MMR-style diversity
+ )
+
+ retriever = ContextualCompressionRetriever(
+     base_compressor=compressor,
+     base_retriever=base_retriever,
+ )
+
+ # Retrieves 50 docs, returns at most the best 10
+ docs = retriever.invoke("What is Python?")
+ ```
+
+ **Why this matters**: Vector search often returns many marginally relevant documents. `HeadroomDocumentCompressor` uses BM25-style scoring to keep only the most relevant ones, which reduces context size and can improve answer quality.
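To make "BM25-style scoring" concrete, here is a small, self-contained Okapi BM25 ranker with a top-k cut and a relevance floor. It is a sketch under assumptions, not Headroom's code: `rank_documents` is a hypothetical name, normalizing each score by the best score to get a 0-1 value is one possible interpretation of `min_relevance`, and `k1=1.5`, `b=0.75` are the usual textbook defaults.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    query_terms = set(tokenize(query))
    scores = []
    for terms in tokenized:
        tf = Counter(terms)
        score = 0.0
        for q in query_terms:
            if q not in tf:
                continue
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            length_norm = 1 - b + b * len(terms) / avgdl
            score += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * length_norm)
        scores.append(score)
    return scores


def rank_documents(
    query: str, docs: list[str], max_documents: int = 10, min_relevance: float = 0.0
) -> list[str]:
    """Keep the top documents whose score, relative to the best score, clears min_relevance."""
    if not docs:
        return []
    scores = bm25_scores(query, docs)
    best = max(scores) or 1.0  # avoid dividing by zero when nothing matches
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score / best >= min_relevance][:max_documents]


docs = [
    "Python is a programming language.",
    "The python snake is a large reptile.",
    "Java is also a programming language.",
    "Cooking pasta takes ten minutes.",
]
top = rank_documents("python programming language", docs, max_documents=2)
```

Unlike raw embedding similarity, BM25 rewards exact query-term overlap and penalizes long documents through the `b` length normalization, which is why the snake document ranks below the Java one here.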
+
+ ---
+
+ ### 4. Agent Tool Wrapping
+
+ `wrap_tools_with_headroom` compresses tool outputs for agents:
+
+ ```python
+ import json
+
+ from langchain.agents import create_openai_tools_agent, AgentExecutor
+ from langchain_core.tools import tool
+ from headroom.integrations import wrap_tools_with_headroom
+
+ @tool
+ def search_database(query: str) -> str:
+     """Search the database."""
+     # Returns 1000 results as JSON
+     return json.dumps({"results": [...], "total": 1000})
+
+ @tool
+ def fetch_logs(service: str) -> str:
+     """Fetch service logs."""
+     # Returns 500 log entries
+     return json.dumps({"logs": [...]})
+
+ # Wrap tools with compression
+ tools = [search_database, fetch_logs]
+ wrapped_tools = wrap_tools_with_headroom(
+     tools,
+     min_chars_to_compress=1000,  # Only compress large outputs
+ )
+
+ # Create agent with wrapped tools; `prompt` is your ChatPromptTemplate and must
+ # include an `agent_scratchpad` placeholder (see Example 4)
+ agent = create_openai_tools_agent(llm, wrapped_tools, prompt)
+ executor = AgentExecutor(agent=agent, tools=wrapped_tools)
+
+ # Tool outputs are automatically compressed
+ result = executor.invoke({"input": "Find users who logged in yesterday"})
+ ```
+
+ **Per-tool metrics:**
+
+ ```python
+ from headroom.integrations import get_tool_metrics
+
+ metrics = get_tool_metrics()
+ print(metrics.get_summary())
+ # {
+ #     'total_invocations': 25,
+ #     'total_compressions': 18,
+ #     'total_chars_saved': 450000,
+ #     'by_tool': {
+ #         'search_database': {'invocations': 15, 'chars_saved': 320000},
+ #         'fetch_logs': {'invocations': 10, 'chars_saved': 130000},
+ #     }
+ # }
+ ```
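The size threshold and per-tool counters above, together with the README's promise to keep "errors, anomalies, relevant items", can be sketched as a plain Python decorator. This is a hypothetical illustration, not `wrap_tools_with_headroom`: it assumes the tool returns a JSON object whose list values hold item dicts, treats an item as an error if it has an `error` key or `status == "error"`, and keeps only the first and last few items plus any errors, which is far cruder than relevance-based selection. `compress_tool_output` and `TOOL_METRICS` are made-up names.

```python
import functools
import json
from collections import defaultdict

# Hypothetical per-tool counters, loosely mirroring the `by_tool` summary shown above.
TOOL_METRICS: defaultdict = defaultdict(lambda: {"invocations": 0, "chars_saved": 0})


def _is_error(item: object) -> bool:
    """Treat dict items with an `error` key or status "error" as must-keep anomalies."""
    return isinstance(item, dict) and ("error" in item or item.get("status") == "error")


def compress_tool_output(min_chars: int = 1000, head: int = 3, tail: int = 2):
    """Decorator: shrink large JSON list outputs, always keeping error items."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            output = fn(*args, **kwargs)
            stats = TOOL_METRICS[fn.__name__]
            stats["invocations"] += 1
            if len(output) < min_chars:
                return output  # small outputs pass through unchanged
            try:
                data = json.loads(output)
            except json.JSONDecodeError:
                return output  # not JSON: this sketch leaves it alone
            if not isinstance(data, dict):
                return output
            for key, items in list(data.items()):  # copy: keys are added while looping
                if isinstance(items, list) and len(items) > head + tail:
                    middle = items[head:len(items) - tail]
                    kept = items[:head] + [i for i in middle if _is_error(i)] + items[len(items) - tail:]
                    data[key] = kept
                    data[f"{key}_omitted"] = len(items) - len(kept)
            compressed = json.dumps(data)
            stats["chars_saved"] += max(len(output) - len(compressed), 0)
            return compressed

        return wrapper

    return decorator


@compress_tool_output(min_chars=1000)
def search_database(query: str) -> str:
    rows = [{"id": i, "status": "ok", "name": f"user {i}"} for i in range(1000)]
    rows[500] = {"id": 500, "status": "error", "detail": "timeout"}
    return json.dumps({"results": rows, "total": 1000})


out = json.loads(search_database("users"))
```

The `*_omitted` count tells the model that data was dropped, which matters if it needs to ask for more; the error row in the middle of the list survives even though it is outside the head and tail sample.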
+
+ ---
+
+ ### 5. Streaming Metrics
+
+ Track output tokens during streaming:
+
+ ```python
+ from headroom.integrations import StreamingMetricsTracker
+
+ tracker = StreamingMetricsTracker(model="gpt-4o")
+
+ for chunk in llm.stream("Write a poem about coding"):
+     tracker.add_chunk(chunk)
+     print(chunk.content, end="", flush=True)
+
+ metrics = tracker.finish()
+ print(f"\nOutput tokens: {metrics.output_tokens}")
+ print(f"Duration: {metrics.duration_ms:.0f}ms")
+ ```
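What a streaming tracker records can be sketched with a small class that accumulates chunks and elapsed time. This is a hypothetical stand-in for `StreamingMetricsTracker`: `SimpleStreamTracker` and `StreamMetrics` are made up for illustration, chunks are plain strings instead of LangChain message chunks, and the token count is a rough whitespace estimate rather than a real tokenizer.

```python
import time
from dataclasses import dataclass


@dataclass
class StreamMetrics:
    output_tokens: int
    chunks: int
    duration_ms: float


class SimpleStreamTracker:
    """Accumulates streamed text and reports rough metrics when the stream ends."""

    def __init__(self) -> None:
        self._start = time.perf_counter()
        self._parts: list[str] = []

    def add_chunk(self, text: str) -> None:
        self._parts.append(text)

    def finish(self) -> StreamMetrics:
        text = "".join(self._parts)  # count on the joined text, not per chunk
        return StreamMetrics(
            output_tokens=len(text.split()),  # rough estimate, not a real tokenizer
            chunks=len(self._parts),
            duration_ms=(time.perf_counter() - self._start) * 1000,
        )


tracker = SimpleStreamTracker()
for piece in ["Code ", "flows ", "like ", "a river"]:
    tracker.add_chunk(piece)
metrics = tracker.finish()
```

Counting on the joined text avoids miscounting words that a provider splits across chunk boundaries.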
+
+ **Context manager style:**
+
+ ```python
+ from headroom.integrations import StreamingMetricsCallback
+
+ with StreamingMetricsCallback(model="gpt-4o") as tracker:
+     for chunk in llm.stream("Write a poem about coding"):
+         tracker.add_chunk(chunk)
+         print(chunk.content, end="")
+
+ print(f"Metrics: {tracker.metrics}")
+ ```
+
+ ---
+
+ ### 6. LangSmith Integration
+
+ Add Headroom metrics to LangSmith traces:
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from headroom.integrations import HeadroomChatModel, HeadroomLangSmithCallbackHandler
+
+ # Create callback handler
+ langsmith_handler = HeadroomLangSmithCallbackHandler()
+
+ # Use with your LLM
+ llm = HeadroomChatModel(
+     ChatOpenAI(model="gpt-4o"),
+     callbacks=[langsmith_handler],
+ )
+
+ # After calls, metrics appear in LangSmith traces:
+ # - headroom.tokens_before
+ # - headroom.tokens_after
+ # - headroom.tokens_saved
+ # - headroom.compression_ratio
+ ```
+
+ ---
+
+ ## Real-World Examples
+
+ ### Example 1: LangGraph ReAct Agent
+
+ The ReAct pattern is one of the most common agent architectures. Here's how to optimize it:
+
+ ```python
+ import json
+
+ from langchain_openai import ChatOpenAI
+ from langchain_core.tools import tool
+ from langgraph.prebuilt import create_react_agent
+ from headroom.integrations import HeadroomChatModel, wrap_tools_with_headroom
+
+ # Define tools that return large outputs
+ @tool
+ def search_web(query: str) -> str:
+     """Search the web for information."""
+     # Simulating large search results
+     return json.dumps({
+         "results": [
+             {"title": f"Result {i}", "snippet": "..." * 100, "url": "https://..."}
+             for i in range(100)
+         ],
+         "total": 1000,
+     })
+
+ @tool
+ def query_database(sql: str) -> str:
+     """Execute SQL query."""
+     return json.dumps({
+         "rows": [{"id": i, "data": "..." * 50} for i in range(500)],
+         "total": 500,
+     })
+
+ # Wrap model with Headroom
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Wrap tools with compression
+ tools = wrap_tools_with_headroom([search_web, query_database])
+
+ # Create ReAct agent
+ agent = create_react_agent(llm, tools)
+
+ # Run - tool outputs are automatically compressed between iterations
+ result = agent.invoke({
+     "messages": [("user", "Find all users who signed up last week and their activity")]
+ })
+
+ # Check savings
+ print(f"Tokens saved: {llm.get_metrics()['tokens_saved']}")
+ ```
+
+ - **Without Headroom**: Each tool call adds 10-50K tokens to context.
+ - **With Headroom**: Tool outputs are compressed to 1-2K tokens, so the agent runs faster and cheaper.
+
+ ---
+
+ ### Example 2: RAG Pipeline with Document Filtering
+
+ ```python
+ from langchain_openai import ChatOpenAI, OpenAIEmbeddings
+ from langchain_community.vectorstores import Chroma
+ from langchain.chains import RetrievalQA
+ from langchain.retrievers import ContextualCompressionRetriever
+ from headroom.integrations import HeadroomChatModel, HeadroomDocumentCompressor
+
+ # Setup vector store (`documents` is your own list of LangChain Documents)
+ embeddings = OpenAIEmbeddings()
+ vectorstore = Chroma.from_documents(documents, embeddings)
+
+ # High-recall retriever (get many candidates)
+ base_retriever = vectorstore.as_retriever(search_kwargs={"k": 50})
+
+ # Headroom compressor for precision
+ compressor = HeadroomDocumentCompressor(
+     max_documents=5,  # Keep only top 5
+     min_relevance=0.4,  # Must be 40%+ relevant
+     prefer_diverse=True,  # Avoid redundant docs
+ )
+
+ # Combine into compression retriever
+ retriever = ContextualCompressionRetriever(
+     base_compressor=compressor,
+     base_retriever=base_retriever,
+ )
+
+ # Wrap LLM
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Create QA chain
+ qa_chain = RetrievalQA.from_chain_type(
+     llm=llm,
+     retriever=retriever,
+     return_source_documents=True,
+ )
+
+ # Query - retrieves 50 docs, uses at most the best 5
+ result = qa_chain.invoke({"query": "How do I configure authentication?"})
+ print(f"Answer: {result['result']}")
+ print(f"Sources: {len(result['source_documents'])} docs")
+ ```
+
+ **Impact**:
+ - Without filtering: 50 docs × ~500 tokens = 25K context tokens
+ - With Headroom: 5 docs × ~500 tokens = 2.5K context tokens (90% reduction)
+
+ ---
+
+ ### Example 3: Conversational Agent with Memory
+
+ ```python
+ from langchain_openai import ChatOpenAI
+ from langchain.memory import ConversationBufferMemory
+ from langchain_community.chat_message_histories import ChatMessageHistory
+ from langchain.chains import ConversationChain
+ from headroom.integrations import HeadroomChatModel, HeadroomChatMessageHistory
+
+ # Wrap LLM
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+
+ # Wrap memory with auto-compression
+ base_history = ChatMessageHistory()
+ compressed_history = HeadroomChatMessageHistory(
+     base_history,
+     compress_threshold_tokens=8000,  # Compress when over 8K
+     keep_recent_turns=10,  # Always keep last 10 turns
+ )
+
+ memory = ConversationBufferMemory(
+     chat_memory=compressed_history,
+     return_messages=True,
+ )
+
+ # Create conversation chain
+ chain = ConversationChain(llm=llm, memory=memory)
+
+ # Long conversation - memory auto-compresses
+ for i in range(100):
+     response = chain.invoke({"input": f"Tell me about topic {i}"})
+     print(f"Turn {i}: {len(response['response'])} chars")
+
+ # Check memory stats
+ print(compressed_history.get_compression_stats())
+ # {'compression_count': 8, 'total_tokens_saved': 45000}
+ ```
+
+ **Impact**: Without compression, a 100-turn conversation can exceed 100K tokens. With `HeadroomChatMessageHistory`, older turns are compressed whenever the history grows past the 8K threshold, while the 10 most recent turns are always kept intact.
+
+ ---
+
+ ### Example 4: Multi-Tool Research Agent
+
+ ```python
+ import json
+
+ from langchain_openai import ChatOpenAI
+ from langchain.agents import AgentExecutor, create_openai_tools_agent
+ from langchain_core.prompts import ChatPromptTemplate
+ from langchain_core.tools import tool
+ from headroom.integrations import (
+     HeadroomChatModel,
+     wrap_tools_with_headroom,
+     get_tool_metrics,
+     reset_tool_metrics,
+ )
+
+ @tool
+ def search_arxiv(query: str) -> str:
+     """Search arXiv for papers."""
+     return json.dumps({"papers": [{"title": f"Paper {i}", "abstract": "..." * 200} for i in range(50)]})
+
+ @tool
+ def search_github(query: str) -> str:
+     """Search GitHub repositories."""
+     return json.dumps({"repos": [{"name": f"repo-{i}", "description": "..." * 100, "stars": i * 100} for i in range(100)]})
+
+ @tool
+ def fetch_documentation(url: str) -> str:
+     """Fetch documentation from URL."""
+     return "..." * 5000  # Large doc content
+
+ # Wrap everything
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
+ tools = wrap_tools_with_headroom([search_arxiv, search_github, fetch_documentation])
+
+ prompt = ChatPromptTemplate.from_messages([
+     ("system", "You are a research assistant. Use tools to gather information."),
+     ("human", "{input}"),
+     ("placeholder", "{agent_scratchpad}"),
+ ])
+
+ agent = create_openai_tools_agent(llm, tools, prompt)
+ executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
+
+ # Reset metrics for this session
+ reset_tool_metrics()
+
+ # Run complex research task
+ result = executor.invoke({
+     "input": "Research the latest advances in LLM context compression and find relevant GitHub projects"
+ })
+
+ # Check per-tool metrics
+ metrics = get_tool_metrics().get_summary()
+ print(f"Total chars saved: {metrics['total_chars_saved']:,}")
+ print(f"Per-tool breakdown: {metrics['by_tool']}")
+ ```
+
+ ---
+
+ ## Configuration Options
+
+ ### HeadroomChatModel
+
+ ```python
+ HeadroomChatModel(
+     wrapped_model,  # Any LangChain BaseChatModel
+     headroom_config=HeadroomConfig(),  # Headroom configuration
+     auto_detect_provider=True,  # Auto-detect from wrapped model
+ )
+ ```
+
+ ### HeadroomChatMessageHistory
+
+ ```python
+ HeadroomChatMessageHistory(
+     base_history,  # Any BaseChatMessageHistory
+     compress_threshold_tokens=4000,  # Token threshold for compression
+     keep_recent_turns=5,  # Minimum turns to preserve
+     model="gpt-4o",  # Model for token counting
+ )
+ ```
+
+ ### HeadroomDocumentCompressor
+
+ ```python
+ HeadroomDocumentCompressor(
+     max_documents=10,  # Maximum docs to return
+     min_relevance=0.0,  # Minimum relevance score (0-1)
+     prefer_diverse=False,  # Use MMR for diversity
+ )
+ ```
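`prefer_diverse` is described as using MMR (maximal marginal relevance). As a reference point, here is a textbook MMR selection over precomputed relevance scores. It is a sketch, not Headroom's implementation: `mmr_select`, the Jaccard word-overlap similarity, and the `lambda_mult=0.5` trade-off are illustrative choices; a real retriever would typically compare embeddings instead.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(docs: list[str], relevance: list[float], k: int, lambda_mult: float = 0.5) -> list[str]:
    """Greedily pick k docs, trading relevance against similarity to docs already picked."""
    selected: list[int] = []
    candidates = list(range(len(docs)))

    def score(i: int) -> float:
        redundancy = max((jaccard(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [docs[i] for i in selected]


docs = [
    "reset your password from the account page",
    "reset your password from the account settings page",
    "enable two factor authentication for your account",
]
relevance = [0.9, 0.85, 0.6]
picked = mmr_select(docs, relevance, k=2)
```

With `lambda_mult=0.5`, the near-duplicate second document is penalized for overlapping the first, so `picked` holds documents 0 and 2; setting `lambda_mult=1.0` falls back to pure relevance order and returns documents 0 and 1.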
+
+ ### wrap_tools_with_headroom
+
+ ```python
+ wrap_tools_with_headroom(
+     tools,  # List of LangChain tools
+     min_chars_to_compress=1000,  # Minimum output size
+     smart_crusher_config=None,  # SmartCrusher configuration
+ )
+ ```
+
+ ---
+
+ ## Import Reference
+
+ ```python
+ from headroom.integrations import (
+     # Chat Model
+     HeadroomChatModel,
+
+     # Memory
+     HeadroomChatMessageHistory,
+
+     # Retrievers
+     HeadroomDocumentCompressor,
+
+     # Agents
+     HeadroomToolWrapper,
+     wrap_tools_with_headroom,
+     get_tool_metrics,
+     reset_tool_metrics,
+
+     # Streaming
+     StreamingMetricsTracker,
+     StreamingMetricsCallback,
+     track_streaming_response,
+
+     # LangSmith
+     HeadroomLangSmithCallbackHandler,
+
+     # Provider Detection
+     detect_provider,
+     get_headroom_provider,
+ )
+
+ # Or import from subpackage directly
+ from headroom.integrations.langchain import HeadroomChatModel
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
+ ```
+
+ ---
+
+ ## Troubleshooting
+
+ ### LangChain not detected
+
+ ```python
+ from headroom.integrations import langchain_available
+
+ if not langchain_available():
+     print('Install with: pip install "headroom-ai[langchain]"')
+ ```
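A check like `langchain_available()` can be done without importing the package, by asking Python's import system whether it can locate the module. The helper below is a hypothetical sketch built on the standard library's `importlib.util.find_spec`; checking for `langchain_core` is an assumption based on the installation section, not confirmed to be what Headroom checks.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the top-level module `name` can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def langchain_installed() -> bool:
    # Assumption: the LangChain extra counts as present when langchain_core is importable.
    return module_available("langchain_core")


if not langchain_installed():
    print('Install with: pip install "headroom-ai[langchain]"')
```

Using `find_spec` keeps the check cheap and side-effect free; note that for dotted names it imports the parent packages, so this sketch sticks to top-level module names.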
588
+
589
+ ### Provider detection failing
590
+
591
+ ```python
592
+ # Force a specific provider
593
+ from headroom.providers import AnthropicProvider
594
+
595
+ llm = HeadroomChatModel(
596
+ ChatAnthropic(model="claude-3-5-sonnet-20241022"),
597
+ auto_detect_provider=False,
598
+ )
599
+ llm._provider = AnthropicProvider()
600
+ ```
601
+
602
+ ### Memory not compressing
603
+
604
+ Check that your message count exceeds the threshold:
605
+
606
+ ```python
607
+ history = HeadroomChatMessageHistory(
608
+ base_history,
609
+ compress_threshold_tokens=1000, # Lower threshold
610
+ keep_recent_turns=2, # Fewer preserved turns
611
+ )
612
+ ```
613
+
614
+ ---
615
+
616
+ ## Performance Tips
617
+
618
+ 1. **Use tool wrapping for agents** - Agents with tools benefit most from compression
619
+ 2. **Set appropriate thresholds** - Don't compress small conversations
620
+ 3. **Enable diversity for RAG** - `prefer_diverse=True` improves answer quality
621
+ 4. **Monitor with LangSmith** - Use the callback handler to track savings over time
622
+ 5. **Batch similar requests** - Provider caching works better with stable prefixes
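The "Memory not compressing" and "Set appropriate thresholds" advice both hinge on two knobs: a token threshold and a number of preserved recent turns. The sketch below is a minimal, self-contained illustration of that trigger, not Headroom's implementation. The `Message` type, the 4-characters-per-token estimate, and the assumption that one turn is one user message plus one assistant reply are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)


def should_compress(messages: list[Message], threshold_tokens: int) -> bool:
    # Only compress once the whole history is above the threshold.
    total = sum(estimate_tokens(m.content) for m in messages)
    return total > threshold_tokens


def split_for_compression(
    messages: list[Message], keep_recent_turns: int
) -> tuple[list[Message], list[Message]]:
    # Assumes one turn = one user message + one assistant reply.
    keep = keep_recent_turns * 2
    if keep == 0:
        return messages, []
    return messages[:-keep], messages[-keep:]


history = [Message("user", "x" * 400), Message("assistant", "y" * 400)] * 3
print(should_compress(history, threshold_tokens=500))  # True (600 > 500)
older, recent = split_for_compression(history, keep_recent_turns=2)
print(len(older), len(recent))  # 2 4
```

Only `older` would be a candidate for compression; the most recent turns stay verbatim, which is why a small conversation with a generous `keep_recent_turns` may never shrink.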
headroom/cache/compression_store.py CHANGED
@@ -292,7 +292,8 @@ class CompressionStore:
292
  tool_signature_hash=entry.tool_signature_hash,
293
  )
294
 
295
- # CRITICAL: Make a deep copy to return (entry could be modified/evicted after lock release)
 
296
  # The entry contains mutable fields (search_queries list) that must be copied
297
  result_entry = replace(entry, search_queries=list(entry.search_queries))
298
 
 
292
  tool_signature_hash=entry.tool_signature_hash,
293
  )
294
 
295
+ # CRITICAL: Make a deep copy to return
296
+ # (entry could be modified/evicted after lock release)
297
  # The entry contains mutable fields (search_queries list) that must be copied
298
  result_entry = replace(entry, search_queries=list(entry.search_queries))
299
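The comment in this hunk describes a copy-on-read pattern: the entry is copied while the lock is still held, so callers never observe later mutations or evictions. A minimal sketch with a hypothetical `Store`/`Entry` pair (not the real `CompressionStore`) shows why the list needs its own copy: `dataclasses.replace` builds a new instance but reuses the existing field objects. Strictly, this is a shallow copy plus a fresh list, which is sufficient when the only mutable field is a list of strings.

```python
import threading
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Entry:
    key: str
    search_queries: list[str] = field(default_factory=list)


class Store:
    """Hypothetical store illustrating copy-before-unlock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._entries: dict[str, Entry] = {}

    def put(self, entry: Entry) -> None:
        with self._lock:
            self._entries[entry.key] = entry

    def record_query(self, key: str, query: str) -> None:
        with self._lock:
            self._entries[key].search_queries.append(query)

    def get(self, key: str) -> Optional[Entry]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            # replace() creates a new Entry but shares field objects, so the
            # mutable list is copied explicitly before the lock is released.
            return replace(entry, search_queries=list(entry.search_queries))


store = Store()
store.put(Entry("abc123", ["first"]))
snapshot = store.get("abc123")
store.record_query("abc123", "second")
print(snapshot.search_queries)  # ['first']
```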
 
headroom/cache/dynamic_detector.py CHANGED
@@ -588,13 +588,19 @@ class NERDetector:
588
  self._load_error: str | None = None
589
 
590
  if not _SPACY_AVAILABLE:
591
- self._load_error = "spaCy not installed. Install with: pip install spacy && python -m spacy download en_core_web_sm"
 
 
 
592
  return
593
 
594
  try:
595
  self._nlp = spacy.load(config.spacy_model)
596
  except OSError:
597
- self._load_error = f"spaCy model '{config.spacy_model}' not found. Install with: python -m spacy download {config.spacy_model}"
 
 
 
598
 
599
  @property
600
  def is_available(self) -> bool:
@@ -704,7 +710,10 @@ class SemanticDetector:
704
  self._load_error: str | None = None
705
 
706
  if not _SENTENCE_TRANSFORMERS_AVAILABLE:
707
- self._load_error = "sentence-transformers not installed. Install with: pip install sentence-transformers"
 
 
 
708
  return
709
 
710
  try:
 
588
  self._load_error: str | None = None
589
 
590
  if not _SPACY_AVAILABLE:
591
+ self._load_error = (
592
+ "spaCy not installed. Install with: "
593
+ "pip install spacy && python -m spacy download en_core_web_sm"
594
+ )
595
  return
596
 
597
  try:
598
  self._nlp = spacy.load(config.spacy_model)
599
  except OSError:
600
+ self._load_error = (
601
+ f"spaCy model '{config.spacy_model}' not found. "
602
+ f"Install with: python -m spacy download {config.spacy_model}"
603
+ )
604
 
605
  @property
606
  def is_available(self) -> bool:
 
710
  self._load_error: str | None = None
711
 
712
  if not _SENTENCE_TRANSFORMERS_AVAILABLE:
713
+ self._load_error = (
714
+ "sentence-transformers not installed. "
715
+ "Install with: pip install sentence-transformers"
716
+ )
717
  return
718
 
719
  try:
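Both detectors follow the same pattern: when an optional dependency is missing they record a human-readable `_load_error` instead of raising, and callers check `is_available`. The class below is a simplified, hypothetical stand-in: it imports by module name via `importlib` rather than using module-level flags like `_SPACY_AVAILABLE`, and it assumes the pip package name matches the import name, which is not always true (e.g. `sentence-transformers` vs `sentence_transformers`).

```python
import importlib
from types import ModuleType
from typing import Optional


class OptionalDependencyDetector:
    """Hypothetical detector that degrades gracefully when a dependency is missing."""

    def __init__(self, module_name: str) -> None:
        self._module: Optional[ModuleType] = None
        self._load_error: Optional[str] = None
        try:
            self._module = importlib.import_module(module_name)
        except ImportError:
            # Store an actionable message instead of failing at construction time.
            self._load_error = (
                f"{module_name} not installed. "
                f"Install with: pip install {module_name}"
            )

    @property
    def is_available(self) -> bool:
        return self._module is not None

    @property
    def load_error(self) -> Optional[str]:
        return self._load_error


missing = OptionalDependencyDetector("surely_not_a_real_module_xyz")
present = OptionalDependencyDetector("json")
print(missing.is_available, present.is_available)  # False True
print(missing.load_error)
```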
headroom/ccr/mcp_server.py CHANGED
@@ -109,9 +109,10 @@ class CCRMCPServer:
109
  Tool(
110
  name=CCR_TOOL_NAME,
111
  description=(
112
- "Retrieve original uncompressed content that was compressed to save tokens. "
113
- "Use this when you need more data than what's shown in compressed tool results. "
114
- "The hash is provided in compression markers like [N items compressed... hash=abc123]."
 
115
  ),
116
  inputSchema={
117
  "type": "object",
 
109
  Tool(
110
  name=CCR_TOOL_NAME,
111
  description=(
112
+ "Retrieve original uncompressed content that was compressed "
113
+ "to save tokens. Use this when you need more data than what's "
114
+ "shown in compressed tool results. The hash is provided in "
115
+ "compression markers like [N items compressed... hash=abc123]."
116
  ),
117
  inputSchema={
118
  "type": "object",
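The rewrapping in the last three files relies on Python's implicit concatenation of adjacent string literals, so each rewrapped message should be identical to the original as long as every fragment keeps its trailing space. This check uses the tool description from this hunk verbatim, plus one deliberately broken fragment to show the failure mode.

```python
# Adjacent string literals inside parentheses are joined at compile time.
ORIGINAL = (
    "Retrieve original uncompressed content that was compressed to save tokens. "
    "Use this when you need more data than what's shown in compressed tool results. "
    "The hash is provided in compression markers like [N items compressed... hash=abc123]."
)

REWRAPPED = (
    "Retrieve original uncompressed content that was compressed "
    "to save tokens. Use this when you need more data than what's "
    "shown in compressed tool results. The hash is provided in "
    "compression markers like [N items compressed... hash=abc123]."
)

print(ORIGINAL == REWRAPPED)  # True

# Dropping the trailing space silently glues words together.
BROKEN = ("compressed" "to save tokens.")
print(BROKEN)  # compressedto save tokens.
```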
headroom/integrations/__init__.py CHANGED
@@ -1,18 +1,69 @@
1
  """Headroom integrations with popular LLM frameworks.
2
 
3
  Available integrations:
4
- - LangChain: HeadroomChatModel, HeadroomCallbackHandler, optimize_messages
5
- - MCP: HeadroomMCPCompressor, compress_tool_result, HeadroomMCPClientWrapper
6
 
7
- Install LangChain support: pip install headroom[langchain]
8
  """
9
 
 
10
  from .langchain import (
 
 
 
11
  HeadroomCallbackHandler,
 
 
12
  HeadroomChatModel,
 
 
 
13
  HeadroomRunnable,
14
  optimize_messages,
 
 
 
 
15
  )
 
 
16
  from .mcp import (
17
  DEFAULT_MCP_PROFILES,
18
  HeadroomMCPClientWrapper,
@@ -25,11 +76,39 @@ from .mcp import (
25
  )
26
 
27
  __all__ = [
28
- # LangChain
29
  "HeadroomChatModel",
30
  "HeadroomCallbackHandler",
31
- "optimize_messages",
32
  "HeadroomRunnable",
33
  # MCP
34
  "HeadroomMCPCompressor",
35
  "HeadroomMCPClientWrapper",
 
1
  """Headroom integrations with popular LLM frameworks.
2
 
3
  Available integrations:
 
 
4
 
5
+ LangChain (pip install headroom[langchain]):
6
+ - HeadroomChatModel: Drop-in wrapper for any LangChain chat model
7
+ - HeadroomChatMessageHistory: Automatic conversation compression
8
+ - HeadroomDocumentCompressor: Relevance-based document filtering
9
+ - HeadroomToolWrapper: Tool output compression for agents
10
+ - StreamingMetricsTracker: Token counting during streaming
11
+ - HeadroomLangSmithCallbackHandler: LangSmith trace enrichment
12
+
13
+ MCP (Model Context Protocol):
14
+ - HeadroomMCPCompressor: Compress MCP tool results
15
+ - compress_tool_result: Simple function for tool compression
16
+
17
+ Example:
18
+ # LangChain integration
19
+ from headroom.integrations import HeadroomChatModel
20
+ # or explicitly:
21
+ from headroom.integrations.langchain import HeadroomChatModel
22
+
23
+ # MCP integration
24
+ from headroom.integrations import compress_tool_result
25
+ # or explicitly:
26
+ from headroom.integrations.mcp import compress_tool_result
27
  """
28
 
29
+ # Re-export from langchain subpackage for backwards compatibility
30
  from .langchain import (
31
+ # Retrievers
32
+ CompressionMetrics,
33
+ # Core
34
  HeadroomCallbackHandler,
35
+ # Memory
36
+ HeadroomChatMessageHistory,
37
  HeadroomChatModel,
38
+ HeadroomDocumentCompressor,
39
+ # LangSmith
40
+ HeadroomLangSmithCallbackHandler,
41
  HeadroomRunnable,
42
+ # Agents
43
+ HeadroomToolWrapper,
44
+ OptimizationMetrics,
45
+ # Streaming
46
+ StreamingMetrics,
47
+ StreamingMetricsCallback,
48
+ StreamingMetricsTracker,
49
+ ToolCompressionMetrics,
50
+ ToolMetricsCollector,
51
+ # Provider Detection
52
+ detect_provider,
53
+ get_headroom_provider,
54
+ get_model_name_from_langchain,
55
+ get_tool_metrics,
56
+ is_langsmith_available,
57
+ is_langsmith_tracing_enabled,
58
+ langchain_available,
59
  optimize_messages,
60
+ reset_tool_metrics,
61
+ track_async_streaming_response,
62
+ track_streaming_response,
63
+ wrap_tools_with_headroom,
64
  )
65
+
66
+ # Re-export from mcp subpackage for backwards compatibility
67
  from .mcp import (
68
  DEFAULT_MCP_PROFILES,
69
  HeadroomMCPClientWrapper,
 
76
  )
77
 
78
  __all__ = [
79
+ # LangChain Core
80
  "HeadroomChatModel",
81
  "HeadroomCallbackHandler",
 
82
  "HeadroomRunnable",
83
+ "OptimizationMetrics",
84
+ "optimize_messages",
85
+ "langchain_available",
86
+ # Provider Detection
87
+ "detect_provider",
88
+ "get_headroom_provider",
89
+ "get_model_name_from_langchain",
90
+ # Memory
91
+ "HeadroomChatMessageHistory",
92
+ # Retrievers
93
+ "HeadroomDocumentCompressor",
94
+ "CompressionMetrics",
95
+ # Agents
96
+ "HeadroomToolWrapper",
97
+ "ToolCompressionMetrics",
98
+ "ToolMetricsCollector",
99
+ "wrap_tools_with_headroom",
100
+ "get_tool_metrics",
101
+ "reset_tool_metrics",
102
+ # LangSmith
103
+ "HeadroomLangSmithCallbackHandler",
104
+ "is_langsmith_available",
105
+ "is_langsmith_tracing_enabled",
106
+ # Streaming
107
+ "StreamingMetricsTracker",
108
+ "StreamingMetricsCallback",
109
+ "StreamingMetrics",
110
+ "track_streaming_response",
111
+ "track_async_streaming_response",
112
  # MCP
113
  "HeadroomMCPCompressor",
114
  "HeadroomMCPClientWrapper",
headroom/integrations/langchain/__init__.py ADDED
@@ -0,0 +1,106 @@
1
+ """LangChain integration for Headroom.
2
+
3
+ This package provides seamless integration with LangChain, including:
4
+ - HeadroomChatModel: Drop-in wrapper for any LangChain chat model
5
+ - HeadroomChatMessageHistory: Automatic conversation compression
6
+ - HeadroomDocumentCompressor: Relevance-based document filtering
7
+ - HeadroomToolWrapper: Tool output compression for agents
8
+ - StreamingMetricsTracker: Token counting during streaming
9
+ - HeadroomLangSmithCallbackHandler: LangSmith trace enrichment
10
+
11
+ Example:
12
+ from langchain_openai import ChatOpenAI
13
+ from headroom.integrations.langchain import HeadroomChatModel
14
+
15
+ # Wrap any LangChain model
16
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
17
+
18
+ # Use like normal - optimization happens automatically
19
+ response = llm.invoke("Hello!")
20
+
21
+ Install: pip install headroom[langchain]
22
+ """
23
+
24
+ # Core chat model wrapper
25
+ # Agent tool wrapping
26
+ from .agents import (
27
+ HeadroomToolWrapper,
28
+ ToolCompressionMetrics,
29
+ ToolMetricsCollector,
30
+ get_tool_metrics,
31
+ reset_tool_metrics,
32
+ wrap_tools_with_headroom,
33
+ )
34
+ from .chat_model import (
35
+ HeadroomCallbackHandler,
36
+ HeadroomChatModel,
37
+ HeadroomRunnable,
38
+ OptimizationMetrics,
39
+ langchain_available,
40
+ optimize_messages,
41
+ )
42
+
43
+ # LangSmith integration
44
+ from .langsmith import (
45
+ HeadroomLangSmithCallbackHandler,
46
+ is_langsmith_available,
47
+ is_langsmith_tracing_enabled,
48
+ )
49
+
50
+ # Memory integration
51
+ from .memory import HeadroomChatMessageHistory
52
+
53
+ # Provider auto-detection
54
+ from .providers import (
55
+ detect_provider,
56
+ get_headroom_provider,
57
+ get_model_name_from_langchain,
58
+ )
59
+
60
+ # Retriever integration
61
+ from .retriever import CompressionMetrics, HeadroomDocumentCompressor
62
+
63
+ # Streaming metrics
64
+ from .streaming import (
65
+ StreamingMetrics,
66
+ StreamingMetricsCallback,
67
+ StreamingMetricsTracker,
68
+ track_async_streaming_response,
69
+ track_streaming_response,
70
+ )
71
+
72
+ __all__ = [
73
+ # Core
74
+ "HeadroomChatModel",
75
+ "HeadroomCallbackHandler",
76
+ "HeadroomRunnable",
77
+ "OptimizationMetrics",
78
+ "optimize_messages",
79
+ "langchain_available",
80
+ # Provider Detection
81
+ "detect_provider",
82
+ "get_headroom_provider",
83
+ "get_model_name_from_langchain",
84
+ # Memory
85
+ "HeadroomChatMessageHistory",
86
+ # Retrievers
87
+ "HeadroomDocumentCompressor",
88
+ "CompressionMetrics",
89
+ # Agents
90
+ "HeadroomToolWrapper",
91
+ "ToolCompressionMetrics",
92
+ "ToolMetricsCollector",
93
+ "wrap_tools_with_headroom",
94
+ "get_tool_metrics",
95
+ "reset_tool_metrics",
96
+ # LangSmith
97
+ "HeadroomLangSmithCallbackHandler",
98
+ "is_langsmith_available",
99
+ "is_langsmith_tracing_enabled",
100
+ # Streaming
101
+ "StreamingMetricsTracker",
102
+ "StreamingMetricsCallback",
103
+ "StreamingMetrics",
104
+ "track_streaming_response",
105
+ "track_async_streaming_response",
106
+ ]
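With the same names now exported from both `headroom.integrations` and `headroom.integrations.langchain`, the two `__all__` lists can drift apart. One way to catch that, sketched here with hypothetical `missing_exports` and `export_drift` helpers and throwaway module objects, is to assert that every name in `__all__` resolves and that the outer package re-exports everything the subpackage does. In a real test you would pass the imported packages instead of the stand-ins.

```python
import json
import types


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in __all__ that the module does not define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


def export_drift(outer: types.ModuleType, inner: types.ModuleType) -> set[str]:
    """Names exported by `inner` that `outer` does not re-export."""
    return set(inner.__all__) - set(outer.__all__)


# Throwaway modules standing in for the real packages (hypothetical contents).
outer = types.ModuleType("outer")
outer.HeadroomChatModel = object
outer.__all__ = ["HeadroomChatModel", "HeadroomChatMessageHistory"]

inner = types.ModuleType("inner")
inner.__all__ = ["HeadroomChatModel", "HeadroomChatMessageHistory", "StreamingMetrics"]

print(missing_exports(outer))  # ['HeadroomChatMessageHistory']
print(missing_exports(json))  # []
print(export_drift(outer, inner))  # {'StreamingMetrics'}
```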
headroom/integrations/langchain/agents.py ADDED
@@ -0,0 +1,326 @@
1
+ """Agent tool integration for LangChain with output compression.
2
+
3
+ This module provides HeadroomToolWrapper and wrap_tools_with_headroom
4
+ for wrapping LangChain tools to automatically compress their outputs
5
+ and track per-tool compression metrics.
6
+
7
+ Example:
8
+ from langchain.agents import create_openai_tools_agent
9
+ from langchain.tools import Tool
10
+ from headroom.integrations import wrap_tools_with_headroom
11
+
12
+ # Define tools
13
+ tools = [
14
+ Tool(name="search", func=search_func, description="Search"),
15
+ Tool(name="database", func=db_func, description="Query DB"),
16
+ ]
17
+
18
+ # Wrap with Headroom compression
19
+ wrapped_tools = wrap_tools_with_headroom(tools)
20
+
21
+ # Use in agent - outputs are automatically compressed
22
+ agent = create_openai_tools_agent(llm, wrapped_tools, prompt)
23
+ """
24
+
25
+ from __future__ import annotations
26
+
27
+ import logging
28
+ from dataclasses import dataclass, field
29
+ from datetime import datetime
30
+ from typing import Any
31
+
32
+ # LangChain imports - these are optional dependencies
33
+ try:
34
+ from langchain_core.tools import BaseTool, StructuredTool, Tool
35
+
36
+ LANGCHAIN_AVAILABLE = True
37
+ except ImportError:
38
+ LANGCHAIN_AVAILABLE = False
39
+ BaseTool = object # type: ignore[misc,assignment]
40
+ StructuredTool = object # type: ignore[misc,assignment]
41
+ Tool = object # type: ignore[misc,assignment]
42
+
43
+ from headroom.integrations.mcp import compress_tool_result
44
+
45
+ logger = logging.getLogger(__name__)
46
+
47
+
48
+ def _check_langchain_available() -> None:
49
+ """Raise ImportError if LangChain is not installed."""
50
+ if not LANGCHAIN_AVAILABLE:
51
+ raise ImportError(
52
+ "LangChain is required for this integration. "
53
+ "Install with: pip install headroom[langchain] "
54
+ "or: pip install langchain-core"
55
+ )
56
+
57
+
58
+ @dataclass
59
+ class ToolCompressionMetrics:
60
+ """Metrics from a single tool compression."""
61
+
62
+ tool_name: str
63
+ timestamp: datetime
64
+ chars_before: int
65
+ chars_after: int
66
+ chars_saved: int
67
+ compression_ratio: float
68
+ was_compressed: bool
69
+
70
+
71
+ @dataclass
72
+ class ToolMetricsCollector:
73
+ """Collects compression metrics across all tool invocations."""
74
+
75
+ metrics: list[ToolCompressionMetrics] = field(default_factory=list)
76
+
77
+ def add(self, metric: ToolCompressionMetrics) -> None:
78
+ """Add a metric entry."""
79
+ self.metrics.append(metric)
80
+ # Keep only last 1000
81
+ if len(self.metrics) > 1000:
82
+ self.metrics = self.metrics[-1000:]
83
+
84
+ def get_summary(self) -> dict[str, Any]:
85
+ """Get summary statistics."""
86
+ if not self.metrics:
87
+ return {
88
+ "total_invocations": 0,
89
+ "total_compressions": 0,
90
+ "total_chars_saved": 0,
91
+ }
92
+
93
+ compressed = [m for m in self.metrics if m.was_compressed]
94
+ return {
95
+ "total_invocations": len(self.metrics),
96
+ "total_compressions": len(compressed),
97
+ "total_chars_saved": sum(m.chars_saved for m in self.metrics),
98
+ "average_compression_ratio": (
99
+ sum(m.compression_ratio for m in compressed) / len(compressed) if compressed else 0
100
+ ),
101
+ "by_tool": self._get_by_tool_stats(),
102
+ }
103
+
104
+ def _get_by_tool_stats(self) -> dict[str, dict[str, Any]]:
105
+ """Get per-tool statistics."""
106
+ by_tool: dict[str, list[ToolCompressionMetrics]] = {}
107
+ for m in self.metrics:
108
+ if m.tool_name not in by_tool:
109
+ by_tool[m.tool_name] = []
110
+ by_tool[m.tool_name].append(m)
111
+
112
+ result = {}
113
+ for name, tool_metrics in by_tool.items():
114
+ compressed = [m for m in tool_metrics if m.was_compressed]
115
+ result[name] = {
116
+ "invocations": len(tool_metrics),
117
+ "compressions": len(compressed),
118
+ "chars_saved": sum(m.chars_saved for m in tool_metrics),
119
+ }
120
+ return result
121
+
122
+
123
+ # Global metrics collector
124
+ _global_metrics = ToolMetricsCollector()
125
+
126
+
127
+ def get_tool_metrics() -> ToolMetricsCollector:
128
+ """Get the global tool metrics collector."""
129
+ return _global_metrics
130
+
131
+
132
+ def reset_tool_metrics() -> None:
133
+ """Reset global tool metrics in place, so existing wrappers stay attached."""
134
+ _global_metrics.metrics.clear()
136
+
137
+
138
+ class HeadroomToolWrapper:
139
+ """Wraps a LangChain tool to compress its output.
140
+
141
+ Applies SmartCrusher compression to tool outputs, particularly
142
+ useful for tools that return large JSON arrays (search results,
143
+ database queries, etc.).
144
+
145
+ Example:
146
+ from langchain.tools import Tool
147
+ from headroom.integrations import HeadroomToolWrapper
148
+
149
+ def search(query: str) -> str:
150
+ # Returns large JSON with 1000 results
151
+ return json.dumps({"results": [...1000 items...]})
152
+
153
+ search_tool = Tool(name="search", func=search, description="Search")
154
+ wrapped = HeadroomToolWrapper(search_tool)
155
+
156
+ # Use wrapped tool - output automatically compressed
157
+ result = wrapped("python tutorials")
158
+
159
+ Attributes:
160
+ tool: The wrapped LangChain tool
161
+ min_chars_to_compress: Minimum output size to trigger compression
162
+ metrics_collector: Collector for compression metrics
163
+ """
164
+
165
+ def __init__(
166
+ self,
167
+ tool: BaseTool,
168
+ min_chars_to_compress: int = 1000,
169
+ metrics_collector: ToolMetricsCollector | None = None,
170
+ ):
171
+ """Initialize HeadroomToolWrapper.
172
+
173
+ Args:
174
+ tool: The LangChain BaseTool to wrap.
175
+ min_chars_to_compress: Minimum character count for output
176
+ before compression is applied. Default 1000.
177
+ metrics_collector: Collector for metrics. Uses global
178
+ collector if not specified.
179
+ """
180
+ _check_langchain_available()
181
+
182
+ self.tool = tool
183
+ self.min_chars_to_compress = min_chars_to_compress
184
+ self._metrics = metrics_collector or _global_metrics
185
+
186
+ # Copy tool metadata
187
+ self.name = tool.name
188
+ self.description = tool.description
189
+
190
+ def __call__(self, *args: Any, **kwargs: Any) -> str:
191
+ """Invoke the tool and compress output.
192
+
193
+ Args:
194
+ *args: Arguments to pass to the tool.
195
+ **kwargs: Keyword arguments to pass to the tool.
196
+
197
+ Returns:
198
+ Compressed tool output as string.
199
+ """
200
+ # Invoke underlying tool
201
+ result = self.tool.invoke(*args, **kwargs)
202
+
203
+ # Convert to string if needed
204
+ if not isinstance(result, str):
205
+ result = str(result)
206
+
207
+ # Check if compression is needed
208
+ if len(result) < self.min_chars_to_compress:
209
+ self._record_metrics(result, result, was_compressed=False)
210
+ return result
211
+
212
+ # Try to compress
213
+ compressed = self._compress_output(result)
214
+ self._record_metrics(result, compressed, was_compressed=True)
215
+
216
+ return compressed
217
+
218
+ def invoke(self, *args: Any, **kwargs: Any) -> str:
219
+ """Invoke the tool (alias for __call__)."""
220
+ return self(*args, **kwargs)
221
+
222
+ def _compress_output(self, output: str) -> str:
223
+ """Apply compression to tool output.
224
+
225
+ Args:
226
+ output: Tool output string.
227
+
228
+ Returns:
229
+ Compressed output.
230
+ """
231
+ try:
232
+ return compress_tool_result(
233
+ content=output,
234
+ tool_name=self.name,
235
+ )
236
+ except Exception as e:
237
+ logger.debug(f"Tool compression failed: {e}")
238
+ return output
239
+
240
+ def _record_metrics(self, original: str, compressed: str, was_compressed: bool) -> None:
241
+ """Record compression metrics.
242
+
243
+ Args:
244
+ original: Original output.
245
+ compressed: Compressed output.
246
+ was_compressed: Whether compression was applied.
247
+ """
248
+ chars_before = len(original)
249
+ chars_after = len(compressed)
250
+ chars_saved = chars_before - chars_after
251
+
252
+ metric = ToolCompressionMetrics(
253
+ tool_name=self.name,
254
+ timestamp=datetime.now(),
255
+ chars_before=chars_before,
256
+ chars_after=chars_after,
257
+ chars_saved=max(0, chars_saved),
258
+ compression_ratio=chars_after / chars_before if chars_before > 0 else 1.0,
259
+ was_compressed=was_compressed and chars_saved > 0,
260
+ )
261
+
262
+ self._metrics.add(metric)
263
+
264
+ if was_compressed and chars_saved > 0:
265
+ logger.info(
266
+ f"HeadroomToolWrapper[{self.name}]: {chars_before} -> {chars_after} chars "
267
+ f"({chars_saved} saved, {metric.compression_ratio:.1%} of original)"
268
+ )
269
+
270
+ def as_langchain_tool(self) -> StructuredTool:
271
+ """Convert wrapper back to a LangChain tool.
272
+
273
+ Useful when you need to pass the wrapped tool to APIs
274
+ that expect a LangChain tool type.
275
+
276
+ Returns:
277
+ StructuredTool that wraps this wrapper.
278
+ """
279
+ return StructuredTool.from_function(
280
+ func=self.__call__,
281
+ name=self.name,
282
+ description=self.description,
283
+ )
284
+
285
+
286
+ def wrap_tools_with_headroom(
287
+ tools: list[BaseTool],
288
+ min_chars_to_compress: int = 1000,
289
+ metrics_collector: ToolMetricsCollector | None = None,
290
+ ) -> list[StructuredTool]:
291
+ """Wrap multiple LangChain tools with Headroom compression.
292
+
293
+ Convenience function to wrap all tools in a list at once.
294
+
295
+ Args:
296
+ tools: List of LangChain tools to wrap.
297
+ min_chars_to_compress: Minimum output size for compression.
298
+ metrics_collector: Shared metrics collector for all tools.
299
+
300
+ Returns:
301
+ List of wrapped tools as StructuredTools.
302
+
303
+ Example:
304
+ from langchain.tools import Tool
305
+ from headroom.integrations import wrap_tools_with_headroom
306
+
307
+ tools = [search_tool, database_tool, api_tool]
308
+ wrapped = wrap_tools_with_headroom(tools)
309
+
310
+ # Use wrapped tools in agent
311
+ agent = create_openai_tools_agent(llm, wrapped, prompt)
312
+ """
313
+ _check_langchain_available()
314
+
315
+ collector = metrics_collector or _global_metrics
316
+
317
+ wrapped = []
318
+ for tool in tools:
319
+ wrapper = HeadroomToolWrapper(
320
+ tool=tool,
321
+ min_chars_to_compress=min_chars_to_compress,
322
+ metrics_collector=collector,
323
+ )
324
+ wrapped.append(wrapper.as_langchain_tool())
325
+
326
+ return wrapped
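`HeadroomToolWrapper` boils down to a short control flow: run the tool, stringify the result, skip compression below `min_chars_to_compress`, and record before/after sizes in a bounded collector that can be shared across tools. The sketch below reproduces that flow without LangChain. `wrap_tool`, `MetricsLog`, and the truncating `compress` stand-in are hypothetical; the real wrapper calls `compress_tool_result`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolMetric:
    tool_name: str
    chars_before: int
    chars_after: int

    @property
    def chars_saved(self) -> int:
        return max(0, self.chars_before - self.chars_after)


@dataclass
class MetricsLog:
    entries: list[ToolMetric] = field(default_factory=list)
    max_entries: int = 1000

    def add(self, metric: ToolMetric) -> None:
        self.entries.append(metric)
        # Trim in place so anyone holding self.entries sees the same list.
        overflow = len(self.entries) - self.max_entries
        if overflow > 0:
            del self.entries[:overflow]


def wrap_tool(
    name: str,
    func: Callable[[str], object],
    compress: Callable[[str], str],
    log: MetricsLog,
    min_chars: int = 1000,
) -> Callable[[str], str]:
    """Return a callable that compresses func's output once it reaches min_chars."""

    def wrapped(query: str) -> str:
        result = str(func(query))
        output = result if len(result) < min_chars else compress(result)
        log.add(ToolMetric(name, len(result), len(output)))
        return output

    return wrapped


log = MetricsLog()
search = wrap_tool(
    "search",
    func=lambda q: "result;" * 500,  # 3500 chars, above the threshold
    compress=lambda s: s[:200],  # stand-in for compress_tool_result
    log=log,
)
echo = wrap_tool("echo", func=lambda q: q, compress=lambda s: "", log=log)

print(len(search("python")))  # 200
print(echo("hi"))  # hi
print(sum(m.chars_saved for m in log.entries))  # 3300
```

Sharing one `MetricsLog` across wrappers mirrors how `wrap_tools_with_headroom` passes a single collector to every tool; `collections.deque(maxlen=...)` would be an equally reasonable bounded container.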
headroom/integrations/{langchain.py → langchain/chat_model.py} RENAMED
@@ -27,9 +27,10 @@ Example:
27
 
28
  from __future__ import annotations
29
 
 
30
  import json
31
  import logging
32
- from collections.abc import Iterator, Sequence
33
  from dataclasses import dataclass
34
  from datetime import datetime
35
  from typing import Any
@@ -48,13 +49,14 @@ try:
48
  )
49
  from langchain_core.outputs import ChatGeneration, ChatResult
50
  from langchain_core.runnables import RunnableLambda
51
- from pydantic import Field, PrivateAttr
52
 
53
  LANGCHAIN_AVAILABLE = True
54
  except ImportError:
55
  LANGCHAIN_AVAILABLE = False
56
  BaseChatModel = object
57
  BaseCallbackHandler = object
 
58
  Field = lambda **kwargs: None # type: ignore[assignment] # noqa: E731
59
  PrivateAttr = lambda **kwargs: None # type: ignore[assignment] # noqa: E731
60
 
@@ -62,10 +64,12 @@ from headroom import HeadroomConfig, HeadroomMode
62
  from headroom.providers import OpenAIProvider
63
  from headroom.transforms import TransformPipeline
64
 
 
 
65
  logger = logging.getLogger(__name__)
66
 
67
 
68
- def _check_langchain_available():
69
  """Raise ImportError if LangChain is not installed."""
70
  if not LANGCHAIN_AVAILABLE:
71
  raise ImportError(
@@ -133,6 +137,10 @@ class HeadroomChatModel(BaseChatModel):
133
  wrapped_model: Any = Field(description="The wrapped LangChain chat model")
134
  headroom_config: Any = Field(default=None, description="Headroom configuration")
135
  mode: HeadroomMode = Field(default=HeadroomMode.OPTIMIZE, description="Headroom mode")
136
 
137
  # Private attributes (not serialized)
138
  _metrics_history: list = PrivateAttr(default_factory=list)
@@ -140,24 +148,27 @@ class HeadroomChatModel(BaseChatModel):
140
  _pipeline: Any = PrivateAttr(default=None)
141
  _provider: Any = PrivateAttr(default=None)
142
 
143
- class Config:
144
- """Pydantic config for LangChain compatibility."""
145
-
146
- arbitrary_types_allowed = True
147
 
148
  def __init__(
149
  self,
150
  wrapped_model: BaseChatModel,
151
  config: HeadroomConfig | None = None,
152
  mode: HeadroomMode = HeadroomMode.OPTIMIZE,
153
- **kwargs,
154
- ):
 
155
  """Initialize HeadroomChatModel.
156
 
157
  Args:
158
  wrapped_model: Any LangChain BaseChatModel to wrap
159
  config: HeadroomConfig for optimization settings
160
  mode: HeadroomMode (AUDIT, OPTIMIZE, or SIMULATE)
161
  **kwargs: Additional arguments passed to BaseChatModel
162
  """
163
  _check_langchain_available()
@@ -166,6 +177,7 @@ class HeadroomChatModel(BaseChatModel):
166
  wrapped_model=wrapped_model,
167
  headroom_config=config or HeadroomConfig(),
168
  mode=mode,
 
169
  **kwargs,
170
  )
171
  self._metrics_history = []
@@ -188,9 +200,17 @@ class HeadroomChatModel(BaseChatModel):
188
 
189
  @property
190
  def pipeline(self) -> TransformPipeline:
191
- """Lazily initialize TransformPipeline."""
192
  if self._pipeline is None:
193
- self._provider = OpenAIProvider()
194
  self._pipeline = TransformPipeline(
195
  config=self.headroom_config,
196
  provider=self._provider,
@@ -290,10 +310,11 @@ class HeadroomChatModel(BaseChatModel):
290
  # Convert to OpenAI format
291
  openai_messages = self._convert_messages_to_openai(messages)
292
 
293
- # Get model name and context limit
294
- model = getattr(self.wrapped_model, "model_name", None)
295
- if model is None:
296
- model = getattr(self.wrapped_model, "model", "gpt-4o")
 
297
 
298
  # Get model context limit from provider
299
  model_limit = self._provider.get_context_limit(model) if self._provider else 128000
@@ -342,7 +363,7 @@ class HeadroomChatModel(BaseChatModel):
342
  messages: list[BaseMessage],
343
  stop: list[str] | None = None,
344
  run_manager: Any = None,
345
- **kwargs,
346
  ) -> ChatResult:
347
  """Generate response with Headroom optimization.
348
 
@@ -371,14 +392,15 @@ class HeadroomChatModel(BaseChatModel):
371
  messages: list[BaseMessage],
372
  stop: list[str] | None = None,
373
  run_manager: Any = None,
374
- **kwargs,
375
  ) -> Iterator[ChatGeneration]:
376
  """Stream response with Headroom optimization."""
377
  # Optimize messages
378
  optimized_messages, metrics = self._optimize_messages(messages)
379
 
380
  logger.info(
381
- f"Headroom optimized (streaming): {metrics.tokens_before} -> {metrics.tokens_after} tokens"
 
382
  )
383
 
384
  # Stream from wrapped model
@@ -389,13 +411,78 @@ class HeadroomChatModel(BaseChatModel):
389
  **kwargs,
390
  )
391
 
392
- def bind_tools(self, tools: Sequence[Any], **kwargs) -> HeadroomChatModel:
393
  """Bind tools to the wrapped model."""
394
  new_wrapped = self.wrapped_model.bind_tools(tools, **kwargs)
395
  return HeadroomChatModel(
396
  wrapped_model=new_wrapped,
397
  config=self.headroom_config,
398
  mode=self.mode,
 
399
  )
400
 
401
  def get_savings_summary(self) -> dict[str, Any]:
@@ -494,7 +581,7 @@ class HeadroomCallbackHandler(BaseCallbackHandler):
494
  self,
495
  serialized: dict[str, Any],
496
  prompts: list[str],
497
- **kwargs,
498
  ) -> None:
499
  """Called when LLM starts processing."""
500
  self._current_request = {
@@ -511,7 +598,7 @@ class HeadroomCallbackHandler(BaseCallbackHandler):
511
  self,
512
  serialized: dict[str, Any],
513
  messages: list[list[BaseMessage]],
514
- **kwargs,
515
  ) -> None:
516
  """Called when chat model starts processing."""
517
  # Estimate tokens from messages
@@ -532,7 +619,10 @@ class HeadroomCallbackHandler(BaseCallbackHandler):
532
 
533
  # Check token alert
534
  if self.token_alert_threshold and estimated_tokens > self.token_alert_threshold:
535
- alert = f"Token alert: {estimated_tokens} tokens exceeds threshold {self.token_alert_threshold}"
 
 
 
536
  self._alerts.append(alert)
537
  logger.warning(alert)
538
 
@@ -542,7 +632,7 @@ class HeadroomCallbackHandler(BaseCallbackHandler):
542
  f"Chat model request: ~{estimated_tokens} input tokens",
543
  )
544
 
545
- def on_llm_end(self, response: Any, **kwargs) -> None:
546
  """Called when LLM finishes processing."""
547
  if self._current_request is None:
548
  return
@@ -579,7 +669,7 @@ class HeadroomCallbackHandler(BaseCallbackHandler):
579
 
580
  self._current_request = None
581
 
582
- def on_llm_error(self, error: Exception, **kwargs) -> None:
583
  """Called when LLM encounters an error."""
584
  if self._current_request:
585
  self._current_request["error"] = str(error)
@@ -677,19 +767,19 @@ class HeadroomRunnable:
677
  )
678
  return self._pipeline
679
 
680
- def __or__(self, other):
681
  """Support pipe operator for LCEL composition."""
682
  from langchain_core.runnables import RunnableSequence
683
 
684
  return RunnableSequence(first=self.as_runnable(), last=other)
685
 
686
- def __ror__(self, other):
687
  """Support reverse pipe operator."""
688
  from langchain_core.runnables import RunnableSequence
689
 
690
  return RunnableSequence(first=other, last=self.as_runnable())
691
 
692
- def as_runnable(self):
693
  """Convert to LangChain Runnable."""
694
  return RunnableLambda(self._optimize)
695
 
 
27
 
28
  from __future__ import annotations
29
 
30
+ import asyncio
31
  import json
32
  import logging
33
+ from collections.abc import AsyncIterator, Iterator, Sequence
34
  from dataclasses import dataclass
35
  from datetime import datetime
36
  from typing import Any
 
49
  )
50
  from langchain_core.outputs import ChatGeneration, ChatResult
51
  from langchain_core.runnables import RunnableLambda
52
+ from pydantic import ConfigDict, Field, PrivateAttr
53
 
54
  LANGCHAIN_AVAILABLE = True
55
  except ImportError:
56
  LANGCHAIN_AVAILABLE = False
57
  BaseChatModel = object
58
  BaseCallbackHandler = object
59
+ ConfigDict = lambda **kwargs: {} # type: ignore[assignment,misc] # noqa: E731
60
  Field = lambda **kwargs: None # type: ignore[assignment] # noqa: E731
61
  PrivateAttr = lambda **kwargs: None # type: ignore[assignment] # noqa: E731
62
 
 
64
  from headroom.providers import OpenAIProvider
65
  from headroom.transforms import TransformPipeline
66
 
67
+ from .providers import get_headroom_provider, get_model_name_from_langchain
68
+
69
  logger = logging.getLogger(__name__)
70
 
71
 
72
+ def _check_langchain_available() -> None:
73
  """Raise ImportError if LangChain is not installed."""
74
  if not LANGCHAIN_AVAILABLE:
75
  raise ImportError(
 
137
  wrapped_model: Any = Field(description="The wrapped LangChain chat model")
138
  headroom_config: Any = Field(default=None, description="Headroom configuration")
139
  mode: HeadroomMode = Field(default=HeadroomMode.OPTIMIZE, description="Headroom mode")
140
+ auto_detect_provider: bool = Field(
141
+ default=True,
142
+ description="Auto-detect provider from wrapped model (OpenAI, Anthropic, Google)",
143
+ )
144
 
145
  # Private attributes (not serialized)
146
  _metrics_history: list = PrivateAttr(default_factory=list)
 
     _pipeline: Any = PrivateAttr(default=None)
     _provider: Any = PrivateAttr(default=None)
 
+    # Pydantic v2 config for LangChain compatibility
+    model_config = ConfigDict(arbitrary_types_allowed=True)
 
     def __init__(
         self,
         wrapped_model: BaseChatModel,
         config: HeadroomConfig | None = None,
         mode: HeadroomMode = HeadroomMode.OPTIMIZE,
+        auto_detect_provider: bool = True,
+        **kwargs: Any,
+    ) -> None:
         """Initialize HeadroomChatModel.
 
         Args:
             wrapped_model: Any LangChain BaseChatModel to wrap
             config: HeadroomConfig for optimization settings
             mode: HeadroomMode (AUDIT, OPTIMIZE, or SIMULATE)
+            auto_detect_provider: Auto-detect provider from wrapped model.
+                When True (default), automatically detects if the wrapped model
+                is OpenAI, Anthropic, Google, etc. and uses the appropriate
+                Headroom provider for accurate token counting.
             **kwargs: Additional arguments passed to BaseChatModel
         """
         _check_langchain_available()
 
             wrapped_model=wrapped_model,
             headroom_config=config or HeadroomConfig(),
             mode=mode,
+            auto_detect_provider=auto_detect_provider,
             **kwargs,
         )
         self._metrics_history = []
 
 
     @property
     def pipeline(self) -> TransformPipeline:
+        """Lazily initialize TransformPipeline.
+
+        When auto_detect_provider is True, automatically detects the provider
+        from the wrapped model's class path (e.g., ChatAnthropic -> AnthropicProvider).
+        """
         if self._pipeline is None:
+            if self.auto_detect_provider:
+                self._provider = get_headroom_provider(self.wrapped_model)
+                logger.debug(f"Auto-detected provider: {self._provider.__class__.__name__}")
+            else:
+                self._provider = OpenAIProvider()
             self._pipeline = TransformPipeline(
                 config=self.headroom_config,
                 provider=self._provider,
 
         # Convert to OpenAI format
         openai_messages = self._convert_messages_to_openai(messages)
 
+        # Get model name from wrapped model
+        model = get_model_name_from_langchain(self.wrapped_model)
+
+        # Ensure pipeline is initialized (this also sets up provider)
+        _ = self.pipeline
 
         # Get model context limit from provider
         model_limit = self._provider.get_context_limit(model) if self._provider else 128000
 
         messages: list[BaseMessage],
         stop: list[str] | None = None,
         run_manager: Any = None,
+        **kwargs: Any,
     ) -> ChatResult:
         """Generate response with Headroom optimization.
 
 
         messages: list[BaseMessage],
         stop: list[str] | None = None,
         run_manager: Any = None,
+        **kwargs: Any,
     ) -> Iterator[ChatGeneration]:
         """Stream response with Headroom optimization."""
         # Optimize messages
         optimized_messages, metrics = self._optimize_messages(messages)
 
         logger.info(
+            f"Headroom optimized (streaming): {metrics.tokens_before} -> "
+            f"{metrics.tokens_after} tokens"
         )
 
         # Stream from wrapped model
 
             **kwargs,
         )
 
+    async def _agenerate(
+        self,
+        messages: list[BaseMessage],
+        stop: list[str] | None = None,
+        run_manager: Any = None,
+        **kwargs: Any,
+    ) -> ChatResult:
+        """Async generate response with Headroom optimization.
+
+        This enables `await model.ainvoke(messages)` to work correctly.
+        The optimization step runs in a thread executor since it's CPU-bound.
+        """
+        # Run optimization in executor (CPU-bound)
+        loop = asyncio.get_event_loop()
+        optimized_messages, metrics = await loop.run_in_executor(
+            None, self._optimize_messages, messages
+        )
+
+        logger.info(
+            f"Headroom optimized (async): {metrics.tokens_before} -> {metrics.tokens_after} tokens "
+            f"({metrics.savings_percent:.1f}% saved)"
+        )
+
+        # Call wrapped model's async generate
+        result = await self.wrapped_model._agenerate(
+            optimized_messages,
+            stop=stop,
+            run_manager=run_manager,
+            **kwargs,
+        )
+
+        return result
+
+    async def _astream(
+        self,
+        messages: list[BaseMessage],
+        stop: list[str] | None = None,
+        run_manager: Any = None,
+        **kwargs: Any,
+    ) -> AsyncIterator[ChatGeneration]:
+        """Async stream response with Headroom optimization.
+
+        This enables `async for chunk in model.astream(messages)` to work correctly.
+        """
+        # Run optimization in executor (CPU-bound)
+        loop = asyncio.get_event_loop()
+        optimized_messages, metrics = await loop.run_in_executor(
+            None, self._optimize_messages, messages
+        )
+
+        logger.info(
+            f"Headroom optimized (async streaming): {metrics.tokens_before} -> "
+            f"{metrics.tokens_after} tokens"
+        )
+
+        # Async stream from wrapped model
+        async for chunk in self.wrapped_model._astream(
+            optimized_messages,
+            stop=stop,
+            run_manager=run_manager,
+            **kwargs,
+        ):
+            yield chunk
+
+    def bind_tools(self, tools: Sequence[Any], **kwargs: Any) -> HeadroomChatModel:
         """Bind tools to the wrapped model."""
         new_wrapped = self.wrapped_model.bind_tools(tools, **kwargs)
         return HeadroomChatModel(
             wrapped_model=new_wrapped,
             config=self.headroom_config,
             mode=self.mode,
+            auto_detect_provider=self.auto_detect_provider,
         )
 
     def get_savings_summary(self) -> dict[str, Any]:
 
         self,
         serialized: dict[str, Any],
         prompts: list[str],
+        **kwargs: Any,
     ) -> None:
         """Called when LLM starts processing."""
         self._current_request = {
 
         self,
         serialized: dict[str, Any],
         messages: list[list[BaseMessage]],
+        **kwargs: Any,
     ) -> None:
         """Called when chat model starts processing."""
         # Estimate tokens from messages
 
 
         # Check token alert
         if self.token_alert_threshold and estimated_tokens > self.token_alert_threshold:
+            alert = (
+                f"Token alert: {estimated_tokens} tokens exceeds "
+                f"threshold {self.token_alert_threshold}"
+            )
             self._alerts.append(alert)
             logger.warning(alert)
 
 
             f"Chat model request: ~{estimated_tokens} input tokens",
         )
 
+    def on_llm_end(self, response: Any, **kwargs: Any) -> None:
         """Called when LLM finishes processing."""
         if self._current_request is None:
             return
 
 
         self._current_request = None
 
+    def on_llm_error(self, error: Exception, **kwargs: Any) -> None:
         """Called when LLM encounters an error."""
         if self._current_request:
             self._current_request["error"] = str(error)
 
             )
         return self._pipeline
 
+    def __or__(self, other: Any) -> Any:
         """Support pipe operator for LCEL composition."""
         from langchain_core.runnables import RunnableSequence
 
         return RunnableSequence(first=self.as_runnable(), last=other)
 
+    def __ror__(self, other: Any) -> Any:
         """Support reverse pipe operator."""
         from langchain_core.runnables import RunnableSequence
 
         return RunnableSequence(first=other, last=self.as_runnable())
 
+    def as_runnable(self) -> RunnableLambda:
         """Convert to LangChain Runnable."""
         return RunnableLambda(self._optimize)
 
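The new `_agenerate` and `_astream` overrides push the synchronous `_optimize_messages` call onto a worker thread before awaiting the wrapped model. Below is a minimal, self-contained sketch of that offloading pattern; `optimize_messages`, `agenerate`, and the keep-last-two policy are hypothetical stand-ins, not Headroom's API. It uses `asyncio.get_running_loop()`, which the asyncio documentation recommends inside coroutines (the diff's `asyncio.get_event_loop()` returns the same running loop when called from a coroutine). Offloading keeps the event loop from blocking outright, although pure-Python CPU-bound work in a thread still contends for the GIL.

```python
import asyncio
import time


def optimize_messages(messages: list[str]) -> tuple[list[str], dict[str, int]]:
    """Hypothetical stand-in for a blocking optimizer: keeps the last 2 messages."""
    time.sleep(0.05)  # simulate blocking work that would stall the event loop
    kept = messages[-2:]
    return kept, {"before": len(messages), "after": len(kept)}


async def agenerate(messages: list[str]) -> tuple[list[str], dict[str, int]]:
    # Run the blocking call in the loop's default ThreadPoolExecutor so the
    # event loop can keep scheduling other coroutines meanwhile.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, optimize_messages, messages)


async def main() -> list[tuple[list[str], dict[str, int]]]:
    # Two requests are optimized concurrently; gather preserves input order.
    return await asyncio.gather(
        agenerate(["a", "b", "c"]),
        agenerate(["x", "y"]),
    )


results = asyncio.run(main())
print(results)
```

Passing `None` as the executor selects the loop's default `ThreadPoolExecutor`; a `ProcessPoolExecutor` is the usual swap when true parallelism is needed, at the cost of pickling the messages across processes.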
headroom/integrations/langchain/langsmith.py ADDED
@@ -0,0 +1,324 @@
+"""LangSmith integration for Headroom compression metrics.
+
+This module provides HeadroomLangSmithCallbackHandler, a LangChain callback
+handler that adds Headroom compression metrics to LangSmith traces.
+
+When used with HeadroomChatModel, it automatically captures:
+- Tokens before/after optimization
+- Savings percentage
+- Transforms applied
+- Per-request compression details
+
+Example:
+    import os
+    from langchain_openai import ChatOpenAI
+    from headroom.integrations import (
+        HeadroomChatModel,
+        HeadroomLangSmithCallbackHandler,
+    )
+
+    # Enable LangSmith tracing
+    os.environ["LANGCHAIN_TRACING_V2"] = "true"
+    os.environ["LANGCHAIN_API_KEY"] = "..."
+
+    # Create handler
+    handler = HeadroomLangSmithCallbackHandler()
+
+    # Use with HeadroomChatModel
+    llm = HeadroomChatModel(
+        ChatOpenAI(model="gpt-4o"),
+        callbacks=[handler],
+    )
+
+    # Traces will include headroom.* metadata
+    response = llm.invoke("Hello!")
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Any
+from uuid import UUID
+
+# LangChain imports - these are optional dependencies
+try:
+    from langchain_core.callbacks import BaseCallbackHandler
+    from langchain_core.messages import BaseMessage
+    from langchain_core.outputs import LLMResult
+
+    LANGCHAIN_AVAILABLE = True
+except ImportError:
+    LANGCHAIN_AVAILABLE = False
+    BaseCallbackHandler = object  # type: ignore[misc,assignment]
+    LLMResult = object  # type: ignore[misc,assignment]
+
+# LangSmith imports - optional
+try:
+    from langsmith import Client as LangSmithClient
+
+    LANGSMITH_AVAILABLE = True
+except ImportError:
+    LANGSMITH_AVAILABLE = False
+    LangSmithClient = None  # type: ignore[misc,assignment]
+
+logger = logging.getLogger(__name__)
+
+
+def _check_langchain_available() -> None:
+    """Raise ImportError if LangChain is not installed."""
+    if not LANGCHAIN_AVAILABLE:
+        raise ImportError(
+            "LangChain is required for this integration. "
+            "Install with: pip install headroom[langchain] "
+            "or: pip install langchain-core"
+        )
+
+
+@dataclass
+class PendingMetrics:
+    """Metrics pending attachment to a LangSmith run."""
+
+    tokens_before: int
+    tokens_after: int
+    tokens_saved: int
+    savings_percent: float
+    transforms_applied: list[str]
+    timestamp: datetime = field(default_factory=datetime.now)
+
+
+class HeadroomLangSmithCallbackHandler(BaseCallbackHandler):
+    """Callback handler that adds Headroom metrics to LangSmith traces.
+
+    Integrates with LangSmith to provide visibility into context
+    optimization within traces. Metrics appear as metadata with
+    the `headroom.` prefix.
+
+    Works automatically when:
+    1. LANGCHAIN_TRACING_V2=true is set
+    2. Used as a callback with HeadroomChatModel
+    3. LangSmith API key is configured
+
+    Example:
+        from headroom.integrations import (
+            HeadroomChatModel,
+            HeadroomLangSmithCallbackHandler,
+        )
+
+        handler = HeadroomLangSmithCallbackHandler()
+        llm = HeadroomChatModel(
+            ChatOpenAI(model="gpt-4o"),
+            callbacks=[handler],
+        )
+
+        response = llm.invoke("Hello!")
+        # LangSmith trace now includes:
+        # - headroom.tokens_before
+        # - headroom.tokens_after
+        # - headroom.tokens_saved
+        # - headroom.savings_percent
+        # - headroom.transforms_applied
+
+    Attributes:
+        langsmith_client: LangSmith client for updating runs.
+        pending_metrics: Metrics waiting to be attached to runs.
+    """
+
+    def __init__(
+        self,
+        langsmith_client: Any = None,
+        auto_update_runs: bool = True,
+    ):
+        """Initialize HeadroomLangSmithCallbackHandler.
+
+        Args:
+            langsmith_client: LangSmith client instance. Auto-creates
+                one if not provided and LangSmith is available.
+            auto_update_runs: If True, automatically updates LangSmith
+                runs with Headroom metadata. Default True.
+        """
+        _check_langchain_available()
+
+        self._client = langsmith_client
+        self._auto_update = auto_update_runs
+        self._pending_metrics: dict[str, PendingMetrics] = {}
+        self._run_metrics: dict[str, dict[str, Any]] = {}
+
+        # Initialize LangSmith client if available and not provided
+        if self._client is None and LANGSMITH_AVAILABLE and auto_update_runs:
+            try:
+                if os.environ.get("LANGCHAIN_API_KEY"):
+                    self._client = LangSmithClient()
+            except Exception as e:
+                logger.debug(f"Could not initialize LangSmith client: {e}")
+
+    def set_headroom_metrics(
+        self,
+        run_id: str | UUID,
+        tokens_before: int,
+        tokens_after: int,
+        transforms_applied: list[str] | None = None,
+    ) -> None:
+        """Set Headroom metrics for a run.
+
+        Call this from HeadroomChatModel after optimization to attach
+        metrics to the current run.
+
+        Args:
+            run_id: The LangSmith run ID.
+            tokens_before: Token count before optimization.
+            tokens_after: Token count after optimization.
+            transforms_applied: List of transforms that were applied.
+        """
+        run_id_str = str(run_id)
+        tokens_saved = tokens_before - tokens_after
+        savings_percent = (tokens_saved / tokens_before * 100) if tokens_before > 0 else 0.0
+
+        metrics = PendingMetrics(
+            tokens_before=tokens_before,
+            tokens_after=tokens_after,
+            tokens_saved=tokens_saved,
+            savings_percent=savings_percent,
+            transforms_applied=transforms_applied or [],
+        )
+
+        self._pending_metrics[run_id_str] = metrics
+
+        logger.debug(
+            f"Headroom metrics set for run {run_id_str}: "
+            f"{tokens_before} -> {tokens_after} tokens ({savings_percent:.1f}% saved)"
+        )
+
+    def on_chat_model_start(
+        self,
+        serialized: dict[str, Any],
+        messages: list[list[BaseMessage]],
+        *,
+        run_id: UUID,
+        **kwargs: Any,
+    ) -> None:
+        """Called when chat model starts.
+
+        Records the run ID for later metric attachment.
+        """
+        run_id_str = str(run_id)
+        # Initialize empty metrics for this run
+        self._run_metrics[run_id_str] = {}
+
+    def on_llm_end(
+        self,
+        response: LLMResult,
+        *,
+        run_id: UUID,
+        **kwargs: Any,
+    ) -> None:
+        """Called when LLM completes.
+
+        Attaches pending Headroom metrics to the LangSmith run.
+        """
+        run_id_str = str(run_id)
+
+        # Check for pending metrics
+        if run_id_str in self._pending_metrics:
+            metrics = self._pending_metrics.pop(run_id_str)
+            self._attach_metrics_to_run(run_id_str, metrics)
+
+    def _attach_metrics_to_run(self, run_id: str, metrics: PendingMetrics) -> None:
+        """Attach Headroom metrics to a LangSmith run.
+
+        Args:
+            run_id: The run ID.
+            metrics: Metrics to attach.
+        """
+        metadata = {
+            "headroom.tokens_before": metrics.tokens_before,
+            "headroom.tokens_after": metrics.tokens_after,
+            "headroom.tokens_saved": metrics.tokens_saved,
+            "headroom.savings_percent": round(metrics.savings_percent, 2),
+            "headroom.transforms_applied": metrics.transforms_applied,
+            "headroom.optimization_timestamp": metrics.timestamp.isoformat(),
+        }
+
+        # Store in run metrics
+        self._run_metrics[run_id] = metadata
+
+        # Update LangSmith run if client available
+        if self._client and self._auto_update:
+            try:
+                self._client.update_run(
+                    run_id=run_id,
+                    extra={"metadata": metadata},
+                )
+                logger.debug(f"Updated LangSmith run {run_id} with Headroom metrics")
+            except Exception as e:
+                logger.debug(f"Could not update LangSmith run: {e}")
+
+    def get_run_metrics(self, run_id: str | UUID) -> dict[str, Any]:
+        """Get Headroom metrics for a specific run.
+
+        Args:
+            run_id: The run ID.
+
+        Returns:
+            Dictionary of headroom.* metrics for the run.
+        """
+        return self._run_metrics.get(str(run_id), {})
+
+    def get_all_metrics(self) -> dict[str, dict[str, Any]]:
+        """Get all recorded run metrics.
+
+        Returns:
+            Dictionary mapping run IDs to their metrics.
+        """
+        return self._run_metrics.copy()
+
+    def get_summary(self) -> dict[str, Any]:
+        """Get summary statistics across all runs.
+
+        Returns:
+            Summary with total runs, tokens saved, etc.
+        """
+        if not self._run_metrics:
+            return {
+                "total_runs": 0,
+                "total_tokens_saved": 0,
+                "average_savings_percent": 0,
+            }
+
+        total_saved = sum(m.get("headroom.tokens_saved", 0) for m in self._run_metrics.values())
+        savings_percents = [
+            m.get("headroom.savings_percent", 0) for m in self._run_metrics.values()
+        ]
+
+        return {
+            "total_runs": len(self._run_metrics),
+            "total_tokens_saved": total_saved,
+            "average_savings_percent": (
+                sum(savings_percents) / len(savings_percents) if savings_percents else 0
+            ),
+        }
+
+    def reset(self) -> None:
+        """Clear all recorded metrics."""
+        self._pending_metrics.clear()
+        self._run_metrics.clear()
+
+
+def is_langsmith_available() -> bool:
+    """Check if LangSmith is available and configured.
+
+    Returns:
+        True if LangSmith is installed and API key is set.
+    """
+    return LANGSMITH_AVAILABLE and bool(os.environ.get("LANGCHAIN_API_KEY"))
+
+
+def is_langsmith_tracing_enabled() -> bool:
+    """Check if LangSmith tracing is enabled.
+
+    Returns:
+        True if LANGCHAIN_TRACING_V2 is set to "true".
+    """
+    return os.environ.get("LANGCHAIN_TRACING_V2", "").lower() == "true"
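`set_headroom_metrics()` computes `savings_percent` per run, and `get_summary()` reports `average_savings_percent` as the unweighted mean of the rounded per-run values, not as total tokens saved over total tokens before. The plain-Python sketch below reproduces that arithmetic (the helpers `savings` and `summarize` are hypothetical, and the `weighted_savings_percent` key is not something the handler reports) so the two figures can be compared: one large request and one small one make them diverge sharply. Note also that `on_chat_model_start()` seeds an empty dict for every run, so runs that never receive metrics still count toward `total_runs` and contribute 0 to the average.

```python
def savings(tokens_before: int, tokens_after: int) -> tuple[int, float]:
    """Mirror of the per-run arithmetic in set_headroom_metrics()."""
    saved = tokens_before - tokens_after
    percent = (saved / tokens_before * 100) if tokens_before > 0 else 0.0
    return saved, percent


def summarize(runs: list[tuple[int, int]]) -> dict[str, float]:
    """Aggregate (tokens_before, tokens_after) pairs the way get_summary() does."""
    per_run = [savings(before, after) for before, after in runs]
    # _attach_metrics_to_run() stores percentages rounded to 2 decimals
    percents = [round(p, 2) for _, p in per_run]
    total_saved = sum(s for s, _ in per_run)
    total_before = sum(before for before, _ in runs)
    return {
        "total_runs": len(runs),
        "total_tokens_saved": total_saved,
        # Unweighted mean of per-run percentages, as in get_summary()
        "average_savings_percent": sum(percents) / len(percents) if percents else 0,
        # Token-weighted alternative, shown for comparison only
        "weighted_savings_percent": (total_saved / total_before * 100) if total_before else 0.0,
    }


summary = summarize([(1000, 250), (100, 90)])
print(summary)
```

Here a 75% saving on a 1000-token request and a 10% saving on a 100-token request average to 42.5%, while the token-weighted figure is roughly 69%.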
headroom/integrations/langchain/memory.py ADDED
@@ -0,0 +1,319 @@
+"""Memory integration for LangChain with automatic compression.
+
+This module provides HeadroomChatMessageHistory, a wrapper for any LangChain
+chat message history that automatically compresses conversation history
+when it exceeds a token threshold.
+
+Example:
+    from langchain.memory import ConversationBufferMemory
+    from langchain_community.chat_message_histories import ChatMessageHistory
+    from headroom.integrations import HeadroomChatMessageHistory
+
+    # Wrap any chat message history
+    base_history = ChatMessageHistory()
+    compressed_history = HeadroomChatMessageHistory(base_history)
+
+    # Use with ConversationBufferMemory (zero code changes to chain)
+    memory = ConversationBufferMemory(chat_memory=compressed_history)
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    from headroom.providers.base import Provider
+
+# LangChain imports - these are optional dependencies
+try:
+    from langchain_core.chat_history import BaseChatMessageHistory
+    from langchain_core.messages import (
+        AIMessage,
+        BaseMessage,
+        HumanMessage,
+        SystemMessage,
+        ToolMessage,
+    )
+
+    LANGCHAIN_AVAILABLE = True
+except ImportError:
+    LANGCHAIN_AVAILABLE = False
+    BaseChatMessageHistory = object  # type: ignore[misc,assignment]
+
+from headroom import HeadroomConfig
+from headroom.config import RollingWindowConfig
+from headroom.providers import OpenAIProvider
+from headroom.transforms import TransformPipeline
+
+logger = logging.getLogger(__name__)
+
+
+def _check_langchain_available() -> None:
+    """Raise ImportError if LangChain is not installed."""
+    if not LANGCHAIN_AVAILABLE:
+        raise ImportError(
+            "LangChain is required for this integration. "
+            "Install with: pip install headroom[langchain] "
+            "or: pip install langchain-core"
+        )
+
+
+class HeadroomChatMessageHistory(BaseChatMessageHistory):
+    """Wraps any LangChain chat message history with automatic compression.
+
+    When conversation history exceeds the token threshold, automatically
+    applies RollingWindow compression to keep recent turns while fitting
+    within the limit.
+
+    This works with ANY memory type because it wraps at the storage layer:
+    - ConversationBufferMemory
+    - ConversationSummaryMemory
+    - ConversationBufferWindowMemory
+    - Redis, PostgreSQL, or any custom history
+
+    Example:
+        from langchain.memory import ConversationBufferMemory
+        from langchain_community.chat_message_histories import ChatMessageHistory
+        from headroom.integrations import HeadroomChatMessageHistory
+
+        # Wrap base history
+        base = ChatMessageHistory()
+        compressed = HeadroomChatMessageHistory(
+            base,
+            compress_threshold_tokens=4000,
+            keep_recent_turns=5,
+        )
+
+        # Use with any memory class
+        memory = ConversationBufferMemory(chat_memory=compressed)
+
+        # Messages are compressed automatically when accessed
+        chain = ConversationChain(llm=llm, memory=memory)
+        chain.invoke({"input": "Hello!"})
+
+    Attributes:
+        base_history: The underlying chat message history
+        compress_threshold_tokens: Token count that triggers compression
+        keep_recent_turns: Minimum recent turns to always preserve
+        model: Model name for token counting (default: "gpt-4o")
+    """
+
+    def __init__(
+        self,
+        base_history: BaseChatMessageHistory,
+        compress_threshold_tokens: int = 4000,
+        keep_recent_turns: int = 5,
+        model: str = "gpt-4o",
+        provider: Provider | None = None,
+    ):
+        """Initialize HeadroomChatMessageHistory.
+
+        Args:
+            base_history: Any LangChain BaseChatMessageHistory to wrap
+            compress_threshold_tokens: Apply compression when history exceeds
+                this many tokens. Default 4000.
+            keep_recent_turns: Minimum number of recent user/assistant turns
+                to always preserve during compression. Default 5.
+            model: Model name for token counting. Default "gpt-4o".
+            provider: Headroom provider for token counting. Auto-uses
+                OpenAIProvider if not specified.
+        """
+        _check_langchain_available()
+
+        self._base = base_history
+        self._threshold = compress_threshold_tokens
+        self._keep_recent_turns = keep_recent_turns
+        self._model = model
+        self._provider: Provider = provider or OpenAIProvider()
+
+        # Track compression stats
+        self._compression_count = 0
+        self._total_tokens_saved = 0
+
+    @property
+    def messages(self) -> list[BaseMessage]:
+        """Get messages, applying compression if over threshold.
+
+        Returns:
+            List of messages, potentially compressed to fit within threshold.
+        """
+        raw_messages = self._base.messages
+
+        if not raw_messages:
+            return []
+
+        # Count tokens
+        token_count = self._count_tokens(raw_messages)
+
+        if token_count <= self._threshold:
+            return list(raw_messages)
+
+        # Apply compression
+        compressed = self._apply_rolling_window(raw_messages)
+        tokens_after = self._count_tokens(compressed)
+
+        self._compression_count += 1
+        self._total_tokens_saved += token_count - tokens_after
+
+        logger.info(
+            f"HeadroomChatMessageHistory compressed: {token_count} -> {tokens_after} tokens "
+            f"({len(raw_messages)} -> {len(compressed)} messages)"
+        )
+
+        return compressed
+
+    def add_message(self, message: BaseMessage) -> None:
+        """Add a message to the underlying history.
+
+        Args:
+            message: The message to add.
+        """
+        self._base.add_message(message)
+
+    def add_user_message(self, message: str) -> None:
+        """Add a user message to the history.
+
+        Args:
+            message: The user message content.
+        """
+        self._base.add_user_message(message)
+
+    def add_ai_message(self, message: str) -> None:
+        """Add an AI message to the history.
+
+        Args:
+            message: The AI message content.
+        """
+        self._base.add_ai_message(message)
+
+    def clear(self) -> None:
+        """Clear all messages from history."""
+        self._base.clear()
+
+    def _count_tokens(self, messages: list[BaseMessage]) -> int:
+        """Count tokens in messages using provider's tokenizer.
+
+        Args:
+            messages: List of messages to count.
+
+        Returns:
+            Total token count.
+        """
+        token_counter = self._provider.get_token_counter(self._model)
+        total = 0
+        for msg in messages:
+            content = msg.content if isinstance(msg.content, str) else str(msg.content)
+            total += token_counter.count_text(content)
+        return total
+
+    def _apply_rolling_window(self, messages: list[BaseMessage]) -> list[BaseMessage]:
+        """Apply RollingWindow compression to messages.
+
+        Args:
+            messages: Messages to compress.
+
+        Returns:
+            Compressed messages fitting within threshold.
+        """
+        # Convert to OpenAI format for Headroom transforms
+        openai_messages = self._convert_to_openai(messages)
+
+        # Use TransformPipeline which handles tokenizer setup
+        config = HeadroomConfig(
+            rolling_window=RollingWindowConfig(keep_last_turns=self._keep_recent_turns),
+        )
+        pipeline = TransformPipeline(config=config, provider=self._provider)
+
+        # Apply compression via pipeline
+        result = pipeline.apply(
+            messages=openai_messages,
+            model=self._model,
+            model_limit=self._threshold,
+        )
+
+        # Convert back to LangChain format
+        return self._convert_from_openai(result.messages)
+
+    def _convert_to_openai(self, messages: list[BaseMessage]) -> list[dict[str, Any]]:
+        """Convert LangChain messages to OpenAI format.
+
+        Args:
+            messages: LangChain messages.
+
+        Returns:
+            OpenAI format messages.
+        """
+        result = []
+        for msg in messages:
+            content = msg.content if isinstance(msg.content, str) else str(msg.content)
+
+            if isinstance(msg, SystemMessage):
+                result.append({"role": "system", "content": content})
+            elif isinstance(msg, HumanMessage):
+                result.append({"role": "user", "content": content})
+            elif isinstance(msg, AIMessage):
+                entry: dict[str, Any] = {"role": "assistant", "content": content}
+                if hasattr(msg, "tool_calls") and msg.tool_calls:
+                    entry["tool_calls"] = msg.tool_calls
+                result.append(entry)
+            elif isinstance(msg, ToolMessage):
+                result.append(
+                    {
+                        "role": "tool",
+                        "tool_call_id": getattr(msg, "tool_call_id", ""),
+                        "content": content,
+                    }
+                )
+            else:
+                # Generic fallback
+                result.append(
+                    {
+                        "role": getattr(msg, "type", "user"),
+                        "content": content,
+                    }
+                )
+        return result
+
+    def _convert_from_openai(self, messages: list[dict[str, Any]]) -> list[BaseMessage]:
+        """Convert OpenAI format back to LangChain messages.
+
+        Args:
+            messages: OpenAI format messages.
+
+        Returns:
+            LangChain messages.
+        """
+        result: list[BaseMessage] = []
+        for msg in messages:
+            role = msg.get("role", "user")
+            content = msg.get("content", "")
+
+            if role == "system":
+                result.append(SystemMessage(content=content))
+            elif role == "user":
+                result.append(HumanMessage(content=content))
+            elif role == "assistant":
+                tool_calls = msg.get("tool_calls", [])
+                result.append(AIMessage(content=content, tool_calls=tool_calls))
+            elif role == "tool":
+                result.append(
+                    ToolMessage(
+                        content=content,
+                        tool_call_id=msg.get("tool_call_id", ""),
+                    )
+                )
+        return result
+
+    def get_compression_stats(self) -> dict[str, Any]:
+        """Get statistics about compression operations.
+
+        Returns:
+            Dictionary with compression_count, total_tokens_saved.
+        """
+        return {
+            "compression_count": self._compression_count,
+            "total_tokens_saved": self._total_tokens_saved,
+            "threshold_tokens": self._threshold,
+            "keep_recent_turns": self._keep_recent_turns,
+        }
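`HeadroomChatMessageHistory.messages` returns history untouched while it fits under `compress_threshold_tokens` and otherwise returns a compressed view built by Headroom's `RollingWindow` transform; the wrapped base history itself is never rewritten. The sketch below illustrates only that threshold-gated shape with deliberately simplified stand-ins: a whitespace word count replaces the provider tokenizer, and "keep system messages plus the last N user-initiated turns" replaces `RollingWindow`, whose actual dropping rules are not shown in this diff. `Msg`, `count_tokens`, and `rolling_window` are hypothetical names, and unlike the real pipeline call the sketch does not try to guarantee the result fits under the threshold.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    role: str  # "system", "user", or "assistant"
    content: str


def count_tokens(messages: list[Msg]) -> int:
    # Hypothetical stand-in for provider.get_token_counter(model).count_text():
    # counts whitespace-separated words in each message's content.
    return sum(len(m.content.split()) for m in messages)


def rolling_window(messages: list[Msg], threshold: int, keep_recent_turns: int) -> list[Msg]:
    """Return a copy unchanged when under threshold; otherwise keep the system
    messages plus the last `keep_recent_turns` turns (a turn starts at a user message)."""
    if count_tokens(messages) <= threshold:
        return list(messages)
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    if keep_recent_turns <= 0:
        return system
    turn_starts = [i for i, m in enumerate(rest) if m.role == "user"]
    # Index where the Nth-from-last turn begins, or 0 if there are fewer turns.
    start = turn_starts[-keep_recent_turns] if len(turn_starts) >= keep_recent_turns else 0
    return system + rest[start:]


history = [Msg("system", "be brief")]
for i in range(4):
    history.append(Msg("user", f"question {i} " + "word " * 5))
    history.append(Msg("assistant", f"answer {i}"))

compressed = rolling_window(history, threshold=20, keep_recent_turns=2)
print(count_tokens(history), "->", count_tokens(compressed), "tokens")
print([" ".join(m.content.split()[:2]) for m in compressed])
```

Because the real compression runs inside the `messages` property, the `compression_count` and `total_tokens_saved` values from `get_compression_stats()` grow on every read of an over-threshold history, not once per stored message.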
headroom/integrations/langchain/providers.py ADDED
@@ -0,0 +1,200 @@
+"""Provider detection for LangChain models.
+
+This module provides automatic provider detection from LangChain chat models
+without requiring explicit provider imports. It uses duck-typing based on
+class paths to identify the appropriate Headroom provider.
+
+Example:
+    from langchain_anthropic import ChatAnthropic
+    from headroom.integrations.langchain import get_headroom_provider
+
+    model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
+    provider = get_headroom_provider(model)  # Returns AnthropicProvider
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import TYPE_CHECKING, Any
+
+if TYPE_CHECKING:
+    from headroom.providers.base import Provider
+
+logger = logging.getLogger(__name__)
+
+# Provider detection patterns
+# Maps provider name to list of class path patterns to match
+PROVIDER_PATTERNS: dict[str, list[str]] = {
+    "openai": [
+        "langchain_openai.ChatOpenAI",
+        "langchain_openai.chat_models.ChatOpenAI",
+        "langchain_community.chat_models.ChatOpenAI",
+        "langchain.chat_models.ChatOpenAI",
+        "ChatOpenAI",
+    ],
+    "anthropic": [
+        "langchain_anthropic.ChatAnthropic",
+        "langchain_anthropic.chat_models.ChatAnthropic",
+        "langchain_community.chat_models.ChatAnthropic",
+        "langchain.chat_models.ChatAnthropic",
+        "ChatAnthropic",
+    ],
+    "google": [
+        "langchain_google_genai.ChatGoogleGenerativeAI",
+        "langchain_google_genai.chat_models.ChatGoogleGenerativeAI",
+        "langchain_community.chat_models.ChatGoogleGenerativeAI",
+        "ChatGoogleGenerativeAI",
+        # Also match Vertex AI
+        "langchain_google_vertexai.ChatVertexAI",
+        "ChatVertexAI",
+    ],
+    "cohere": [
+        "langchain_cohere.ChatCohere",
+        "langchain_community.chat_models.ChatCohere",
+        "ChatCohere",
+    ],
+    "mistral": [
+        "langchain_mistralai.ChatMistralAI",
+        "langchain_community.chat_models.ChatMistralAI",
+        "ChatMistralAI",
+    ],
+}
+
+# Model name patterns for fallback detection
+MODEL_NAME_PATTERNS: dict[str, list[str]] = {
+    "anthropic": ["claude", "anthropic"],
+    "openai": ["gpt", "o1", "o3", "davinci", "turbo"],
+    "google": ["gemini", "palm", "bison"],
+    "cohere": ["command", "cohere"],
+    "mistral": ["mistral", "mixtral"],
+}
+
+
+def detect_provider(model: Any) -> str:
+    """Detect provider name from a LangChain model using duck-typing.
+
+    Detection strategy:
+    1. Check class module and name against known patterns
+    2. Check model_name attribute against known model patterns
+    3. Fall back to "openai" as safe default
+
+    Args:
+        model: Any LangChain chat model instance
+
+    Returns:
+        Provider name string: "openai", "anthropic", "google", "cohere", "mistral"
+
+    Example:
+        >>> from langchain_anthropic import ChatAnthropic
+        >>> model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
+        >>> detect_provider(model)
+        'anthropic'
+    """
+    # Strategy 1: Check class path
+    class_module = getattr(model.__class__, "__module__", "")
+    class_name = model.__class__.__name__
+    class_path = f"{class_module}.{class_name}"
+
+    for provider_name, patterns in PROVIDER_PATTERNS.items():
+        for pattern in patterns:
+            if pattern in class_path or class_name == pattern.split(".")[-1]:
+                logger.debug(f"Detected provider '{provider_name}' from class path: {class_path}")
+                return provider_name
+
+    # Strategy 2: Check model_name attribute
+    model_name = _get_model_name(model)
+    if model_name:
+        model_name_lower = model_name.lower()
+        for provider_name, name_patterns in MODEL_NAME_PATTERNS.items():
+            for pattern in name_patterns:
+                if pattern in model_name_lower:
+                    logger.debug(
+                        f"Detected provider '{provider_name}' from model name: {model_name}"
+                    )
+                    return provider_name
+
+    # Strategy 3: Fall back to OpenAI (most common, safe default)
+    logger.debug(f"Could not detect provider for {class_path}, falling back to 'openai'")
+    return "openai"
+
+
+def _get_model_name(model: Any) -> str | None:
+    """Extract model name from a LangChain model.
+
+    Tries common attribute names used by different LangChain models.
+    """
+    # Try common attribute names
+    for attr in ["model_name", "model", "model_id", "_model_name"]:
+        value = getattr(model, attr, None)
+        if isinstance(value, str):
+            return value
+
+    return None
+
+
+def get_headroom_provider(model: Any) -> Provider:
+    """Get appropriate Headroom Provider instance for a LangChain model.
+
+    This function automatically detects the provider from the model type
+    and returns a configured Headroom provider for accurate token counting
+    and context limit detection.
+
+    Args:
+        model: Any LangChain chat model instance
+
+    Returns:
+        Configured Headroom Provider instance
+
+    Example:
+        >>> from langchain_anthropic import ChatAnthropic
+        >>> model = ChatAnthropic(model="claude-3-5-sonnet-20241022")
+        >>> provider = get_headroom_provider(model)
+        >>> provider.name
153
+ 'anthropic'
154
+ """
155
+ # Import providers lazily to avoid circular imports
156
+ from headroom.providers import (
157
+ AnthropicProvider,
158
+ GoogleProvider,
159
+ OpenAIProvider,
160
+ )
161
+
162
+ provider_name = detect_provider(model)
163
+
164
+ if provider_name == "anthropic":
165
+ return AnthropicProvider()
166
+ elif provider_name == "google":
167
+ return GoogleProvider()
168
+ # Cohere and Mistral fall back to OpenAI-compatible for now
169
+ # TODO: Add dedicated providers when needed
170
+
171
+ # Default to OpenAI
172
+ return OpenAIProvider()
173
+
174
+
175
+ def get_model_name_from_langchain(model: Any) -> str:
176
+ """Extract the model name string from a LangChain model.
177
+
178
+ Useful for getting the model identifier for token counting
179
+ and context limit lookup.
180
+
181
+ Args:
182
+ model: Any LangChain chat model instance
183
+
184
+ Returns:
185
+ Model name string (e.g., "gpt-4o", "claude-3-5-sonnet-20241022")
186
+ """
187
+ name = _get_model_name(model)
188
+ if name:
189
+ return name
190
+
191
+ # Try to infer from class name
192
+ class_name = model.__class__.__name__
193
+ if "GPT" in class_name or "OpenAI" in class_name:
194
+ return "gpt-4o" # Safe default for OpenAI
195
+ elif "Anthropic" in class_name or "Claude" in class_name:
196
+ return "claude-3-5-sonnet-20241022" # Safe default for Anthropic
197
+ elif "Google" in class_name or "Gemini" in class_name:
198
+ return "gemini-1.5-pro" # Safe default for Google
199
+
200
+ return "gpt-4o" # Ultimate fallback
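`detect_provider` above comes down to ordered substring matching. It checks the model's class path first, then its model-name attribute, and otherwise returns `"openai"`. The sketch below is a minimal standalone restatement for illustration, not the shipped code. It assumes abbreviated pattern tables and uses stand-in classes (hypothetical `ChatAnthropic`, `MyWrapper`, `Unknown`) in place of real LangChain models.

```python
from __future__ import annotations

from typing import Any

# Abbreviated pattern tables (a subset of PROVIDER_PATTERNS / MODEL_NAME_PATTERNS above).
PROVIDER_PATTERNS: dict[str, list[str]] = {
    "openai": ["langchain_openai.ChatOpenAI", "ChatOpenAI"],
    "anthropic": ["langchain_anthropic.ChatAnthropic", "ChatAnthropic"],
}
MODEL_NAME_PATTERNS: dict[str, list[str]] = {
    "anthropic": ["claude", "anthropic"],
    "openai": ["gpt", "o1", "o3"],
}


def get_model_name(model: Any) -> str | None:
    # Probe the same attribute names, in the same order, as _get_model_name.
    for attr in ("model_name", "model", "model_id", "_model_name"):
        value = getattr(model, attr, None)
        if isinstance(value, str):
            return value
    return None


def detect_provider(model: Any) -> str:
    cls = type(model)
    class_path = f"{cls.__module__}.{cls.__name__}"
    # Strategy 1: class path contains a pattern, or class name equals its last segment.
    for provider, patterns in PROVIDER_PATTERNS.items():
        for pattern in patterns:
            if pattern in class_path or cls.__name__ == pattern.split(".")[-1]:
                return provider
    # Strategy 2: case-insensitive substring match on the model name.
    name = (get_model_name(model) or "").lower()
    for provider, patterns in MODEL_NAME_PATTERNS.items():
        if any(p in name for p in patterns):
            return provider
    # Strategy 3: default.
    return "openai"


# Stand-in classes; these are NOT the real LangChain model classes.
class ChatAnthropic:
    model = "claude-3-5-sonnet-20241022"


class MyWrapper:  # unknown class, so detection falls through to the model name
    model_name = "claude-3-haiku"


class Unknown:  # no recognizable class or model name
    pass


print(detect_provider(ChatAnthropic()))  # anthropic
print(detect_provider(MyWrapper()))      # anthropic
print(detect_provider(Unknown()))        # openai
```

The name check is a plain substring test evaluated in dict order. Short patterns such as `"o1"` and `"o3"` can therefore, in principle, match unrelated model names. Placing the more specific providers earlier in the table limits that risk.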
headroom/integrations/langchain/retriever.py ADDED
@@ -0,0 +1,371 @@
1
+ """Retriever integration for LangChain with intelligent document compression.
2
+
3
+ This module provides HeadroomDocumentCompressor, a LangChain BaseDocumentCompressor
4
+ that reduces retrieved documents based on relevance scoring while preserving
5
+ the most important information.
6
+
7
+ Example:
8
+ from langchain.retrievers import ContextualCompressionRetriever
9
+ from langchain_community.vectorstores import Chroma
10
+ from headroom.integrations import HeadroomDocumentCompressor
11
+
12
+ # Create vector store retriever
13
+ vectorstore = Chroma.from_documents(documents, embeddings)
14
+ base_retriever = vectorstore.as_retriever(search_kwargs={"k": 50})
15
+
16
+ # Wrap with Headroom compression
17
+ compressor = HeadroomDocumentCompressor(max_documents=10)
18
+ retriever = ContextualCompressionRetriever(
19
+ base_compressor=compressor,
20
+ base_retriever=base_retriever,
21
+ )
22
+
23
+ # Retrieve - automatically keeps most relevant documents
24
+ docs = retriever.invoke("What is the capital of France?")
25
+ """
26
+
27
+ from __future__ import annotations
28
+
29
+ import logging
30
+ import re
31
+ from collections.abc import Sequence
32
+ from dataclasses import dataclass
33
+ from typing import Any
34
+
35
+ # LangChain imports - these are optional dependencies
36
+ try:
37
+ from langchain_core.callbacks import Callbacks
38
+ from langchain_core.documents import Document
39
+
40
+ # BaseDocumentCompressor location varies by langchain version
41
+ try:
42
+ from langchain.retrievers.document_compressors import BaseDocumentCompressor
43
+ except ImportError:
44
+ try:
45
+ from langchain_core.documents.compressors import BaseDocumentCompressor
46
+ except ImportError:
47
+ # Fallback: create a minimal base class
48
+ class BaseDocumentCompressor: # type: ignore[no-redef]
49
+ """Minimal base class for document compression."""
50
+
51
+ def compress_documents(
52
+ self, documents: Sequence[Any], query: str, callbacks: Any = None
53
+ ) -> Sequence[Any]:
54
+ raise NotImplementedError
55
+
56
+ LANGCHAIN_AVAILABLE = True
57
+ except ImportError:
58
+ LANGCHAIN_AVAILABLE = False
59
+ BaseDocumentCompressor = object # type: ignore[misc,assignment]
60
+ Document = object # type: ignore[misc,assignment]
61
+ Callbacks = None # type: ignore[misc,assignment]
62
+
63
+ logger = logging.getLogger(__name__)
64
+
65
+
66
+ def _check_langchain_available() -> None:
67
+ """Raise ImportError if LangChain is not installed."""
68
+ if not LANGCHAIN_AVAILABLE:
69
+ raise ImportError(
70
+ "LangChain is required for this integration. "
71
+ "Install with: pip install headroom[langchain] "
72
+ "or: pip install langchain-core"
73
+ )
74
+
75
+
76
+ @dataclass
77
+ class CompressionMetrics:
78
+ """Metrics from document compression."""
79
+
80
+ documents_before: int
81
+ documents_after: int
82
+ documents_removed: int
83
+ relevance_scores: list[float]
84
+
85
+
86
+ class HeadroomDocumentCompressor(BaseDocumentCompressor):
87
+ """Compresses retrieved documents based on relevance to query.
88
+
89
+ Uses BM25-style relevance scoring to keep only the most relevant
90
+ documents from a larger retrieval set. This allows you to retrieve
91
+ many documents initially (for recall) and then compress down to
92
+ the most relevant ones (for precision).
93
+
94
+ Works with LangChain's ContextualCompressionRetriever pattern.
95
+
96
+ Example:
97
+ from langchain.retrievers import ContextualCompressionRetriever
98
+ from headroom.integrations import HeadroomDocumentCompressor
99
+
100
+ compressor = HeadroomDocumentCompressor(
101
+ max_documents=10,
102
+ min_relevance=0.3,
103
+ )
104
+
105
+ retriever = ContextualCompressionRetriever(
106
+ base_compressor=compressor,
107
+ base_retriever=base_retriever, # Any retriever
108
+ )
109
+
110
+ # Retrieves top 10 most relevant docs
111
+ docs = retriever.invoke("What is Python?")
112
+
113
+ Attributes:
114
+ max_documents: Maximum documents to return
115
+ min_relevance: Minimum relevance score (0-1) to include
116
+ prefer_diverse: Whether to prefer diverse results
117
+ """
118
+
119
+ max_documents: int = 10
120
+ min_relevance: float = 0.0
121
+ prefer_diverse: bool = False
122
+
123
+ def __init__(
124
+ self,
125
+ max_documents: int = 10,
126
+ min_relevance: float = 0.0,
127
+ prefer_diverse: bool = False,
128
+ **kwargs: Any,
129
+ ):
130
+ """Initialize HeadroomDocumentCompressor.
131
+
132
+ Args:
133
+ max_documents: Maximum number of documents to return. Default 10.
134
+ min_relevance: Minimum relevance score (0-1) for a document to
135
+ be included. Default 0.0 (no minimum).
136
+ prefer_diverse: If True, use MMR-style selection to prefer
137
+ diverse results over pure relevance. Default False.
138
+ **kwargs: Additional arguments for BaseDocumentCompressor.
139
+ """
140
+ _check_langchain_available()
141
+
142
+ super().__init__(**kwargs)
143
+ self.max_documents = max_documents
144
+ self.min_relevance = min_relevance
145
+ self.prefer_diverse = prefer_diverse
146
+ self._last_metrics: CompressionMetrics | None = None
147
+
148
+ def compress_documents(
149
+ self,
150
+ documents: Sequence[Document],
151
+ query: str,
152
+ callbacks: Callbacks = None,
153
+ ) -> Sequence[Document]:
154
+ """Compress documents based on relevance to query.
155
+
156
+ Args:
157
+ documents: Documents to compress.
158
+ query: Query to score relevance against.
159
+ callbacks: LangChain callbacks (unused).
160
+
161
+ Returns:
162
+ Compressed list of most relevant documents.
163
+ """
164
+ if not documents:
165
+ self._last_metrics = CompressionMetrics(
166
+ documents_before=0,
167
+ documents_after=0,
168
+ documents_removed=0,
169
+ relevance_scores=[],
170
+ )
171
+ return []
172
+
173
+ if len(documents) <= self.max_documents and self.min_relevance <= 0:
174
+ # Nothing to drop: under the document cap and no relevance floor to apply
175
+ scores = [self._score_document(doc, query) for doc in documents]
176
+ self._last_metrics = CompressionMetrics(
177
+ documents_before=len(documents),
178
+ documents_after=len(documents),
179
+ documents_removed=0,
180
+ relevance_scores=scores,
181
+ )
182
+ return list(documents)
183
+
184
+ # Score all documents
185
+ scored = [(doc, self._score_document(doc, query)) for doc in documents]
186
+
187
+ if self.prefer_diverse:
188
+ # Use MMR-style selection for diversity
189
+ selected = self._select_diverse(scored, query)
190
+ else:
191
+ # Sort by relevance score
192
+ scored.sort(key=lambda x: x[1], reverse=True)
193
+ selected = scored[: self.max_documents]
194
+
195
+ # Filter by minimum relevance
196
+ if self.min_relevance > 0:
197
+ selected = [(doc, score) for doc, score in selected if score >= self.min_relevance]
198
+
199
+ # Track metrics
200
+ final_docs = [doc for doc, _ in selected]
201
+ final_scores = [score for _, score in selected]
202
+
203
+ self._last_metrics = CompressionMetrics(
204
+ documents_before=len(documents),
205
+ documents_after=len(final_docs),
206
+ documents_removed=len(documents) - len(final_docs),
207
+ relevance_scores=final_scores,
208
+ )
209
+
210
+ logger.info(
211
+ f"HeadroomDocumentCompressor: {len(documents)} -> {len(final_docs)} documents "
212
+ f"(avg relevance: {sum(final_scores) / len(final_scores) if final_scores else 0:.2f})"
213
+ )
214
+
215
+ return final_docs
216
+
217
+ def _score_document(self, doc: Document, query: str) -> float:
218
+ """Score a document's relevance to the query using BM25-style scoring.
219
+
220
+ Args:
221
+ doc: Document to score.
222
+ query: Query to compare against.
223
+
224
+ Returns:
225
+ Relevance score between 0 and 1.
226
+ """
227
+ content = doc.page_content.lower()
228
+ query_lower = query.lower()
229
+
230
+ # Tokenize
231
+ query_terms = self._tokenize(query_lower)
232
+ doc_terms = self._tokenize(content)
233
+
234
+ if not query_terms or not doc_terms:
235
+ return 0.0
236
+
237
+ # BM25-style scoring
238
+ k1 = 1.5
239
+ b = 0.75
240
+ avg_dl = 100 # Assume average document length
241
+
242
+ doc_len = len(doc_terms)
243
+ term_freqs: dict[str, int] = {}
244
+ for term in doc_terms:
245
+ term_freqs[term] = term_freqs.get(term, 0) + 1
246
+
247
+ score = 0.0
248
+ for term in query_terms:
249
+ if term in term_freqs:
250
+ tf = term_freqs[term]
251
+ # Simplified BM25 (without IDF since we don't have corpus stats)
252
+ numerator = tf * (k1 + 1)
253
+ denominator = tf + k1 * (1 - b + b * (doc_len / avg_dl))
254
+ score += numerator / denominator
255
+
256
+ # Normalize to 0-1 range
257
+ max_possible = len(query_terms) * (k1 + 1)
258
+ normalized = score / max_possible if max_possible > 0 else 0.0
259
+
260
+ # Boost for exact phrase matches
261
+ if query_lower in content:
262
+ normalized = min(1.0, normalized + 0.3)
263
+
264
+ return min(1.0, normalized)
265
+
266
+ def _tokenize(self, text: str) -> list[str]:
267
+ """Tokenize text into terms.
268
+
269
+ Args:
270
+ text: Text to tokenize.
271
+
272
+ Returns:
273
+ List of tokens.
274
+ """
275
+ # Simple tokenization: split on non-alphanumeric, filter short terms
276
+ tokens = re.findall(r"\b\w+\b", text)
277
+ return [t for t in tokens if len(t) > 1]
278
+
279
+ def _select_diverse(
280
+ self, scored_docs: list[tuple[Document, float]], query: str
281
+ ) -> list[tuple[Document, float]]:
282
+ """Select diverse documents using MMR-style approach.
283
+
284
+ Balances relevance with diversity to avoid redundant results.
285
+
286
+ Args:
287
+ scored_docs: List of (document, relevance_score) tuples.
288
+ query: Original query.
289
+
290
+ Returns:
291
+ Selected documents with diversity considered.
292
+ """
293
+ if not scored_docs:
294
+ return []
295
+
296
+ # Sort by initial relevance
297
+ scored_docs = sorted(scored_docs, key=lambda x: x[1], reverse=True)
298
+
299
+ # Start with most relevant
300
+ selected = [scored_docs[0]]
301
+ remaining = scored_docs[1:]
302
+
303
+ lambda_param = 0.5 # Balance between relevance and diversity
304
+
305
+ while len(selected) < self.max_documents and remaining:
306
+ best_score = -1.0
307
+ best_idx = 0
308
+
309
+ for i, (doc, rel_score) in enumerate(remaining):
310
+ # Calculate max similarity to already selected docs
311
+ max_sim = max(self._document_similarity(doc, sel_doc) for sel_doc, _ in selected)
312
+
313
+ # MMR score: lambda * relevance - (1-lambda) * max_similarity
314
+ mmr_score = lambda_param * rel_score - (1 - lambda_param) * max_sim
315
+
316
+ if mmr_score > best_score:
317
+ best_score = mmr_score
318
+ best_idx = i
319
+
320
+ selected.append(remaining[best_idx])
321
+ remaining.pop(best_idx)
322
+
323
+ return selected
324
+
325
+ def _document_similarity(self, doc1: Document, doc2: Document) -> float:
326
+ """Calculate similarity between two documents.
327
+
328
+ Uses Jaccard similarity on terms for simplicity.
329
+
330
+ Args:
331
+ doc1: First document.
332
+ doc2: Second document.
333
+
334
+ Returns:
335
+ Similarity score between 0 and 1.
336
+ """
337
+ terms1 = set(self._tokenize(doc1.page_content.lower()))
338
+ terms2 = set(self._tokenize(doc2.page_content.lower()))
339
+
340
+ if not terms1 or not terms2:
341
+ return 0.0
342
+
343
+ intersection = len(terms1 & terms2)
344
+ union = len(terms1 | terms2)
345
+
346
+ return intersection / union if union > 0 else 0.0
347
+
348
+ @property
349
+ def last_metrics(self) -> CompressionMetrics | None:
350
+ """Get metrics from the last compression operation."""
351
+ return self._last_metrics
352
+
353
+ def get_compression_stats(self) -> dict[str, Any]:
354
+ """Get statistics from the last compression.
355
+
356
+ Returns:
357
+ Dictionary with compression metrics, or empty if no compression yet.
358
+ """
359
+ if self._last_metrics is None:
360
+ return {}
361
+
362
+ return {
363
+ "documents_before": self._last_metrics.documents_before,
364
+ "documents_after": self._last_metrics.documents_after,
365
+ "documents_removed": self._last_metrics.documents_removed,
366
+ "average_relevance": (
367
+ sum(self._last_metrics.relevance_scores) / len(self._last_metrics.relevance_scores)
368
+ if self._last_metrics.relevance_scores
369
+ else 0.0
370
+ ),
371
+ }
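`_score_document` uses BM25's term-frequency saturation with document-length normalization but no IDF term; the code's own comment notes the missing corpus statistics. The score is divided by the per-term upper bound `k1 + 1` and gets a 0.3 bonus for an exact phrase match. `_select_diverse` is greedy maximal marginal relevance (MMR) with Jaccard term similarity and `lambda = 0.5`. The sketch below restates both on plain strings and reuses the constants from the diff (`k1 = 1.5`, `b = 0.75`, assumed average length 100). The function names are hypothetical, and the sketch does not depend on LangChain.

```python
from __future__ import annotations

import re

K1, B, AVG_DL = 1.5, 0.75, 100  # same constants as _score_document
PHRASE_BONUS = 0.3


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens longer than one character, as in _tokenize.
    return [t for t in re.findall(r"\b\w+\b", text.lower()) if len(t) > 1]


def score(doc: str, query: str) -> float:
    q_terms, d_terms = tokenize(query), tokenize(doc)
    if not q_terms or not d_terms:
        return 0.0
    tf: dict[str, int] = {}
    for term in d_terms:
        tf[term] = tf.get(term, 0) + 1
    length_norm = 1 - B + B * len(d_terms) / AVG_DL
    # Saturating TF per query term; no IDF because there are no corpus stats.
    raw = sum(
        tf[t] * (K1 + 1) / (tf[t] + K1 * length_norm) for t in q_terms if t in tf
    )
    s = raw / (len(q_terms) * (K1 + 1))  # each term contributes < K1 + 1
    if query.lower() in doc.lower():
        s += PHRASE_BONUS
    return min(1.0, s)


def jaccard(a: str, b: str) -> float:
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mmr_select(docs: list[str], query: str, k: int, lam: float = 0.5) -> list[str]:
    # Greedy MMR: start from the most relevant doc, then repeatedly add the doc
    # maximizing lam * relevance - (1 - lam) * max similarity to those chosen.
    ranked = sorted(docs, key=lambda d: score(d, query), reverse=True)
    if not ranked:
        return []
    selected, remaining = [ranked[0]], ranked[1:]
    while len(selected) < k and remaining:
        best = max(
            remaining,
            key=lambda d: lam * score(d, query)
            - (1 - lam) * max(jaccard(d, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


docs = [
    "python list comprehension tutorial",
    "python list comprehension tutorial with examples",  # near-duplicate
    "python generators tutorial",
]
query = "python list tutorial"
top_by_relevance = sorted(docs, key=lambda d: score(d, query), reverse=True)[:2]
print(top_by_relevance)              # comprehension tutorial + its near-duplicate
print(mmr_select(docs, query, k=2))  # comprehension tutorial + generators tutorial
```

On these three short documents, pure relevance keeps the two near-duplicates. MMR replaces the second one with the less similar generators document, which is the trade-off `prefer_diverse=True` opts into.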
headroom/integrations/langchain/streaming.py ADDED
@@ -0,0 +1,341 @@
1
+ """Streaming metrics tracking for LangChain.
2
+
3
+ This module provides StreamingMetricsTracker for tracking output tokens
4
+ during streaming responses from LangChain models.
5
+
6
+ Example:
7
+ from langchain_openai import ChatOpenAI
8
+ from headroom.integrations import HeadroomChatModel, StreamingMetricsTracker
9
+
10
+ llm = HeadroomChatModel(ChatOpenAI(model="gpt-4o"))
11
+ tracker = StreamingMetricsTracker(model="gpt-4o")
12
+
13
+ for chunk in llm.stream("Tell me a story"):
14
+ tracker.add_chunk(chunk)
15
+ print(chunk.content, end="", flush=True)
16
+
17
+ print(f"\\nOutput tokens: {tracker.output_tokens}")
18
+ """
19
+
20
+ from __future__ import annotations
21
+
22
+ import logging
23
+ from dataclasses import dataclass
24
+ from datetime import datetime
25
+ from typing import Any
26
+
27
+ # LangChain imports - these are optional dependencies
28
+ try:
29
+ from langchain_core.messages import AIMessageChunk
30
+ from langchain_core.outputs import ChatGenerationChunk
31
+
32
+ LANGCHAIN_AVAILABLE = True
33
+ except ImportError:
34
+ LANGCHAIN_AVAILABLE = False
35
+ AIMessageChunk = object # type: ignore[misc,assignment]
36
+ ChatGenerationChunk = object # type: ignore[misc,assignment]
37
+
38
+ from headroom.providers import OpenAIProvider
39
+
40
+ logger = logging.getLogger(__name__)
41
+
42
+
43
+ def _check_langchain_available() -> None:
44
+ """Raise ImportError if LangChain is not installed."""
45
+ if not LANGCHAIN_AVAILABLE:
46
+ raise ImportError(
47
+ "LangChain is required for this integration. "
48
+ "Install with: pip install headroom[langchain] "
49
+ "or: pip install langchain-core"
50
+ )
51
+
52
+
53
+ @dataclass
54
+ class StreamingMetrics:
55
+ """Metrics from a streaming response."""
56
+
57
+ output_tokens: int
58
+ chunk_count: int
59
+ content_length: int
60
+ start_time: datetime
61
+ end_time: datetime | None
62
+ duration_ms: float | None
63
+
64
+ def to_dict(self) -> dict[str, Any]:
65
+ """Convert to dictionary."""
66
+ return {
67
+ "output_tokens": self.output_tokens,
68
+ "chunk_count": self.chunk_count,
69
+ "content_length": self.content_length,
70
+ "start_time": self.start_time.isoformat(),
71
+ "end_time": self.end_time.isoformat() if self.end_time else None,
72
+ "duration_ms": self.duration_ms,
73
+ }
74
+
75
+
76
+ class StreamingMetricsTracker:
77
+ """Tracks output tokens and metrics during streaming.
78
+
79
+ Accumulates content from streaming chunks and provides accurate
80
+ token counting for the streamed output.
81
+
82
+ Example:
83
+ tracker = StreamingMetricsTracker(model="gpt-4o")
84
+
85
+ async for chunk in llm.astream(messages):
86
+ tracker.add_chunk(chunk)
87
+ print(chunk.content, end="")
88
+ tracker.finish() # stamps the end time; duration_ms is None until then
89
+ print(f"\\nTokens: {tracker.output_tokens}")
90
+ print(f"Duration: {tracker.duration_ms}ms")
91
+
92
+ Attributes:
93
+ model: Model name for token counting
94
+ content: Accumulated content from all chunks
95
+ output_tokens: Estimated token count for output
96
+ chunk_count: Number of chunks received
97
+ """
98
+
99
+ def __init__(
100
+ self,
101
+ model: str = "gpt-4o",
102
+ provider: Any = None,
103
+ ):
104
+ """Initialize StreamingMetricsTracker.
105
+
106
+ Args:
107
+ model: Model name for token counting. Default "gpt-4o".
108
+ provider: Headroom provider for token counting. Uses
109
+ OpenAIProvider if not specified.
110
+ """
111
+ _check_langchain_available()
112
+
113
+ self._model = model
114
+ self._provider = provider or OpenAIProvider()
115
+ self._content = ""
116
+ self._chunk_count = 0
117
+ self._start_time: datetime | None = None
118
+ self._end_time: datetime | None = None
119
+
120
+ def add_chunk(self, chunk: Any) -> None:
121
+ """Add a streaming chunk to the tracker.
122
+
123
+ Extracts content from various chunk types:
124
+ - AIMessageChunk
125
+ - ChatGenerationChunk
126
+ - dict with 'content' key
127
+ - string
128
+
129
+ Args:
130
+ chunk: Streaming chunk from LangChain.
131
+ """
132
+ if self._start_time is None:
133
+ self._start_time = datetime.now()
134
+
135
+ self._chunk_count += 1
136
+
137
+ # Extract content from various chunk types
138
+ content = self._extract_content(chunk)
139
+ if content:
140
+ self._content += content
141
+
142
+ def _extract_content(self, chunk: Any) -> str:
143
+ """Extract string content from a chunk.
144
+
145
+ Args:
146
+ chunk: Streaming chunk of various types.
147
+
148
+ Returns:
149
+ Extracted content string.
150
+ """
151
+ # AIMessageChunk
152
+ if hasattr(chunk, "content"):
153
+ content = chunk.content
154
+ if isinstance(content, str):
155
+ return content
156
+ return str(content) if content else ""
157
+
158
+ # ChatGenerationChunk
159
+ if hasattr(chunk, "message") and hasattr(chunk.message, "content"):
160
+ content = chunk.message.content
161
+ if isinstance(content, str):
162
+ return content
163
+ return str(content) if content else ""
164
+
165
+ # dict
166
+ if isinstance(chunk, dict):
167
+ return str(chunk.get("content", ""))
168
+
169
+ # string
170
+ if isinstance(chunk, str):
171
+ return chunk
172
+
173
+ return ""
174
+
175
+ def finish(self) -> StreamingMetrics:
176
+ """Mark streaming as complete and return final metrics.
177
+
178
+ Returns:
179
+ StreamingMetrics with final values.
180
+ """
181
+ self._end_time = datetime.now()
182
+
183
+ duration_ms = None
184
+ if self._start_time:
185
+ duration_ms = (self._end_time - self._start_time).total_seconds() * 1000
186
+
187
+ return StreamingMetrics(
188
+ output_tokens=self.output_tokens,
189
+ chunk_count=self._chunk_count,
190
+ content_length=len(self._content),
191
+ start_time=self._start_time or self._end_time,
192
+ end_time=self._end_time,
193
+ duration_ms=duration_ms,
194
+ )
195
+
196
+ @property
197
+ def content(self) -> str:
198
+ """Get accumulated content."""
199
+ return self._content
200
+
201
+ @property
202
+ def output_tokens(self) -> int:
203
+ """Get estimated output token count."""
204
+ if not self._content:
205
+ return 0
206
+ token_counter = self._provider.get_token_counter(self._model)
207
+ return token_counter.count_text(self._content)
208
+
209
+ @property
210
+ def chunk_count(self) -> int:
211
+ """Get number of chunks received."""
212
+ return self._chunk_count
213
+
214
+ @property
215
+ def duration_ms(self) -> float | None:
216
+ """Get duration in milliseconds (after finish())."""
217
+ if self._start_time is None or self._end_time is None:
218
+ return None
219
+ return (self._end_time - self._start_time).total_seconds() * 1000
220
+
221
+ def reset(self) -> None:
222
+ """Reset tracker for reuse."""
223
+ self._content = ""
224
+ self._chunk_count = 0
225
+ self._start_time = None
226
+ self._end_time = None
227
+
228
+
229
+ class StreamingMetricsCallback:
230
+ """Context manager for tracking streaming metrics.
231
+
232
+ Provides a clean interface for tracking a complete streaming
233
+ response with automatic timing.
234
+
235
+ Example:
236
+ callback = StreamingMetricsCallback(model="gpt-4o")
237
+ with callback as tracker:
238
+ for chunk in llm.stream(messages):
239
+ tracker.add_chunk(chunk)
240
+ print(chunk.content, end="")
241
+ print(f"\\nMetrics: {callback.metrics}")
242
+
243
+ Attributes:
244
+ tracker: The underlying StreamingMetricsTracker
245
+ metrics: Final metrics after context exit
246
+ """
247
+
248
+ def __init__(self, model: str = "gpt-4o", provider: Any = None):
249
+ """Initialize StreamingMetricsCallback.
250
+
251
+ Args:
252
+ model: Model name for token counting.
253
+ provider: Headroom provider for token counting.
254
+ """
255
+ self._tracker = StreamingMetricsTracker(model=model, provider=provider)
256
+ self._metrics: StreamingMetrics | None = None
257
+
258
+ def __enter__(self) -> StreamingMetricsTracker:
259
+ """Enter context, return tracker."""
260
+ return self._tracker
261
+
262
+ def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
263
+ """Exit context, finalize metrics."""
264
+ self._metrics = self._tracker.finish()
265
+
266
+ @property
267
+ def tracker(self) -> StreamingMetricsTracker:
268
+ """Get the tracker."""
269
+ return self._tracker
270
+
271
+ @property
272
+ def metrics(self) -> StreamingMetrics | None:
273
+ """Get final metrics (after context exit)."""
274
+ return self._metrics
275
+
276
+
277
+ def track_streaming_response(
278
+ stream: Any,
279
+ model: str = "gpt-4o",
280
+ provider: Any = None,
281
+ ) -> tuple[str, StreamingMetrics]:
282
+ """Track a complete streaming response.
283
+
284
+ Convenience function that consumes a stream and returns the
285
+ accumulated content and metrics.
286
+
287
+ Args:
288
+ stream: Iterable of streaming chunks.
289
+ model: Model name for token counting.
290
+ provider: Headroom provider for token counting.
291
+
292
+ Returns:
293
+ Tuple of (accumulated_content, metrics).
294
+
295
+ Example:
296
+ content, metrics = track_streaming_response(
297
+ llm.stream(messages),
298
+ model="gpt-4o"
299
+ )
300
+ print(f"Content: {content}")
301
+ print(f"Tokens: {metrics.output_tokens}")
302
+ """
303
+ tracker = StreamingMetricsTracker(model=model, provider=provider)
304
+
305
+ for chunk in stream:
306
+ tracker.add_chunk(chunk)
307
+
308
+ metrics = tracker.finish()
309
+ return tracker.content, metrics
310
+
311
+
312
+ async def track_async_streaming_response(
313
+ stream: Any,
314
+ model: str = "gpt-4o",
315
+ provider: Any = None,
316
+ ) -> tuple[str, StreamingMetrics]:
317
+ """Track a complete async streaming response.
318
+
319
+ Async version of track_streaming_response.
320
+
321
+ Args:
322
+ stream: Async iterable of streaming chunks.
323
+ model: Model name for token counting.
324
+ provider: Headroom provider for token counting.
325
+
326
+ Returns:
327
+ Tuple of (accumulated_content, metrics).
328
+
329
+ Example:
330
+ content, metrics = await track_async_streaming_response(
331
+ llm.astream(messages),
332
+ model="gpt-4o"
333
+ )
334
+ """
335
+ tracker = StreamingMetricsTracker(model=model, provider=provider)
336
+
337
+ async for chunk in stream:
338
+ tracker.add_chunk(chunk)
339
+
340
+ metrics = tracker.finish()
341
+ return tracker.content, metrics
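`StreamingMetricsTracker` accumulates the text of every chunk and counts tokens once over the joined string when `output_tokens` is read. The following is a minimal standalone sketch of the same idea. It assumes a caller-supplied `count_tokens` function; here a whitespace split stands in for a real tokenizer. The names `extract_text`, `track`, `StreamStats`, and `Chunk` are hypothetical.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Iterable
from dataclasses import dataclass
from typing import Any


def extract_text(chunk: Any) -> str:
    # Objects such as message chunks expose .content; dicts use a "content" key.
    if isinstance(chunk, dict):
        content = chunk.get("content", "")
    else:
        content = getattr(chunk, "content", chunk)
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Assumed block shape {"type": "text", "text": ...}; non-text blocks are skipped.
        return "".join(
            b if isinstance(b, str) else str(b.get("text", ""))
            for b in content
            if isinstance(b, str) or (isinstance(b, dict) and b.get("type") == "text")
        )
    return ""


@dataclass
class StreamStats:
    text: str
    chunks: int
    tokens: int
    duration_ms: float | None  # None if the stream yielded nothing


def track(stream: Iterable[Any], count_tokens: Callable[[str], int]) -> StreamStats:
    parts: list[str] = []
    chunks = 0
    start: float | None = None
    for chunk in stream:
        if start is None:
            start = time.perf_counter()  # clock starts at the first chunk
        chunks += 1
        parts.append(extract_text(chunk))
    text = "".join(parts)
    duration = None if start is None else (time.perf_counter() - start) * 1000
    # Count once over the joined text: token boundaries need not match chunk boundaries.
    return StreamStats(text, chunks, count_tokens(text) if text else 0, duration)


class Chunk:  # minimal stand-in for a message chunk
    def __init__(self, content: Any) -> None:
        self.content = content


stream = [Chunk("Once "), Chunk("upon "), {"content": "a time"}, Chunk([{"type": "text", "text": "."}])]
stats = track(stream, count_tokens=lambda s: len(s.split()))  # not a real tokenizer
print(stats.text)                  # Once upon a time.
print(stats.chunks, stats.tokens)  # 4 4
```

The sketch makes two deliberate changes relative to the module; both are assumptions, not the module's behavior. First, list-valued content is joined from its `"text"` blocks, whereas the tracker falls back to `str(content)`. Second, timing uses the monotonic `time.perf_counter()` instead of `datetime.now()`, so wall-clock adjustments cannot skew the duration.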
headroom/integrations/mcp/__init__.py ADDED
@@ -0,0 +1,37 @@
1
+ """MCP (Model Context Protocol) integration for Headroom.
2
+
3
+ This package provides compression utilities for MCP tool results,
4
+ helping reduce context usage when tools return large outputs.
5
+
6
+ Example:
7
+ from headroom.integrations.mcp import compress_tool_result
8
+
9
+ # Compress large tool output
10
+ result = compress_tool_result(
11
+ tool_name="search",
12
+ result=large_json_result,
13
+ max_chars=5000,
14
+ )
15
+ """
16
+
17
+ from .server import (
18
+ DEFAULT_MCP_PROFILES,
19
+ HeadroomMCPClientWrapper,
20
+ HeadroomMCPCompressor,
21
+ MCPCompressionResult,
22
+ MCPToolProfile,
23
+ compress_tool_result,
24
+ compress_tool_result_with_metrics,
25
+ create_headroom_mcp_proxy,
26
+ )
27
+
28
+ __all__ = [
29
+ "HeadroomMCPCompressor",
30
+ "HeadroomMCPClientWrapper",
31
+ "MCPCompressionResult",
32
+ "MCPToolProfile",
33
+ "compress_tool_result",
34
+ "compress_tool_result_with_metrics",
35
+ "create_headroom_mcp_proxy",
36
+ "DEFAULT_MCP_PROFILES",
37
+ ]
headroom/integrations/{mcp.py → mcp/server.py} RENAMED
File without changes
headroom/transforms/llmlingua_compressor.py CHANGED
@@ -88,7 +88,8 @@ def _get_llmlingua_compressor(model_name: str, device: str) -> Any:
  from llmlingua import PromptCompressor
 
  logger.info(
- "Loading LLMLingua-2 model: %s on device: %s (this may take 10-30s on first run)",
+ "Loading LLMLingua-2 model: %s on device: %s "
+ "(this may take 10-30s on first run)",
  model_name,
  device,
  )
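The LLMLingua change is cosmetic. Adjacent string literals are concatenated at compile time, so the format string passed to `logger.info` is identical to the old one-line version. The trailing space in the first literal matters; without it the rendered message would read `cpu(this ...`. A quick check, using a hypothetical model name:

```python
import logging

old = "Loading LLMLingua-2 model: %s on device: %s (this may take 10-30s on first run)"
new = (
    "Loading LLMLingua-2 model: %s on device: %s "  # trailing space matters
    "(this may take 10-30s on first run)"
)
assert old == new  # adjacent literals are joined at compile time

# logging keeps msg and args separate and only %-formats them in getMessage().
record = logging.LogRecord(
    "demo", logging.INFO, "demo.py", 0, new, ("example-model", "cpu"), None
)
print(record.getMessage())
```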
pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
  [project]
  name = "headroom-ai"
- version = "0.2.2"
+ version = "0.2.3"
  description = "The Context Optimization Layer for LLM Applications - Cut costs by 50-90%"
  readme = "README.md"
  license = "Apache-2.0"
tests/test_integrations/langchain/__init__.py ADDED
File without changes
tests/test_integrations/{test_langchain.py → langchain/test_chat_model.py} RENAMED
@@ -488,7 +488,7 @@ class TestOptimizeMessages:
  """Basic message optimization."""
  from headroom.integrations import optimize_messages
 
- with patch("headroom.integrations.langchain.TransformPipeline") as MockPipeline:
+ with patch("headroom.integrations.langchain.chat_model.TransformPipeline") as MockPipeline:
  mock_instance = MagicMock()
  mock_result = MagicMock()
  mock_result.messages = [
@@ -513,7 +513,7 @@ class TestOptimizeMessages:
 
  config = HeadroomConfig(default_mode=HeadroomMode.AUDIT)
 
- with patch("headroom.integrations.langchain.TransformPipeline") as MockPipeline:
+ with patch("headroom.integrations.langchain.chat_model.TransformPipeline") as MockPipeline:
  mock_instance = MagicMock()
  mock_result = MagicMock()
  mock_result.messages = []
@@ -547,7 +547,7 @@ class TestOptimizeMessages:
  ToolMessage(content="Sunny", tool_call_id="1"),
  ]
 
- with patch("headroom.integrations.langchain.TransformPipeline") as MockPipeline:
+ with patch("headroom.integrations.langchain.chat_model.TransformPipeline") as MockPipeline:
  mock_instance = MagicMock()
  mock_result = MagicMock()
  mock_result.messages = [
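The test updates follow from the package restructure. `unittest.mock.patch` replaces a name in the namespace where the code under test looks it up. Once the implementation presumably lives in `headroom.integrations.langchain.chat_model`, patching the old `headroom.integrations.langchain.TransformPipeline` path would either fail or leave the real class in place. Below is a self-contained illustration with two throwaway modules (hypothetical names `demo_pkg` and `demo_pkg_chat_model`):

```python
import sys
import types
from unittest.mock import patch


class TransformPipeline:  # stands in for the real pipeline class
    def run(self) -> str:
        return "real"


# Module where the function under test is defined and looks the name up.
chat_model = types.ModuleType("demo_pkg_chat_model")
chat_model.TransformPipeline = TransformPipeline
# exec() with the module's __dict__ as globals, so optimize() resolves
# TransformPipeline through this module's namespace at call time.
exec("def optimize():\n    return TransformPipeline().run()\n", chat_model.__dict__)
sys.modules["demo_pkg_chat_model"] = chat_model

# Package that merely re-exports both names.
pkg = types.ModuleType("demo_pkg")
pkg.TransformPipeline = TransformPipeline
pkg.optimize = chat_model.optimize
sys.modules["demo_pkg"] = pkg

# Patching the re-export leaves the lookup in demo_pkg_chat_model untouched.
with patch("demo_pkg.TransformPipeline") as mock_cls:
    mock_cls.return_value.run.return_value = "mocked"
    via_reexport = pkg.optimize()

# Patching the module that performs the lookup intercepts the call.
with patch("demo_pkg_chat_model.TransformPipeline") as mock_cls:
    mock_cls.return_value.run.return_value = "mocked"
    via_defining_module = pkg.optimize()

print(via_reexport, via_defining_module)  # real mocked
```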
tests/test_integrations/{test_langchain_evals.py → langchain/test_evals.py} RENAMED
File without changes
tests/test_integrations/langchain/test_extended.py ADDED
@@ -0,0 +1,646 @@
1
+ """Tests for extended LangChain integration modules.
2
+
3
+ Tests cover:
4
+ 1. langchain.providers - Provider auto-detection
5
+ 2. langchain.memory - HeadroomChatMessageHistory
6
+ 3. langchain.retriever - HeadroomDocumentCompressor
7
+ 4. langchain.agents - HeadroomToolWrapper
8
+ 5. langchain.langsmith - LangSmith integration
9
+ 6. langchain.streaming - Streaming metrics
10
+ """
11
+
12
+ import json
13
+ from unittest.mock import MagicMock
14
+
15
+ import pytest
16
+
17
+ # Check if LangChain is available
18
+ try:
19
+ from langchain_core.documents import Document
20
+ from langchain_core.messages import AIMessage, HumanMessage
21
+ from langchain_core.tools import StructuredTool
22
+
23
+ LANGCHAIN_AVAILABLE = True
24
+ except ImportError:
25
+ LANGCHAIN_AVAILABLE = False
26
+
27
+ # Skip all tests if LangChain not installed
28
+ pytestmark = pytest.mark.skipif(not LANGCHAIN_AVAILABLE, reason="LangChain not installed")
29
+
30
+
31
+ class TestProviderDetection:
32
+ """Tests for langchain.providers module."""
33
+
34
+ def test_detect_openai_provider(self):
35
+ """Detect OpenAI from ChatOpenAI class."""
36
+ from headroom.integrations.langchain.providers import detect_provider
37
+
38
+ mock_model = MagicMock()
39
+ mock_model.__class__.__name__ = "ChatOpenAI"
40
+ mock_model.__class__.__module__ = "langchain_openai.chat_models"
41
+
42
+ provider = detect_provider(mock_model)
43
+ assert provider == "openai"
44
+
45
+ def test_detect_anthropic_provider(self):
46
+ """Detect Anthropic from ChatAnthropic class."""
47
+ from headroom.integrations.langchain.providers import detect_provider
48
+
49
+ mock_model = MagicMock()
50
+ mock_model.__class__.__name__ = "ChatAnthropic"
51
+ mock_model.__class__.__module__ = "langchain_anthropic.chat_models"
52
+
53
+ provider = detect_provider(mock_model)
54
+ assert provider == "anthropic"
55
+
56
+ def test_detect_google_provider(self):
57
+ """Detect Google from ChatGoogleGenerativeAI class."""
58
+ from headroom.integrations.langchain.providers import detect_provider
59
+
60
+ mock_model = MagicMock()
61
+ mock_model.__class__.__name__ = "ChatGoogleGenerativeAI"
62
+ mock_model.__class__.__module__ = "langchain_google_genai"
63
+
64
+ provider = detect_provider(mock_model)
65
+ assert provider == "google"
66
+
67
+ def test_detect_fallback_to_openai(self):
68
+ """Fall back to OpenAI for unknown models."""
69
+ from headroom.integrations.langchain.providers import detect_provider
70
+
71
+ mock_model = MagicMock()
72
+ mock_model.__class__.__name__ = "CustomChatModel"
73
+ mock_model.__class__.__module__ = "my_custom_module"
74
+
75
+ provider = detect_provider(mock_model)
76
+ assert provider == "openai"
77
+
78
+ def test_detect_from_model_name_claude(self):
79
+ """Detect Anthropic from model name containing 'claude'."""
80
+ from headroom.integrations.langchain.providers import detect_provider
81
+
82
+ mock_model = MagicMock()
83
+ mock_model.__class__.__name__ = "CustomModel"
84
+ mock_model.__class__.__module__ = "custom"
85
+ mock_model.model_name = "claude-3-5-sonnet-20241022"
86
+
87
+ provider = detect_provider(mock_model)
88
+ assert provider == "anthropic"
89
+
90
+ def test_get_headroom_provider_openai(self):
91
+ """Get OpenAIProvider for OpenAI model."""
92
+ from headroom.integrations.langchain.providers import get_headroom_provider
93
+ from headroom.providers import OpenAIProvider
94
+
95
+ mock_model = MagicMock()
96
+ mock_model.__class__.__name__ = "ChatOpenAI"
97
+ mock_model.__class__.__module__ = "langchain_openai"
98
+
99
+ provider = get_headroom_provider(mock_model)
100
+ assert isinstance(provider, OpenAIProvider)
101
+
102
+ def test_get_headroom_provider_anthropic(self):
103
+ """Get AnthropicProvider for Anthropic model."""
104
+ from headroom.integrations.langchain.providers import get_headroom_provider
105
+ from headroom.providers import AnthropicProvider
106
+
107
+ mock_model = MagicMock()
108
+ mock_model.__class__.__name__ = "ChatAnthropic"
109
+ mock_model.__class__.__module__ = "langchain_anthropic"
110
+
111
+ provider = get_headroom_provider(mock_model)
112
+ assert isinstance(provider, AnthropicProvider)
113
+
114
+ def test_get_model_name_from_langchain(self):
115
+ """Extract model name from LangChain model."""
116
+ from headroom.integrations.langchain.providers import get_model_name_from_langchain
117
+
118
+ mock_model = MagicMock()
119
+ mock_model.model_name = "gpt-4o"
120
+
121
+ name = get_model_name_from_langchain(mock_model)
122
+ assert name == "gpt-4o"
123
+
124
+ def test_get_model_name_fallback(self):
125
+ """Fall back when model name not available."""
126
+ from headroom.integrations.langchain.providers import get_model_name_from_langchain
127
+
128
+ mock_model = MagicMock(spec=[])
129
+ mock_model.__class__.__name__ = "ChatOpenAI"
130
+
131
+ name = get_model_name_from_langchain(mock_model)
132
+ assert name == "gpt-4o" # Default for OpenAI
133
+
134
+
135
+ class TestHeadroomChatMessageHistory:
136
+ """Tests for HeadroomChatMessageHistory memory wrapper."""
137
+
138
+ def test_init(self):
139
+ """Initialize with base history."""
140
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
141
+
142
+ mock_history = MagicMock()
143
+ mock_history.messages = []
144
+
145
+ wrapper = HeadroomChatMessageHistory(
146
+ mock_history,
147
+ compress_threshold_tokens=4000,
148
+ keep_recent_turns=5,
149
+ )
150
+
151
+ assert wrapper._base is mock_history
152
+ assert wrapper._threshold == 4000
153
+ assert wrapper._keep_recent_turns == 5
154
+
155
+ def test_messages_passthrough_under_threshold(self):
156
+ """Messages pass through when under threshold."""
157
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
158
+
159
+ mock_history = MagicMock()
160
+ mock_history.messages = [
161
+ HumanMessage(content="Hello"),
162
+ AIMessage(content="Hi there!"),
163
+ ]
164
+
165
+ wrapper = HeadroomChatMessageHistory(
166
+ mock_history,
167
+ compress_threshold_tokens=10000, # High threshold
168
+ )
169
+
170
+ messages = wrapper.messages
171
+ assert len(messages) == 2
172
+ assert messages[0].content == "Hello"
173
+
174
+ def test_add_message_delegates(self):
175
+ """add_message delegates to base history."""
176
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
177
+
178
+ mock_history = MagicMock()
179
+ mock_history.messages = []
180
+
181
+ wrapper = HeadroomChatMessageHistory(mock_history)
182
+ message = HumanMessage(content="Test")
183
+ wrapper.add_message(message)
184
+
185
+ mock_history.add_message.assert_called_once_with(message)
186
+
187
+ def test_clear_delegates(self):
188
+ """clear delegates to base history."""
189
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
190
+
191
+ mock_history = MagicMock()
192
+ mock_history.messages = []
193
+
194
+ wrapper = HeadroomChatMessageHistory(mock_history)
195
+ wrapper.clear()
196
+
197
+ mock_history.clear.assert_called_once()
198
+
199
+ def test_get_compression_stats(self):
200
+ """Get compression statistics."""
201
+ from headroom.integrations.langchain.memory import HeadroomChatMessageHistory
202
+
203
+ mock_history = MagicMock()
204
+ mock_history.messages = []
205
+
206
+ wrapper = HeadroomChatMessageHistory(mock_history)
207
+ stats = wrapper.get_compression_stats()
208
+
209
+ assert "compression_count" in stats
210
+ assert "total_tokens_saved" in stats
211
+ assert stats["compression_count"] == 0
212
+
213
+
214
+ class TestHeadroomDocumentCompressor:
215
+ """Tests for HeadroomDocumentCompressor retriever integration."""
216
+
217
+ def test_init(self):
218
+ """Initialize with defaults."""
219
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
220
+
221
+ compressor = HeadroomDocumentCompressor()
222
+
223
+ assert compressor.max_documents == 10
224
+ assert compressor.min_relevance == 0.0
225
+ assert compressor.prefer_diverse is False
226
+
227
+ def test_init_custom(self):
228
+ """Initialize with custom settings."""
229
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
230
+
231
+ compressor = HeadroomDocumentCompressor(
232
+ max_documents=5,
233
+ min_relevance=0.5,
234
+ prefer_diverse=True,
235
+ )
236
+
237
+ assert compressor.max_documents == 5
238
+ assert compressor.min_relevance == 0.5
239
+ assert compressor.prefer_diverse is True
240
+
241
+ def test_compress_passthrough_under_limit(self):
242
+ """Pass through when under max_documents."""
243
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
244
+
245
+ compressor = HeadroomDocumentCompressor(max_documents=10)
246
+
247
+ docs = [
248
+ Document(page_content="Python is a programming language."),
249
+ Document(page_content="JavaScript runs in browsers."),
250
+ ]
251
+
252
+ result = compressor.compress_documents(docs, "What is Python?")
253
+
254
+ assert len(result) == 2
255
+
256
+ def test_compress_reduces_to_max(self):
257
+ """Compress when over max_documents."""
258
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
259
+
260
+ compressor = HeadroomDocumentCompressor(max_documents=2)
261
+
262
+ docs = [
263
+ Document(page_content="Python is a programming language."),
264
+ Document(page_content="Java is also a language."),
265
+ Document(page_content="Weather today is sunny."),
266
+ Document(page_content="Cats are cute animals."),
267
+ ]
268
+
269
+ result = compressor.compress_documents(docs, "programming language")
270
+
271
+ assert len(result) == 2
272
+
273
+ def test_compress_prefers_relevant(self):
274
+ """Keep most relevant documents."""
275
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
276
+
277
+ compressor = HeadroomDocumentCompressor(max_documents=1)
278
+
279
+ docs = [
280
+ Document(page_content="Weather today is sunny."),
281
+ Document(page_content="Python programming tutorial basics."),
282
+ Document(page_content="Cats are cute animals."),
283
+ ]
284
+
285
+ result = compressor.compress_documents(docs, "Python tutorial")
286
+
287
+ assert len(result) == 1
288
+ assert "Python" in result[0].page_content
289
+
290
+ def test_metrics_tracked(self):
291
+ """Compression metrics are tracked."""
292
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
293
+
294
+ compressor = HeadroomDocumentCompressor(max_documents=2)
295
+
296
+ docs = [
297
+ Document(page_content="Doc 1"),
298
+ Document(page_content="Doc 2"),
299
+ Document(page_content="Doc 3"),
300
+ ]
301
+
302
+ compressor.compress_documents(docs, "query")
303
+
304
+ metrics = compressor.last_metrics
305
+ assert metrics is not None
306
+ assert metrics.documents_before == 3
307
+ assert metrics.documents_after == 2
308
+ assert metrics.documents_removed == 1
309
+
310
+ def test_get_compression_stats(self):
311
+ """Get compression statistics."""
312
+ from headroom.integrations.langchain.retriever import HeadroomDocumentCompressor
313
+
314
+ compressor = HeadroomDocumentCompressor(max_documents=1)
315
+ docs = [Document(page_content="A"), Document(page_content="B")]
316
+
317
+ compressor.compress_documents(docs, "A")
318
+ stats = compressor.get_compression_stats()
319
+
320
+ assert "documents_before" in stats
321
+ assert "documents_after" in stats
322
+ assert "average_relevance" in stats
323
+
324
+
325
+ class TestHeadroomToolWrapper:
326
+ """Tests for HeadroomToolWrapper agent integration."""
327
+
328
+ def test_init(self):
329
+ """Initialize wrapper."""
330
+ from headroom.integrations.langchain.agents import HeadroomToolWrapper
331
+
332
+ mock_tool = MagicMock()
333
+ mock_tool.name = "test_tool"
334
+ mock_tool.description = "A test tool"
335
+
336
+ wrapper = HeadroomToolWrapper(mock_tool)
337
+
338
+ assert wrapper.name == "test_tool"
339
+ assert wrapper.description == "A test tool"
340
+
341
+ def test_call_passthrough_small_output(self):
342
+ """Small outputs pass through without compression."""
343
+ from headroom.integrations.langchain.agents import HeadroomToolWrapper
344
+
345
+ mock_tool = MagicMock()
346
+ mock_tool.name = "test"
347
+ mock_tool.description = "test"
348
+ mock_tool.invoke.return_value = "small result"
349
+
350
+ wrapper = HeadroomToolWrapper(mock_tool, min_chars_to_compress=1000)
351
+ result = wrapper("query")
352
+
353
+ assert result == "small result"
354
+
355
+ def test_call_compresses_large_json(self):
356
+ """Large JSON outputs get compressed."""
357
+ from headroom.integrations.langchain.agents import HeadroomToolWrapper
358
+
359
+ mock_tool = MagicMock()
360
+ mock_tool.name = "search"
361
+ mock_tool.description = "search"
362
+
363
+ # Large JSON output
364
+ large_output = json.dumps([{"id": i, "data": "x" * 100} for i in range(50)])
365
+ mock_tool.invoke.return_value = large_output
366
+
367
+ wrapper = HeadroomToolWrapper(mock_tool, min_chars_to_compress=100)
368
+ result = wrapper("query")
369
+
370
+ # Compression should never make the output larger
371
+ assert len(result) <= len(large_output)
372
+
373
+ def test_as_langchain_tool(self):
374
+ """Convert to LangChain tool."""
375
+ from headroom.integrations.langchain.agents import HeadroomToolWrapper
376
+
377
+ mock_tool = MagicMock()
378
+ mock_tool.name = "test"
379
+ mock_tool.description = "test tool"
380
+ mock_tool.invoke.return_value = "result"
381
+
382
+ wrapper = HeadroomToolWrapper(mock_tool)
383
+ lc_tool = wrapper.as_langchain_tool()
384
+
385
+ assert isinstance(lc_tool, StructuredTool)
386
+ assert lc_tool.name == "test"
387
+
388
+ def test_wrap_tools_with_headroom(self):
389
+ """Wrap multiple tools at once."""
390
+ from headroom.integrations.langchain.agents import wrap_tools_with_headroom
391
+
392
+ tools = []
393
+ for i in range(3):
394
+ mock = MagicMock()
395
+ mock.name = f"tool_{i}"
396
+ mock.description = f"Tool {i}"
397
+ mock.invoke.return_value = "result"
398
+ tools.append(mock)
399
+
400
+ wrapped = wrap_tools_with_headroom(tools)
401
+
402
+ assert len(wrapped) == 3
403
+ assert all(isinstance(t, StructuredTool) for t in wrapped)
404
+
405
+ def test_metrics_collector(self):
406
+ """Tool metrics are collected."""
407
+ from headroom.integrations.langchain.agents import (
408
+ HeadroomToolWrapper,
409
+ ToolMetricsCollector,
410
+ )
411
+
412
+ collector = ToolMetricsCollector()
413
+
414
+ mock_tool = MagicMock()
415
+ mock_tool.name = "test"
416
+ mock_tool.description = "test"
417
+ mock_tool.invoke.return_value = "result"
418
+
419
+ wrapper = HeadroomToolWrapper(mock_tool, metrics_collector=collector)
420
+ wrapper("query")
421
+
422
+ assert len(collector.metrics) == 1
423
+ assert collector.metrics[0].tool_name == "test"
424
+
425
+
426
+ class TestHeadroomLangSmithCallbackHandler:
427
+ """Tests for LangSmith integration."""
428
+
429
+ def test_init(self):
430
+ """Initialize handler."""
431
+ from headroom.integrations.langchain.langsmith import (
432
+ HeadroomLangSmithCallbackHandler,
433
+ )
434
+
435
+ handler = HeadroomLangSmithCallbackHandler(auto_update_runs=False)
436
+
437
+ assert handler._auto_update is False
438
+ assert handler._pending_metrics == {}
439
+
440
+ def test_set_headroom_metrics(self):
441
+ """Set metrics for a run."""
442
+ from headroom.integrations.langchain.langsmith import (
443
+ HeadroomLangSmithCallbackHandler,
444
+ )
445
+
446
+ handler = HeadroomLangSmithCallbackHandler(auto_update_runs=False)
447
+
448
+ handler.set_headroom_metrics(
449
+ run_id="test-run-123",
450
+ tokens_before=1000,
451
+ tokens_after=800,
452
+ transforms_applied=["smart_crusher"],
453
+ )
454
+
455
+ assert "test-run-123" in handler._pending_metrics
456
+ metrics = handler._pending_metrics["test-run-123"]
457
+ assert metrics.tokens_before == 1000
458
+ assert metrics.tokens_after == 800
459
+ assert metrics.tokens_saved == 200
460
+ assert metrics.savings_percent == 20.0
461
+
462
+ def test_get_run_metrics(self):
463
+ """Get metrics for a specific run."""
464
+ from headroom.integrations.langchain.langsmith import (
465
+ HeadroomLangSmithCallbackHandler,
466
+ )
467
+
468
+ handler = HeadroomLangSmithCallbackHandler(auto_update_runs=False)
469
+ handler._run_metrics["run-1"] = {"headroom.tokens_saved": 100}
470
+
471
+ metrics = handler.get_run_metrics("run-1")
472
+ assert metrics["headroom.tokens_saved"] == 100
473
+
474
+ def test_get_summary(self):
475
+ """Get summary statistics."""
476
+ from headroom.integrations.langchain.langsmith import (
477
+ HeadroomLangSmithCallbackHandler,
478
+ )
479
+
480
+ handler = HeadroomLangSmithCallbackHandler(auto_update_runs=False)
481
+ handler._run_metrics = {
482
+ "run-1": {"headroom.tokens_saved": 100, "headroom.savings_percent": 20},
483
+ "run-2": {"headroom.tokens_saved": 200, "headroom.savings_percent": 30},
484
+ }
485
+
486
+ summary = handler.get_summary()
487
+ assert summary["total_runs"] == 2
488
+ assert summary["total_tokens_saved"] == 300
489
+ assert summary["average_savings_percent"] == 25.0
490
+
491
+ def test_reset(self):
492
+ """Reset clears all metrics."""
493
+ from headroom.integrations.langchain.langsmith import (
494
+ HeadroomLangSmithCallbackHandler,
495
+ )
496
+
497
+ handler = HeadroomLangSmithCallbackHandler(auto_update_runs=False)
498
+ handler._run_metrics = {"run-1": {}}
499
+ handler._pending_metrics = {"run-2": MagicMock()}
500
+
501
+ handler.reset()
502
+
503
+ assert handler._run_metrics == {}
504
+ assert handler._pending_metrics == {}
505
+
506
+
507
+ class TestStreamingMetricsTracker:
508
+ """Tests for streaming metrics tracking."""
509
+
510
+ def test_init(self):
511
+ """Initialize tracker."""
512
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
513
+
514
+ tracker = StreamingMetricsTracker(model="gpt-4o")
515
+
516
+ assert tracker._model == "gpt-4o"
517
+ assert tracker._content == ""
518
+ assert tracker._chunk_count == 0
519
+
520
+ def test_add_chunk_string(self):
521
+ """Add string chunks."""
522
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
523
+
524
+ tracker = StreamingMetricsTracker()
525
+ tracker.add_chunk("Hello ")
526
+ tracker.add_chunk("world!")
527
+
528
+ assert tracker.content == "Hello world!"
529
+ assert tracker.chunk_count == 2
530
+
531
+ def test_add_chunk_with_content_attr(self):
532
+ """Add chunks with content attribute."""
533
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
534
+
535
+ tracker = StreamingMetricsTracker()
536
+
537
+ chunk1 = MagicMock()
538
+ chunk1.content = "Hello "
539
+ chunk2 = MagicMock()
540
+ chunk2.content = "world!"
541
+
542
+ tracker.add_chunk(chunk1)
543
+ tracker.add_chunk(chunk2)
544
+
545
+ assert tracker.content == "Hello world!"
546
+
547
+ def test_output_tokens(self):
548
+ """Count output tokens."""
549
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
550
+
551
+ tracker = StreamingMetricsTracker(model="gpt-4o")
552
+ tracker.add_chunk("Hello world, this is a test message.")
553
+
554
+ tokens = tracker.output_tokens
555
+ assert tokens > 0
556
+
557
+ def test_finish(self):
558
+ """Finish tracking and get metrics."""
559
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
560
+
561
+ tracker = StreamingMetricsTracker()
562
+ tracker.add_chunk("Test content")
563
+ metrics = tracker.finish()
564
+
565
+ assert metrics.chunk_count == 1
566
+ assert metrics.content_length == len("Test content")
567
+ assert metrics.duration_ms is not None
568
+ assert metrics.end_time is not None
569
+
570
+ def test_reset(self):
571
+ """Reset tracker for reuse."""
572
+ from headroom.integrations.langchain.streaming import StreamingMetricsTracker
573
+
574
+ tracker = StreamingMetricsTracker()
575
+ tracker.add_chunk("Content")
576
+ tracker.finish()
577
+
578
+ tracker.reset()
579
+
580
+ assert tracker.content == ""
581
+ assert tracker.chunk_count == 0
582
+
583
+ def test_streaming_metrics_callback(self):
584
+ """Test context manager interface."""
585
+ from headroom.integrations.langchain.streaming import StreamingMetricsCallback
586
+
587
+ with StreamingMetricsCallback(model="gpt-4o") as tracker:
588
+ tracker.add_chunk("Hello")
589
+ tracker.add_chunk(" world")
590
+
591
+ # After context exit, metrics should be available
592
+ # (accessed via the callback object, not the tracker)
593
+
594
+ def test_track_streaming_response(self):
595
+ """Track a complete streaming response."""
596
+ from headroom.integrations.langchain.streaming import track_streaming_response
597
+
598
+ chunks = ["Hello ", "world", "!"]
599
+ content, metrics = track_streaming_response(iter(chunks), model="gpt-4o")
600
+
601
+ assert content == "Hello world!"
602
+ assert metrics.chunk_count == 3
603
+
604
+
605
+ class TestAutoDetectProviderInChatModel:
606
+ """Tests for auto_detect_provider in HeadroomChatModel."""
607
+
608
+ def test_auto_detect_enabled_by_default(self):
609
+ """auto_detect_provider is True by default."""
610
+ from headroom.integrations import HeadroomChatModel
611
+
612
+ mock_model = MagicMock()
613
+ mock_model._llm_type = "test"
614
+ mock_model._identifying_params = {}
615
+ mock_model.__class__.__name__ = "ChatOpenAI"
616
+ mock_model.__class__.__module__ = "langchain_openai"
617
+
618
+ model = HeadroomChatModel(mock_model)
619
+ assert model.auto_detect_provider is True
620
+
621
+ def test_auto_detect_can_be_disabled(self):
622
+ """auto_detect_provider can be set to False."""
623
+ from headroom.integrations import HeadroomChatModel
624
+
625
+ mock_model = MagicMock()
626
+ mock_model._llm_type = "test"
627
+ mock_model._identifying_params = {}
628
+
629
+ model = HeadroomChatModel(mock_model, auto_detect_provider=False)
630
+ assert model.auto_detect_provider is False
631
+
632
+ def test_pipeline_uses_detected_provider(self):
633
+ """Pipeline uses auto-detected provider."""
634
+ from headroom.integrations import HeadroomChatModel
635
+ from headroom.providers import AnthropicProvider
636
+
637
+ mock_model = MagicMock()
638
+ mock_model._llm_type = "test"
639
+ mock_model._identifying_params = {}
640
+ mock_model.__class__.__name__ = "ChatAnthropic"
641
+ mock_model.__class__.__module__ = "langchain_anthropic"
642
+
643
+ model = HeadroomChatModel(mock_model)
644
+ _ = model.pipeline # Force lazy init
645
+
646
+ assert isinstance(model._provider, AnthropicProvider)
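The `TestProviderDetection` cases above define a simple detection order. Known LangChain class names (`ChatOpenAI`, `ChatAnthropic`, `ChatGoogleGenerativeAI`) map directly to a provider. A model name containing `claude` maps to Anthropic. Anything else falls back to OpenAI. The sketch below shows logic that would satisfy those cases. `detect_provider_sketch` is a hypothetical name, the module-prefix check is an assumption, and none of this is the actual `headroom.integrations.langchain.providers.detect_provider` implementation.

```python
from typing import Any

# Hypothetical lookup tables: the class names come from the test cases,
# the module prefixes are an assumption.
_CLASS_TO_PROVIDER = {
    "ChatOpenAI": "openai",
    "ChatAnthropic": "anthropic",
    "ChatGoogleGenerativeAI": "google",
}
_MODULE_PREFIX_TO_PROVIDER = {
    "langchain_openai": "openai",
    "langchain_anthropic": "anthropic",
    "langchain_google_genai": "google",
}


def detect_provider_sketch(model: Any) -> str:
    """Guess a provider name for a LangChain chat model (hypothetical sketch)."""
    cls = type(model)
    # 1. Exact class-name match, e.g. ChatAnthropic -> "anthropic".
    provider = _CLASS_TO_PROVIDER.get(cls.__name__)
    if provider:
        return provider
    # 2. Module-prefix match, e.g. "langchain_anthropic.chat_models" (assumed).
    module = getattr(cls, "__module__", "") or ""
    for prefix, name in _MODULE_PREFIX_TO_PROVIDER.items():
        if module.startswith(prefix):
            return name
    # 3. Model-name hint, e.g. "claude-3-5-sonnet-20241022".
    model_name = getattr(model, "model_name", None)
    if isinstance(model_name, str) and "claude" in model_name.lower():
        return "anthropic"
    # 4. Unknown models are treated as OpenAI-compatible.
    return "openai"
```

The sketch checks the class name first, so an explicit wrapper type wins over a model-name hint. This diff does not show whether the real module uses the same precedence.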
tests/test_integrations/mcp/__init__.py ADDED
File without changes
tests/test_integrations/{test_mcp.py → mcp/test_server.py} RENAMED
File without changes
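The `TestHeadroomLangSmithCallbackHandler` assertions in `test_extended.py` imply simple arithmetic. With 1000 tokens before and 800 after, 200 tokens are saved, which is 20.0% savings. A summary over runs at 20% and 30% reports an average of 25.0, which matches an unweighted mean of the per-run percentages. The sketch below reproduces that arithmetic with hypothetical names (`RunSavings`, `summarize`). It is not the handler's actual code, and the zero-token guard is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunSavings:
    """Hypothetical per-run record; field names mirror the test assertions."""

    tokens_before: int
    tokens_after: int

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after

    @property
    def savings_percent(self) -> float:
        # Assumed guard: report 0% instead of dividing by zero.
        if self.tokens_before == 0:
            return 0.0
        # Multiply before dividing so 200 / 1000 yields exactly 20.0.
        return self.tokens_saved * 100 / self.tokens_before


def summarize(runs: list) -> dict:
    """Totals plus an unweighted mean of per-run savings percentages."""
    if not runs:
        return {"total_runs": 0, "total_tokens_saved": 0, "average_savings_percent": 0.0}
    return {
        "total_runs": len(runs),
        "total_tokens_saved": sum(r.tokens_saved for r in runs),
        "average_savings_percent": sum(r.savings_percent for r in runs) / len(runs),
    }
```

An unweighted mean treats a 100-token run the same as a 100,000-token run. A token-weighted average (total saved divided by total before) would be the alternative if large runs should dominate.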
uv.lock CHANGED
@@ -6,6 +6,25 @@ resolution-markers = [
6
  "python_full_version < '3.11'",
7
  ]
8
 
9
  [[package]]
10
  name = "annotated-doc"
11
  version = "0.0.4"
@@ -362,8 +381,8 @@ wheels = [
362
  ]
363
 
364
  [[package]]
365
- name = "headroom"
366
- version = "0.2.0"
367
  source = { editable = "." }
368
  dependencies = [
369
  { name = "pydantic" },
@@ -375,11 +394,18 @@ all = [
375
  { name = "fastapi" },
376
  { name = "httpx" },
377
  { name = "jinja2" },
378
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version < '3.11'" },
379
  { name = "numpy", version = "2.4.0", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version >= '3.11'" },
380
  { name = "sentence-transformers" },
381
  { name = "uvicorn" },
382
  ]
383
  dev = [
384
  { name = "anthropic" },
385
  { name = "mypy" },
@@ -389,6 +415,11 @@ dev = [
389
  { name = "pytest-cov" },
390
  { name = "ruff" },
391
  ]
392
  proxy = [
393
  { name = "fastapi" },
394
  { name = "httpx" },
@@ -407,9 +438,10 @@ reports = [
407
  requires-dist = [
408
  { name = "anthropic", marker = "extra == 'dev'", specifier = ">=0.18.0" },
409
  { name = "fastapi", marker = "extra == 'proxy'", specifier = ">=0.100.0" },
410
- { name = "headroom", extras = ["relevance", "proxy", "reports"], marker = "extra == 'all'" },
411
  { name = "httpx", marker = "extra == 'proxy'", specifier = ">=0.24.0" },
412
  { name = "jinja2", marker = "extra == 'reports'", specifier = ">=3.0.0" },
413
  { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.0.0" },
414
  { name = "numpy", marker = "extra == 'relevance'", specifier = ">=1.24.0" },
415
  { name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
@@ -420,6 +452,9 @@ requires-dist = [
420
  { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.1.0" },
421
  { name = "sentence-transformers", marker = "extra == 'relevance'", specifier = ">=2.2.0" },
422
  { name = "tiktoken", specifier = ">=0.5.0" },
423
  { name = "uvicorn", marker = "extra == 'proxy'", specifier = ">=0.23.0" },
424
  ]
425
 
@@ -708,6 +743,24 @@ wheels = [
708
  { url = "https://pypi.netflix.net/packages/19544946795/librt-0.7.7-cp314-cp314t-win_arm64.whl", hash = "sha256:142c2cd91794b79fd0ce113bd658993b7ede0fe93057668c2f98a45ca00b7e91", size = 39724 },
709
  ]
710
 
711
  [[package]]
712
  name = "markupsafe"
713
  version = "3.0.3"
@@ -882,6 +935,21 @@ wheels = [
882
  { url = "https://pypi.netflix.net/packages/19441125158/networkx-3.6.1-py3-none-any.whl", hash = "sha256:d47fbf302e7d9cbbb9e2555a0d267983d2aa476bac30e90dfbe5669bd57f3762", size = 2068504 },
883
  ]
884
 
885
  [[package]]
886
  name = "numpy"
887
  version = "2.2.6"
@@ -1225,6 +1293,34 @@ wheels = [
1225
  { url = "https://pypi.netflix.net/packages/18687957486/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538 },
1226
  ]
1227
 
1228
  [[package]]
1229
  name = "pydantic"
1230
  version = "2.12.5"
@@ -2193,6 +2289,115 @@ wheels = [
2193
  { url = "https://pypi.netflix.net/packages/19387983499/transformers-4.57.3-py3-none-any.whl", hash = "sha256:c77d353a4851b1880191603d36acb313411d3577f6e2897814f333841f7003f4", size = 11993463 },
2194
  ]
2195
 
2196
  [[package]]
2197
  name = "triton"
2198
  version = "3.5.1"
 
6
  "python_full_version < '3.11'",
7
  ]
8
 
9
+ [[package]]
10
+ name = "accelerate"
11
+ version = "1.12.0"
12
+ source = { registry = "https://pypi.netflix.net/simple" }
13
+ dependencies = [
14
+ { name = "huggingface-hub" },
15
+ { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version < '3.11'" },
16
+ { name = "numpy", version = "2.4.0", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version >= '3.11'" },
17
+ { name = "packaging" },
18
+ { name = "psutil" },
19
+ { name = "pyyaml" },
20
+ { name = "safetensors" },
21
+ { name = "torch" },
22
+ ]
23
+ sdist = { url = "https://pypi.netflix.net/packages/19372078203/accelerate-1.12.0.tar.gz", hash = "sha256:70988c352feb481887077d2ab845125024b2a137a5090d6d7a32b57d03a45df6", size = 398399 }
24
+ wheels = [
25
+ { url = "https://pypi.netflix.net/packages/19372078202/accelerate-1.12.0-py3-none-any.whl", hash = "sha256:3e2091cd341423207e2f084a6654b1efcd250dc326f2a37d6dde446e07cabb11", size = 380935 },
26
+ ]
27
+
28
  [[package]]
29
  name = "annotated-doc"
30
  version = "0.0.4"
 
381
  ]
382
 
383
  [[package]]
384
+ name = "headroom-ai"
385
+ version = "0.2.3"
386
  source = { editable = "." }
387
  dependencies = [
388
  { name = "pydantic" },
 
394
  { name = "fastapi" },
395
  { name = "httpx" },
396
  { name = "jinja2" },
397
+ { name = "llmlingua" },
398
  { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version < '3.11'" },
399
  { name = "numpy", version = "2.4.0", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version >= '3.11'" },
400
  { name = "sentence-transformers" },
401
+ { name = "torch" },
402
+ { name = "transformers" },
403
+ { name = "tree-sitter-language-pack" },
404
  { name = "uvicorn" },
405
  ]
406
+ code = [
407
+ { name = "tree-sitter-language-pack" },
408
+ ]
409
  dev = [
410
  { name = "anthropic" },
411
  { name = "mypy" },
 
415
  { name = "pytest-cov" },
416
  { name = "ruff" },
417
  ]
418
+ llmlingua = [
419
+ { name = "llmlingua" },
420
+ { name = "torch" },
421
+ { name = "transformers" },
422
+ ]
423
  proxy = [
424
  { name = "fastapi" },
425
  { name = "httpx" },
 
438
  requires-dist = [
439
  { name = "anthropic", marker = "extra == 'dev'", specifier = ">=0.18.0" },
440
  { name = "fastapi", marker = "extra == 'proxy'", specifier = ">=0.100.0" },
441
+ { name = "headroom-ai", extras = ["relevance", "proxy", "reports", "llmlingua", "code"], marker = "extra == 'all'" },
442
  { name = "httpx", marker = "extra == 'proxy'", specifier = ">=0.24.0" },
443
  { name = "jinja2", marker = "extra == 'reports'", specifier = ">=3.0.0" },
444
+ { name = "llmlingua", marker = "extra == 'llmlingua'", specifier = ">=0.2.0" },
445
  { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.0.0" },
446
  { name = "numpy", marker = "extra == 'relevance'", specifier = ">=1.24.0" },
447
  { name = "openai", marker = "extra == 'dev'", specifier = ">=1.0.0" },
 
452
  { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.1.0" },
453
  { name = "sentence-transformers", marker = "extra == 'relevance'", specifier = ">=2.2.0" },
454
  { name = "tiktoken", specifier = ">=0.5.0" },
455
+ { name = "torch", marker = "extra == 'llmlingua'", specifier = ">=2.0.0" },
456
+ { name = "transformers", marker = "extra == 'llmlingua'", specifier = ">=4.30.0" },
457
+ { name = "tree-sitter-language-pack", marker = "extra == 'code'", specifier = ">=0.10.0" },
458
  { name = "uvicorn", marker = "extra == 'proxy'", specifier = ">=0.23.0" },
459
  ]
460
 
 
743
  { url = "https://pypi.netflix.net/packages/19544946795/librt-0.7.7-cp314-cp314t-win_arm64.whl", hash = "sha256:142c2cd91794b79fd0ce113bd658993b7ede0fe93057668c2f98a45ca00b7e91", size = 39724 },
744
  ]
745
 
746
+ [[package]]
747
+ name = "llmlingua"
748
+ version = "0.2.2"
749
+ source = { registry = "https://pypi.netflix.net/simple" }
750
+ dependencies = [
751
+ { name = "accelerate" },
752
+ { name = "nltk" },
753
+ { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version < '3.11'" },
754
+ { name = "numpy", version = "2.4.0", source = { registry = "https://pypi.netflix.net/simple" }, marker = "python_full_version >= '3.11'" },
755
+ { name = "tiktoken" },
756
+ { name = "torch" },
757
+ { name = "transformers" },
758
+ ]
+ sdist = { url = "https://pypi.netflix.net/packages/19606733170/llmlingua-0.2.2.tar.gz", hash = "sha256:1a0caedd8d5a65512a85dadb6bfda6f5b3c4b45e5cb9e7b1c6009573f9058572", size = 59753 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19606733169/llmlingua-0.2.2-py3-none-any.whl", hash = "sha256:da55137efe0db78063b3395396efe8a0dcfe4ae5a09aea0d503c34b7bf1d800c", size = 30536 },
+ ]
+
  [[package]]
  name = "markupsafe"
  version = "3.0.3"
 
  { url = "https://pypi.netflix.net/packages/19441125158/networkx-3.6.1-py3-none-any.whl", hash = "sha256:d47fbf302e7d9cbbb9e2555a0d267983d2aa476bac30e90dfbe5669bd57f3762", size = 2068504 },
  ]
 
+ [[package]]
+ name = "nltk"
+ version = "3.9.2"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ dependencies = [
+ { name = "click" },
+ { name = "joblib" },
+ { name = "regex" },
+ { name = "tqdm" },
+ ]
+ sdist = { url = "https://pypi.netflix.net/packages/19152095449/nltk-3.9.2.tar.gz", hash = "sha256:0f409e9b069ca4177c1903c3e843eef90c7e92992fa4931ae607da6de49e1419", size = 2887629 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19152095448/nltk-3.9.2-py3-none-any.whl", hash = "sha256:1e209d2b3009110635ed9709a67a1a3e33a10f799490fa71cf4bec218c11c88a", size = 1513404 },
+ ]
+
  [[package]]
  name = "numpy"
  version = "2.2.6"
 
  { url = "https://pypi.netflix.net/packages/18687957486/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538 },
  ]
 
+ [[package]]
+ name = "psutil"
+ version = "7.2.1"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ sdist = { url = "https://pypi.netflix.net/packages/19533562506/psutil-7.2.1.tar.gz", hash = "sha256:f7583aec590485b43ca601dd9cea0dcd65bd7bb21d30ef4ddbf4ea6b5ed1bdd3", size = 490253 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19533562496/psutil-7.2.1-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:ba9f33bb525b14c3ea563b2fd521a84d2fa214ec59e3e6a2858f78d0844dd60d", size = 129624 },
+ { url = "https://pypi.netflix.net/packages/19533562497/psutil-7.2.1-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:81442dac7abfc2f4f4385ea9e12ddf5a796721c0f6133260687fec5c3780fa49", size = 130132 },
+ { url = "https://pypi.netflix.net/packages/19533562498/psutil-7.2.1-cp313-cp313t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ea46c0d060491051d39f0d2cff4f98d5c72b288289f57a21556cc7d504db37fc", size = 180612 },
+ { url = "https://pypi.netflix.net/packages/19533562499/psutil-7.2.1-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:35630d5af80d5d0d49cfc4d64c1c13838baf6717a13effb35869a5919b854cdf", size = 183201 },
+ { url = "https://pypi.netflix.net/packages/19533562500/psutil-7.2.1-cp313-cp313t-win_amd64.whl", hash = "sha256:923f8653416604e356073e6e0bccbe7c09990acef442def2f5640dd0faa9689f", size = 139081 },
+ { url = "https://pypi.netflix.net/packages/19533562501/psutil-7.2.1-cp313-cp313t-win_arm64.whl", hash = "sha256:cfbe6b40ca48019a51827f20d830887b3107a74a79b01ceb8cc8de4ccb17b672", size = 134767 },
+ { url = "https://pypi.netflix.net/packages/19533562502/psutil-7.2.1-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:494c513ccc53225ae23eec7fe6e1482f1b8a44674241b54561f755a898650679", size = 129716 },
+ { url = "https://pypi.netflix.net/packages/19533562503/psutil-7.2.1-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:3fce5f92c22b00cdefd1645aa58ab4877a01679e901555067b1bd77039aa589f", size = 130133 },
+ { url = "https://pypi.netflix.net/packages/19533562504/psutil-7.2.1-cp314-cp314t-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:93f3f7b0bb07711b49626e7940d6fe52aa9940ad86e8f7e74842e73189712129", size = 181518 },
+ { url = "https://pypi.netflix.net/packages/19533562505/psutil-7.2.1-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d34d2ca888208eea2b5c68186841336a7f5e0b990edec929be909353a202768a", size = 184348 },
+ { url = "https://pypi.netflix.net/packages/19533563921/psutil-7.2.1-cp314-cp314t-win_amd64.whl", hash = "sha256:2ceae842a78d1603753561132d5ad1b2f8a7979cb0c283f5b52fb4e6e14b1a79", size = 140400 },
+ { url = "https://pypi.netflix.net/packages/19533563922/psutil-7.2.1-cp314-cp314t-win_arm64.whl", hash = "sha256:08a2f175e48a898c8eb8eace45ce01777f4785bc744c90aa2cc7f2fa5462a266", size = 135430 },
+ { url = "https://pypi.netflix.net/packages/19533563923/psutil-7.2.1-cp36-abi3-macosx_10_9_x86_64.whl", hash = "sha256:b2e953fcfaedcfbc952b44744f22d16575d3aa78eb4f51ae74165b4e96e55f42", size = 128137 },
+ { url = "https://pypi.netflix.net/packages/19533563924/psutil-7.2.1-cp36-abi3-macosx_11_0_arm64.whl", hash = "sha256:05cc68dbb8c174828624062e73078e7e35406f4ca2d0866c272c2410d8ef06d1", size = 128947 },
+ { url = "https://pypi.netflix.net/packages/19533563925/psutil-7.2.1-cp36-abi3-manylinux2010_x86_64.manylinux_2_12_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5e38404ca2bb30ed7267a46c02f06ff842e92da3bb8c5bfdadbd35a5722314d8", size = 154694 },
+ { url = "https://pypi.netflix.net/packages/19533563926/psutil-7.2.1-cp36-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ab2b98c9fc19f13f59628d94df5cc4cc4844bc572467d113a8b517d634e362c6", size = 156136 },
+ { url = "https://pypi.netflix.net/packages/19533563927/psutil-7.2.1-cp36-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:f78baafb38436d5a128f837fab2d92c276dfb48af01a240b861ae02b2413ada8", size = 148108 },
+ { url = "https://pypi.netflix.net/packages/19533565348/psutil-7.2.1-cp36-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:99a4cd17a5fdd1f3d014396502daa70b5ec21bf4ffe38393e152f8e449757d67", size = 147402 },
+ { url = "https://pypi.netflix.net/packages/19533565349/psutil-7.2.1-cp37-abi3-win_amd64.whl", hash = "sha256:b1b0671619343aa71c20ff9767eced0483e4fc9e1f489d50923738caf6a03c17", size = 136938 },
+ { url = "https://pypi.netflix.net/packages/19533565350/psutil-7.2.1-cp37-abi3-win_arm64.whl", hash = "sha256:0d67c1822c355aa6f7314d92018fb4268a76668a536f133599b91edd48759442", size = 133836 },
+ ]
+
  [[package]]
  name = "pydantic"
  version = "2.12.5"
 
  { url = "https://pypi.netflix.net/packages/19387983499/transformers-4.57.3-py3-none-any.whl", hash = "sha256:c77d353a4851b1880191603d36acb313411d3577f6e2897814f333841f7003f4", size = 11993463 },
  ]
 
+ [[package]]
+ name = "tree-sitter"
+ version = "0.25.2"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ sdist = { url = "https://pypi.netflix.net/packages/19129803294/tree-sitter-0.25.2.tar.gz", hash = "sha256:fe43c158555da46723b28b52e058ad444195afd1db3ca7720c59a254544e9c20", size = 177961 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19129800490/tree_sitter-0.25.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:72a510931c3c25f134aac2daf4eb4feca99ffe37a35896d7150e50ac3eee06c7", size = 146749 },
+ { url = "https://pypi.netflix.net/packages/19129800491/tree_sitter-0.25.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:44488e0e78146f87baaa009736886516779253d6d6bac3ef636ede72bc6a8234", size = 137766 },
+ { url = "https://pypi.netflix.net/packages/19129800492/tree_sitter-0.25.2-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c2f8e7d6b2f8489d4a9885e3adcaef4bc5ff0a275acd990f120e29c4ab3395c5", size = 599809 },
+ { url = "https://pypi.netflix.net/packages/19129800493/tree_sitter-0.25.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:20b570690f87f1da424cd690e51cc56728d21d63f4abd4b326d382a30353acc7", size = 627676 },
+ { url = "https://pypi.netflix.net/packages/19129800494/tree_sitter-0.25.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:a0ec41b895da717bc218a42a3a7a0bfcfe9a213d7afaa4255353901e0e21f696", size = 624281 },
+ { url = "https://pypi.netflix.net/packages/19129800495/tree_sitter-0.25.2-cp310-cp310-win_amd64.whl", hash = "sha256:7712335855b2307a21ae86efe949c76be36c6068d76df34faa27ce9ee40ff444", size = 127295 },
+ { url = "https://pypi.netflix.net/packages/19129800496/tree_sitter-0.25.2-cp310-cp310-win_arm64.whl", hash = "sha256:a925364eb7fbb9cdce55a9868f7525a1905af512a559303bd54ef468fd88cb37", size = 113991 },
+ { url = "https://pypi.netflix.net/packages/19129800497/tree_sitter-0.25.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:b8ca72d841215b6573ed0655b3a5cd1133f9b69a6fa561aecad40dca9029d75b", size = 146752 },
+ { url = "https://pypi.netflix.net/packages/19129800498/tree_sitter-0.25.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:cc0351cfe5022cec5a77645f647f92a936b38850346ed3f6d6babfbeeeca4d26", size = 137765 },
+ { url = "https://pypi.netflix.net/packages/19129800499/tree_sitter-0.25.2-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1799609636c0193e16c38f366bda5af15b1ce476df79ddaae7dd274df9e44266", size = 604643 },
+ { url = "https://pypi.netflix.net/packages/19129800500/tree_sitter-0.25.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3e65ae456ad0d210ee71a89ee112ac7e72e6c2e5aac1b95846ecc7afa68a194c", size = 632229 },
+ { url = "https://pypi.netflix.net/packages/19129800501/tree_sitter-0.25.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:49ee3c348caa459244ec437ccc7ff3831f35977d143f65311572b8ba0a5f265f", size = 629861 },
+ { url = "https://pypi.netflix.net/packages/19129800502/tree_sitter-0.25.2-cp311-cp311-win_amd64.whl", hash = "sha256:56ac6602c7d09c2c507c55e58dc7026b8988e0475bd0002f8a386cce5e8e8adc", size = 127304 },
+ { url = "https://pypi.netflix.net/packages/19129800503/tree_sitter-0.25.2-cp311-cp311-win_arm64.whl", hash = "sha256:b3d11a3a3ac89bb8a2543d75597f905a9926f9c806f40fcca8242922d1cc6ad5", size = 113990 },
+ { url = "https://pypi.netflix.net/packages/19129801135/tree_sitter-0.25.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:ddabfff809ffc983fc9963455ba1cecc90295803e06e140a4c83e94c1fa3d960", size = 146941 },
+ { url = "https://pypi.netflix.net/packages/19129801136/tree_sitter-0.25.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:c0c0ab5f94938a23fe81928a21cc0fac44143133ccc4eb7eeb1b92f84748331c", size = 137699 },
+ { url = "https://pypi.netflix.net/packages/19129801137/tree_sitter-0.25.2-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:dd12d80d91d4114ca097626eb82714618dcdfacd6a5e0955216c6485c350ef99", size = 607125 },
+ { url = "https://pypi.netflix.net/packages/19129801138/tree_sitter-0.25.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b43a9e4c89d4d0839de27cd4d6902d33396de700e9ff4c5ab7631f277a85ead9", size = 635418 },
+ { url = "https://pypi.netflix.net/packages/19129801139/tree_sitter-0.25.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fbb1706407c0e451c4f8cc016fec27d72d4b211fdd3173320b1ada7a6c74c3ac", size = 631250 },
+ { url = "https://pypi.netflix.net/packages/19129801140/tree_sitter-0.25.2-cp312-cp312-win_amd64.whl", hash = "sha256:6d0302550bbe4620a5dc7649517c4409d74ef18558276ce758419cf09e578897", size = 127156 },
+ { url = "https://pypi.netflix.net/packages/19129801141/tree_sitter-0.25.2-cp312-cp312-win_arm64.whl", hash = "sha256:0c8b6682cac77e37cfe5cf7ec388844957f48b7bd8d6321d0ca2d852994e10d5", size = 113984 },
+ { url = "https://pypi.netflix.net/packages/19129801142/tree_sitter-0.25.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0628671f0de69bb279558ef6b640bcfc97864fe0026d840f872728a86cd6b6cd", size = 146926 },
+ { url = "https://pypi.netflix.net/packages/19129801143/tree_sitter-0.25.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f5ddcd3e291a749b62521f71fc953f66f5fd9743973fd6dd962b092773569601", size = 137712 },
+ { url = "https://pypi.netflix.net/packages/19129801144/tree_sitter-0.25.2-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bd88fbb0f6c3a0f28f0a68d72df88e9755cf5215bae146f5a1bdc8362b772053", size = 607873 },
+ { url = "https://pypi.netflix.net/packages/19129801145/tree_sitter-0.25.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b878e296e63661c8e124177cc3084b041ba3f5936b43076d57c487822426f614", size = 636313 },
+ { url = "https://pypi.netflix.net/packages/19129801146/tree_sitter-0.25.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:d77605e0d353ba3fe5627e5490f0fbfe44141bafa4478d88ef7954a61a848dae", size = 631370 },
+ { url = "https://pypi.netflix.net/packages/19129803321/tree_sitter-0.25.2-cp313-cp313-win_amd64.whl", hash = "sha256:463c032bd02052d934daa5f45d183e0521ceb783c2548501cf034b0beba92c9b", size = 127157 },
+ { url = "https://pypi.netflix.net/packages/19129803322/tree_sitter-0.25.2-cp313-cp313-win_arm64.whl", hash = "sha256:b3f63a1796886249bd22c559a5944d64d05d43f2be72961624278eff0dcc5cb8", size = 113975 },
+ { url = "https://pypi.netflix.net/packages/19129803323/tree_sitter-0.25.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:65d3c931013ea798b502782acab986bbf47ba2c452610ab0776cf4a8ef150fc0", size = 146776 },
+ { url = "https://pypi.netflix.net/packages/19129803324/tree_sitter-0.25.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:bda059af9d621918efb813b22fb06b3fe00c3e94079c6143fcb2c565eb44cb87", size = 137732 },
+ { url = "https://pypi.netflix.net/packages/19129803325/tree_sitter-0.25.2-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:eac4e8e4c7060c75f395feec46421eb61212cb73998dbe004b7384724f3682ab", size = 609456 },
+ { url = "https://pypi.netflix.net/packages/19129803326/tree_sitter-0.25.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:260586381b23be33b6191a07cea3d44ecbd6c01aa4c6b027a0439145fcbc3358", size = 636772 },
+ { url = "https://pypi.netflix.net/packages/19129803327/tree_sitter-0.25.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:7d2ee1acbacebe50ba0f85fff1bc05e65d877958f00880f49f9b2af38dce1af0", size = 631522 },
+ { url = "https://pypi.netflix.net/packages/19129803328/tree_sitter-0.25.2-cp314-cp314-win_amd64.whl", hash = "sha256:4973b718fcadfb04e59e746abfbb0288694159c6aeecd2add59320c03368c721", size = 130864 },
+ { url = "https://pypi.netflix.net/packages/19129803329/tree_sitter-0.25.2-cp314-cp314-win_arm64.whl", hash = "sha256:b8d4429954a3beb3e844e2872610d2a4800ba4eb42bb1990c6a4b1949b18459f", size = 117470 },
+ ]
+
+ [[package]]
+ name = "tree-sitter-c-sharp"
+ version = "0.23.1"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ sdist = { url = "https://pypi.netflix.net/packages/18519163555/tree_sitter_c_sharp-0.23.1.tar.gz", hash = "sha256:322e2cfd3a547a840375276b2aea3335fa6458aeac082f6c60fec3f745c967eb", size = 1317728 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/18519163548/tree_sitter_c_sharp-0.23.1-cp39-abi3-macosx_10_9_x86_64.whl", hash = "sha256:2b612a6e5bd17bb7fa2aab4bb6fc1fba45c94f09cb034ab332e45603b86e32fd", size = 372235 },
+ { url = "https://pypi.netflix.net/packages/18519163549/tree_sitter_c_sharp-0.23.1-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:1a8b98f62bc53efcd4d971151950c9b9cd5cbe3bacdb0cd69fdccac63350d83e", size = 419046 },
+ { url = "https://pypi.netflix.net/packages/18519163550/tree_sitter_c_sharp-0.23.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:986e93d845a438ec3c4416401aa98e6a6f6631d644bbbc2e43fcb915c51d255d", size = 415999 },
+ { url = "https://pypi.netflix.net/packages/18519163551/tree_sitter_c_sharp-0.23.1-cp39-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a8024e466b2f5611c6dc90321f232d8584893c7fb88b75e4a831992f877616d2", size = 402830 },
+ { url = "https://pypi.netflix.net/packages/18519163552/tree_sitter_c_sharp-0.23.1-cp39-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:7f9bf876866835492281d336b9e1f9626ab668737f74e914c31d285261507da7", size = 397880 },
+ { url = "https://pypi.netflix.net/packages/18519163553/tree_sitter_c_sharp-0.23.1-cp39-abi3-win_amd64.whl", hash = "sha256:ae9a9e859e8f44e2b07578d44f9a220d3fa25b688966708af6aa55d42abeebb3", size = 377562 },
+ { url = "https://pypi.netflix.net/packages/18519163554/tree_sitter_c_sharp-0.23.1-cp39-abi3-win_arm64.whl", hash = "sha256:c81548347a93347be4f48cb63ec7d60ef4b0efa91313330e69641e49aa5a08c5", size = 375157 },
+ ]
+
+ [[package]]
+ name = "tree-sitter-embedded-template"
+ version = "0.25.0"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ sdist = { url = "https://pypi.netflix.net/packages/19023467751/tree_sitter_embedded_template-0.25.0.tar.gz", hash = "sha256:7d72d5e8a1d1d501a7c90e841b51f1449a90cc240be050e4fb85c22dab991d50", size = 14114 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19023467743/tree_sitter_embedded_template-0.25.0-cp310-abi3-macosx_10_9_x86_64.whl", hash = "sha256:fa0d06467199aeb33fb3d6fa0665bf9b7d5a32621ffdaf37fd8249f8a8050649", size = 10266 },
+ { url = "https://pypi.netflix.net/packages/19023467744/tree_sitter_embedded_template-0.25.0-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:fc7aacbc2985a5d7e7fe7334f44dffe24c38fb0a8295c4188a04cf21a3d64a73", size = 10650 },
+ { url = "https://pypi.netflix.net/packages/19023467745/tree_sitter_embedded_template-0.25.0-cp310-abi3-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:a7c88c3dd8b94b3c9efe8ae071ff6b1b936a27ac5f6e651845c3b9631fa4c1c2", size = 18268 },
+ { url = "https://pypi.netflix.net/packages/19023467746/tree_sitter_embedded_template-0.25.0-cp310-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:025f7ca84218dcd8455efc901bdbcc2689fb694f3a636c0448e322a23d4bc96b", size = 19068 },
+ { url = "https://pypi.netflix.net/packages/19023467747/tree_sitter_embedded_template-0.25.0-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:b5dc1aef6ffa3fae621fe037d85dd98948b597afba20df29d779c426be813ee5", size = 18518 },
+ { url = "https://pypi.netflix.net/packages/19023467748/tree_sitter_embedded_template-0.25.0-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:d0a35cfe634c44981a516243bc039874580e02a2990669313730187ce83a5bc6", size = 18267 },
+ { url = "https://pypi.netflix.net/packages/19023467749/tree_sitter_embedded_template-0.25.0-cp310-abi3-win_amd64.whl", hash = "sha256:3e05a4ac013d54505e75ae48e1a0e9db9aab19949fe15d9f4c7345b11a84a069", size = 13049 },
+ { url = "https://pypi.netflix.net/packages/19023467750/tree_sitter_embedded_template-0.25.0-cp310-abi3-win_arm64.whl", hash = "sha256:2751d402179ac0e83f2065b249d8fe6df0718153f1636bcb6a02bde3e5730db9", size = 11978 },
+ ]
+
+ [[package]]
+ name = "tree-sitter-language-pack"
+ version = "0.13.0"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ dependencies = [
+ { name = "tree-sitter" },
+ { name = "tree-sitter-c-sharp" },
+ { name = "tree-sitter-embedded-template" },
+ { name = "tree-sitter-yaml" },
+ ]
+ sdist = { url = "https://pypi.netflix.net/packages/19391792931/tree_sitter_language_pack-0.13.0.tar.gz", hash = "sha256:032034c5e27b1f6e00730b9e7c2dbc8203b4700d0c681fd019d6defcf61183ec", size = 51353370 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19391792760/tree_sitter_language_pack-0.13.0-cp310-abi3-macosx_10_15_universal2.whl", hash = "sha256:0e7eae812b40a2dc8a12eb2f5c55e130eb892706a0bee06215dd76affeb00d07", size = 32991857 },
+ { url = "https://pypi.netflix.net/packages/19391792761/tree_sitter_language_pack-0.13.0-cp310-abi3-manylinux2014_aarch64.whl", hash = "sha256:7fdacf383418a845b20772118fcb53ad245f9c5d409bd07dae16acec65151756", size = 20092989 },
+ { url = "https://pypi.netflix.net/packages/19391792762/tree_sitter_language_pack-0.13.0-cp310-abi3-manylinux2014_x86_64.whl", hash = "sha256:0d4f261fce387ae040dae7e4d1c1aca63d84c88320afcc0961c123bec0be8377", size = 19952029 },
+ { url = "https://pypi.netflix.net/packages/19391792845/tree_sitter_language_pack-0.13.0-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:78f369dc4d456c5b08d659939e662c2f9b9fba8c0ec5538a1f973e01edfcf04d", size = 19944614 },
+ { url = "https://pypi.netflix.net/packages/19391792846/tree_sitter_language_pack-0.13.0-cp310-abi3-win_amd64.whl", hash = "sha256:1cdbc88a03dacd47bec69e56cc20c48eace1fbb6f01371e89c3ee6a2e8f34db1", size = 16896852 },
+ ]
+
+ [[package]]
+ name = "tree-sitter-yaml"
+ version = "0.7.2"
+ source = { registry = "https://pypi.netflix.net/simple" }
+ sdist = { url = "https://pypi.netflix.net/packages/19176087043/tree_sitter_yaml-0.7.2.tar.gz", hash = "sha256:756db4c09c9d9e97c81699e8f941cb8ce4e51104927f6090eefe638ee567d32c", size = 84882 }
+ wheels = [
+ { url = "https://pypi.netflix.net/packages/19176087035/tree_sitter_yaml-0.7.2-cp310-abi3-macosx_10_9_x86_64.whl", hash = "sha256:7e269ddcfcab8edb14fbb1f1d34eed1e1e26888f78f94eedfe7cc98c60f8bc9f", size = 43898 },
+ { url = "https://pypi.netflix.net/packages/19176087036/tree_sitter_yaml-0.7.2-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:0807b7966e23ddf7dddc4545216e28b5a58cdadedcecca86b8d8c74271a07870", size = 44691 },
+ { url = "https://pypi.netflix.net/packages/19176087037/tree_sitter_yaml-0.7.2-cp310-abi3-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:f1a5c60c98b6c4c037aae023569f020d0c489fad8dc26fdfd5510363c9c29a41", size = 91430 },
+ { url = "https://pypi.netflix.net/packages/19176087038/tree_sitter_yaml-0.7.2-cp310-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:88636d19d0654fd24f4f242eaaafa90f6f5ebdba8a62e4b32d251ed156c51a2a", size = 92428 },
+ { url = "https://pypi.netflix.net/packages/19176087039/tree_sitter_yaml-0.7.2-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:1d2e8f0bb14aa4537320952d0f9607eef3021d5aada8383c34ebeece17db1e06", size = 90580 },
+ { url = "https://pypi.netflix.net/packages/19176087040/tree_sitter_yaml-0.7.2-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:74ca712c50fc9d7dbc68cb36b4a7811d6e67a5466b5a789f19bf8dd6084ef752", size = 90455 },
+ { url = "https://pypi.netflix.net/packages/19176087041/tree_sitter_yaml-0.7.2-cp310-abi3-win_amd64.whl", hash = "sha256:7587b5ca00fc4f9a548eff649697a3b395370b2304b399ceefa2087d8a6c9186", size = 45514 },
+ { url = "https://pypi.netflix.net/packages/19176087042/tree_sitter_yaml-0.7.2-cp310-abi3-win_arm64.whl", hash = "sha256:f63c227b18e7ce7587bce124578f0bbf1f890ac63d3e3cd027417574273642c4", size = 44065 },
+ ]
+
  [[package]]
  name = "triton"
  version = "3.5.1"