feat: add privacy filtering and execution modes
Co-authored-by: Codex <noreply@openai.com>
- README.md +46 -11
- analyzer.py +287 -46
- app.py +153 -19
- frontend/static/app.jsx +71 -12
- frontend/static/components.jsx +13 -3
- model_runtime.py +198 -73
- privacy_filter.py +180 -0
- profiling.py +125 -0
- requirements.txt +2 -1
- schemas.py +2 -0
- tests/test_model_runtime.py +115 -45
- tests/test_privacy_filter.py +179 -0
- tests/test_profiling.py +37 -0
- view_model.py +1 -1
README.md
CHANGED
|
@@ -20,11 +20,11 @@ it claimed completion.
|
|
| 20 |
|
| 21 |
Built for the Build Small Hackathon. The frontend is a custom React field-notebook
|
| 22 |
UI (a trail map of the session) served by `gradio.Server`; it calls the Python
|
| 23 |
-
`analyze_trace` endpoint through `@gradio/client`. Both models run on the
|
| 24 |
-
GPU through ZeroGPU: a quick `
|
| 25 |
-
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` for deeper analysis.
|
| 26 |
-
|
| 27 |
-
no model or GPU.
|
| 28 |
|
| 29 |
## Architecture
|
| 30 |
|
|
@@ -39,6 +39,8 @@ no model or GPU.
|
|
| 39 |
renders (synthesizes the whole-session `verdict`, `captured`, `duration_total`).
|
| 40 |
- `analyzer.py` / `parser.py` / `redaction.py` / `schemas.py` — the deterministic
|
| 41 |
pipeline. `model_runtime.py` — the optional small-model assist on ZeroGPU.
|
|
|
|
|
|
|
| 42 |
|
| 43 |
## Run Locally
|
| 44 |
|
|
@@ -57,7 +59,7 @@ python3.11 -m unittest discover -s tests
|
|
| 57 |
|
| 58 |
## Analysis Engines
|
| 59 |
|
| 60 |
-
- `
|
| 61 |
- `NVIDIA Nemotron 3 Nano 30B-A3B — deeper analysis`: the larger model on the
|
| 62 |
Space GPU for a richer memo.
|
| 63 |
- `Rule-based — instant, no model`: local codebook analyzer, no model or GPU.
|
|
@@ -67,10 +69,41 @@ in model notes and returns the deterministic analysis instead of failing the
|
|
| 67 |
whole Space.
|
| 68 |
|
| 69 |
The model-backed analysis runs under `@spaces.GPU(size="xlarge")` so the weights
|
| 70 |
-
load on Hugging Face ZeroGPU hardware; `
|
| 71 |
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` are loaded with `transformers` and
|
| 72 |
-
cached across requests. The
|
| 73 |
-
|
| 74 |
|
| 75 |
## Agent Session Locations
|
| 76 |
|
|
@@ -89,5 +122,7 @@ ls ~/.pi/agent/sessions
|
|
| 89 |
|
| 90 |
Agent traces can contain prompts, tool inputs, command outputs, local file paths,
|
| 91 |
screenshots, secrets, private source code, and personal data. Review and redact
|
| 92 |
-
before uploading or sharing publicly.
|
| 93 |
-
|
|
|
|
|
|
|
|
|
| 20 |
|
| 21 |
Built for the Build Small Hackathon. The frontend is a custom React field-notebook
|
| 22 |
UI (a trail map of the session) served by `gradio.Server`; it calls the Python
|
| 23 |
+
`analyze_trace` endpoint through `@gradio/client`. Both analysis models run on the
|
| 24 |
+
Space GPU through ZeroGPU: a quick `openbmb/MiniCPM5-1B` pass by default, and the
|
| 25 |
+
larger `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` for deeper analysis. Redaction
|
| 26 |
+
adds a PII pass with `openai/privacy-filter`. A verified deterministic codebook
|
| 27 |
+
analyzer is the always-available recovery path and needs no model or GPU.
|
| 28 |
|
| 29 |
## Architecture
|
| 30 |
|
|
|
|
| 39 |
renders (synthesizes the whole-session `verdict`, `captured`, `duration_total`).
|
| 40 |
- `analyzer.py` / `parser.py` / `redaction.py` / `schemas.py` — the deterministic
|
| 41 |
pipeline. `model_runtime.py` — the optional small-model assist on ZeroGPU.
|
| 42 |
+
`privacy_filter.py` — the optional `openai/privacy-filter` PII redaction pass.
|
| 43 |
+
`profiling.py` — logging + per-request stage timing and resource probes.
|
| 44 |
|
| 45 |
## Run Locally
|
| 46 |
|
|
|
|
| 59 |
|
| 60 |
## Analysis Engines
|
| 61 |
|
| 62 |
+
- `MiniCPM5 1B — quick analysis`: default model pass on the Space GPU.
|
| 63 |
- `NVIDIA Nemotron 3 Nano 30B-A3B — deeper analysis`: the larger model on the
|
| 64 |
Space GPU for a richer memo.
|
| 65 |
- `Rule-based — instant, no model`: local codebook analyzer, no model or GPU.
|
|
|
|
| 69 |
whole Space.
|
| 70 |
|
| 71 |
The model-backed analysis runs under `@spaces.GPU(size="xlarge")` so the weights
|
| 72 |
+
load on Hugging Face ZeroGPU hardware; `openbmb/MiniCPM5-1B` and
|
| 73 |
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` are loaded with `transformers` and
|
| 74 |
+
cached across requests. The deterministic codebook analysis itself runs on CPU;
|
| 75 |
+
only the model assist and the `openai/privacy-filter` redaction pass use the GPU,
|
| 76 |
+
and both fall back gracefully (deterministic analysis / regex-only redaction)
|
| 77 |
+
when no GPU model is available.
|
| 78 |
+
|
| 79 |
+
## Execution modes
|
| 80 |
+
|
| 81 |
+
Each `analyze_trace` call takes an `execution_mode`:
|
| 82 |
+
|
| 83 |
+
- `zerogpu` (default): the model passes run inside `@spaces.GPU` on the Space GPU.
|
| 84 |
+
- `cpu`: the model passes run on the Space (or local) CPU with **no GPU quota** —
|
| 85 |
+
slower, but it still works when ZeroGPU quota is exhausted. The frontend exposes
|
| 86 |
+
this as a **Run on** choice so users without quota can still use the app.
|
| 87 |
+
|
| 88 |
+
Model loading is device-aware (CUDA → Apple MPS → CPU), so the app also runs
|
| 89 |
+
locally for development; on a Mac the small models run on MPS, and the
|
| 90 |
+
deterministic engine needs no model at all. Because of the slower paths, the
|
| 91 |
+
frontend streams real progress — current stage, % complete, messages processed,
|
| 92 |
+
elapsed time, and a best-effort ETA — so a long run never looks stuck.
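The device preference described above (CUDA, then Apple MPS, then CPU) can be sketched as a small pure function. This is a minimal illustration, not the code in `model_runtime.py`; the name `pick_device` and its boolean parameters are assumptions made for the example.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the first usable backend in the order CUDA, MPS, CPU.

    Hypothetical helper: the availability flags are passed in so the
    ordering can be checked without a GPU or PyTorch installed.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# A Mac without CUDA lands on MPS; a plain CPU box falls through to "cpu".
mac_choice = pick_device(cuda_available=False, mps_available=True)
cpu_choice = pick_device(cuda_available=False, mps_available=False)
```

With PyTorch installed, the flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.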
|
| 93 |
+
|
| 94 |
+
## Logging & profiling
|
| 95 |
+
|
| 96 |
+
The pipeline writes diagnostics to the standard logger (never the UI): per-request
|
| 97 |
+
message count, per-stage timing, total time, model load/inference time with the
|
| 98 |
+
device used, and a resource snapshot (process RSS, system memory, CPU, and
|
| 99 |
+
GPU/MPS memory). Set the level with `TFN_LOG_LEVEL` (default `INFO`; use `DEBUG`
|
| 100 |
+
for per-stage detail). Example summary line:
|
| 101 |
+
|
| 102 |
+
```
|
| 103 |
+
analyze[zerogpu/minicpm] done in 19.4s | messages=4 redactions=2 episodes=1
|
| 104 |
+
| stages: extract=0ms, redact=9503ms, chart=4ms, classify=0ms, model_assist=9918ms
|
| 105 |
+
| rss=2180MB sysmem=68% mps=4732MB
|
| 106 |
+
```
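As a rough illustration of how a summary line like the one above could be assembled, the sketch below keeps per-stage timings and counters and formats them on demand. `SketchProfiler` is a hypothetical stand-in: only the `record(stage, seconds)` and `mark(**facts)` call shapes are taken from how `analyzer.py` uses the real `Profiler`; the rest, including the exact summary format, is assumed.

```python
class SketchProfiler:
    """Hypothetical per-request profiler: stage timings plus counters."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.stages: dict[str, float] = {}   # stage name -> seconds
        self.facts: dict[str, object] = {}   # e.g. messages=4, redactions=2

    def record(self, stage: str, seconds: float) -> None:
        # Accumulate, so a stage that runs in several chunks is summed.
        self.stages[stage] = self.stages.get(stage, 0.0) + seconds

    def mark(self, **facts: object) -> None:
        self.facts.update(facts)

    def summary(self) -> str:
        total = sum(self.stages.values())
        facts = " ".join(f"{key}={value}" for key, value in self.facts.items())
        stages = ", ".join(
            f"{name}={seconds * 1000:.0f}ms" for name, seconds in self.stages.items()
        )
        return f"analyze[{self.label}] done in {total:.1f}s | {facts} | stages: {stages}"


prof = SketchProfiler("cpu/minicpm")
prof.record("extract", 0.0)
prof.record("redact", 9.5)
prof.mark(messages=4)
line = prof.summary()
```

The resource snapshot (RSS, system memory, GPU/MPS memory) is left out of the sketch because it depends on platform-specific probes.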
|
| 107 |
|
| 108 |
## Agent Session Locations
|
| 109 |
|
|
|
|
| 122 |
|
| 123 |
Agent traces can contain prompts, tool inputs, command outputs, local file paths,
|
| 124 |
screenshots, secrets, private source code, and personal data. Review and redact
|
| 125 |
+
before uploading or sharing publicly. Redaction defaults to regex patterns plus a
|
| 126 |
+
model pass (`openai/privacy-filter`) that flags names, contacts, and other
|
| 127 |
+
personal data on the Space GPU; the regex pass is the always-available fallback
|
| 128 |
+
when the model is not loaded. The app exports only a redacted narrative text file.
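The two-pass redaction with a regex fallback can be sketched as follows. This is a simplified illustration under stated assumptions: the email pattern, the `[REDACTED_EMAIL]` placeholder, and the `str -> str` model callable are all hypothetical. The real pipeline passes lists of texts and gets structured results back.

```python
import re
from typing import Callable, Optional

# Illustrative pattern only; the project's real regex set is not shown here.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(
    text: str, model_pass: Optional[Callable[[str], str]] = None
) -> tuple[str, bool]:
    """Apply regex redaction, then an optional model pass on top.

    Returns (redacted_text, model_used). If the model pass is missing or
    raises, the regex-only text is returned instead of failing.
    """
    cleaned = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    if model_pass is None:
        return cleaned, False
    try:
        return model_pass(cleaned), True
    except Exception:  # graceful degradation to regex-only output
        return cleaned, False


def broken_model(_: str) -> str:
    raise RuntimeError("model not loaded")


regex_only = redact("mail bob@example.com now")
with_fallback = redact("mail bob@example.com now", broken_model)
```

Running the regex pass first means the model only ever sees text that has already had pattern-matched secrets removed.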
|
analyzer.py
CHANGED
|
@@ -3,15 +3,30 @@
|
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import re
|
|
|
|
| 6 |
from collections import Counter
|
| 7 |
from datetime import datetime, timezone
|
| 8 |
from pathlib import Path
|
| 9 |
from typing import Iterable
|
| 10 |
|
| 11 |
-
from model_runtime import MODEL_CHOICES,
|
| 12 |
from parser import parse_trace
|
|
|
|
| 13 |
from redaction import redact_text
|
| 14 |
-
from schemas import
|
| 15 |
|
| 16 |
|
| 17 |
ANALYSIS_SCOPE = (
|
|
@@ -143,26 +158,51 @@ PROBLEM_EVIDENCE_SIGNALS = {
|
|
| 143 |
ANALYSIS_STEPS = ("extract", "redact", "chart", "classify", "synthesize")
|
| 144 |
|
| 145 |
|
| 146 |
def stream_deterministic_analysis(
|
| 147 |
path: str | Path,
|
| 148 |
*,
|
| 149 |
include_user_context: bool = True,
|
| 150 |
redact_secrets: bool = True,
|
| 151 |
ignore_tool_calls: bool = True,
|
|
|
|
|
|
|
|
|
|
| 152 |
):
|
| 153 |
"""Run the deterministic pipeline as a generator.
|
| 154 |
|
| 155 |
-
Yields ``("
|
| 156 |
-
:data:`ANALYSIS_STEPS`)
|
| 157 |
-
|
| 158 |
"""
|
| 159 |
|
|
|
|
|
|
|
|
|
|
| 160 |
parsed_messages, agent_type = parse_trace(
|
| 161 |
path,
|
| 162 |
include_user_context=include_user_context,
|
| 163 |
ignore_tool_calls=ignore_tool_calls,
|
| 164 |
)
|
| 165 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 166 |
|
| 167 |
redaction_count = 0
|
| 168 |
privacy_notes = [
|
|
@@ -172,26 +212,79 @@ def stream_deterministic_analysis(
|
|
| 172 |
if ignore_tool_calls:
|
| 173 |
privacy_notes.append("Tool-call contents were ignored before analysis.")
|
| 174 |
|
|
|
|
| 175 |
messages = parsed_messages
|
| 176 |
if redact_secrets:
|
| 177 |
-
redacted_messages: list[NarrativeMessage] = []
|
| 178 |
all_notes: Counter[str] = Counter()
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
)
|
| 193 |
)
|
|
|
|
| 194 |
messages = redacted_messages
|
| 195 |
if all_notes:
|
| 196 |
privacy_notes.append(
|
| 197 |
"Redactions applied: "
|
|
@@ -199,14 +292,22 @@ def stream_deterministic_analysis(
|
|
| 199 |
+ "."
|
| 200 |
)
|
| 201 |
else:
|
| 202 |
-
privacy_notes.append("No likely secrets matched the
|
| 203 |
else:
|
| 204 |
privacy_notes.append("Secret redaction was disabled by the user.")
|
| 205 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 206 |
|
|
|
|
| 207 |
episodes = identify_episodes(messages)
|
| 208 |
-
|
|
|
|
|
|
|
| 209 |
|
|
|
|
| 210 |
result = AnalysisResult(
|
| 211 |
trace_title=derive_trace_title(path, agent_type),
|
| 212 |
agent_type_guess=agent_type,
|
|
@@ -218,55 +319,194 @@ def stream_deterministic_analysis(
|
|
| 218 |
redaction_count=redaction_count,
|
| 219 |
engine="deterministic-codebook",
|
| 220 |
)
|
| 221 |
-
|
|
|
|
| 222 |
|
|
|
|
| 223 |
narrative_text = render_redacted_narrative(messages)
|
| 224 |
-
|
| 225 |
|
| 226 |
-
yield ("result", (result, narrative_text))
|
| 227 |
|
| 228 |
|
| 229 |
-
|
|
|
|
| 230 |
result: AnalysisResult,
|
| 231 |
-
|
| 232 |
analysis_engine: str,
|
| 233 |
*,
|
| 234 |
run=None,
|
| 235 |
) -> None:
|
| 236 |
-
"""
|
| 237 |
|
| 238 |
-
``run`` defaults to
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
"""
|
| 244 |
|
| 245 |
if analysis_engine == "deterministic":
|
| 246 |
return
|
| 247 |
if analysis_engine not in MODEL_CHOICES:
|
| 248 |
result.model_notes.append(
|
| 249 |
-
f"Unknown analysis engine {analysis_engine!r};
|
| 250 |
)
|
| 251 |
return
|
| 252 |
-
|
|
|
|
|
|
|
|
|
|
| 253 |
try:
|
| 254 |
-
|
| 255 |
engine=analysis_engine,
|
| 256 |
-
|
| 257 |
-
|
|
|
|
| 258 |
)
|
| 259 |
except Exception as exc:
|
| 260 |
error_message = str(exc).strip().rstrip(".")
|
| 261 |
result.model_notes.append(
|
| 262 |
-
"Model
|
| 263 |
f"{type(exc).__name__}: {error_message}. "
|
| 264 |
-
"
|
| 265 |
)
|
| 266 |
else:
|
| 267 |
-
result.
|
| 268 |
-
|
| 269 |
-
|
|
|
|
|
|
|
|
|
|
| 270 |
|
| 271 |
|
| 272 |
def analyze_trace_file(
|
|
@@ -282,6 +522,7 @@ def analyze_trace_file(
|
|
| 282 |
|
| 283 |
result: AnalysisResult | None = None
|
| 284 |
narrative_text = ""
|
|
|
|
| 285 |
for kind, payload in stream_deterministic_analysis(
|
| 286 |
path,
|
| 287 |
include_user_context=include_user_context,
|
|
@@ -289,9 +530,9 @@ def analyze_trace_file(
|
|
| 289 |
ignore_tool_calls=ignore_tool_calls,
|
| 290 |
):
|
| 291 |
if kind == "result":
|
| 292 |
-
result, narrative_text = payload
|
| 293 |
assert result is not None
|
| 294 |
-
|
| 295 |
return result, narrative_text
|
| 296 |
|
| 297 |
|
|
|
|
| 3 |
from __future__ import annotations
|
| 4 |
|
| 5 |
import re
|
| 6 |
+
import time
|
| 7 |
from collections import Counter
|
| 8 |
from datetime import datetime, timezone
|
| 9 |
from pathlib import Path
|
| 10 |
from typing import Iterable
|
| 11 |
|
| 12 |
+
from model_runtime import MODEL_CHOICES, run_model_analysis
|
| 13 |
from parser import parse_trace
|
| 14 |
+
from profiling import Profiler, get_logger
|
| 15 |
from redaction import redact_text
|
| 16 |
+
from schemas import (
|
| 17 |
+
APPRAISALS,
|
| 18 |
+
DETOUR_TYPES,
|
| 19 |
+
DIFFICULTY_TYPES,
|
| 20 |
+
OUTCOME_CLAIMS,
|
| 21 |
+
RECOVERY_PATTERNS,
|
| 22 |
+
RESOLUTION_MODES,
|
| 23 |
+
AnalysisResult,
|
| 24 |
+
DifficultyEpisode,
|
| 25 |
+
MessageSpan,
|
| 26 |
+
NarrativeMessage,
|
| 27 |
+
)
|
| 28 |
+
|
| 29 |
+
logger = get_logger()
|
| 30 |
|
| 31 |
|
| 32 |
ANALYSIS_SCOPE = (
|
|
|
|
| 158 |
ANALYSIS_STEPS = ("extract", "redact", "chart", "classify", "synthesize")
|
| 159 |
|
| 160 |
|
| 161 |
+
def _accumulate_notes(counter: Counter[str], notes: Iterable[str]) -> None:
|
| 162 |
+
"""Fold ``"label: count"`` note strings into a running counter."""
|
| 163 |
+
|
| 164 |
+
for note in notes:
|
| 165 |
+
label, _, count = note.partition(": ")
|
| 166 |
+
counter[label] += int(count or 0)
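A standalone copy of the note-folding helper shows how `"label: count"` strings merge across messages. The labels used here (`api key`, `email`, `path`) are made up for the example. A note without a count contributes zero, while a non-numeric count would raise `ValueError` from `int()`.

```python
from collections import Counter
from typing import Iterable


def accumulate_notes(counter: Counter, notes: Iterable[str]) -> None:
    """Fold "label: count" note strings into a running counter."""
    for note in notes:
        label, _, count = note.partition(": ")
        # A note with no ": count" suffix yields an empty count, treated as 0.
        counter[label] += int(count or 0)


totals: Counter = Counter()
accumulate_notes(totals, ["api key: 2", "email: 1"])
accumulate_notes(totals, ["email: 3", "path"])
```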
|
| 167 |
+
|
| 168 |
+
|
| 169 |
def stream_deterministic_analysis(
|
| 170 |
path: str | Path,
|
| 171 |
*,
|
| 172 |
include_user_context: bool = True,
|
| 173 |
redact_secrets: bool = True,
|
| 174 |
ignore_tool_calls: bool = True,
|
| 175 |
+
model_redact=None,
|
| 176 |
+
profiler: Profiler | None = None,
|
| 177 |
+
stream_redact_progress: bool = False,
|
| 178 |
):
|
| 179 |
"""Run the deterministic pipeline as a generator.
|
| 180 |
|
| 181 |
+
Yields ``("progress", info)`` after each real stage completes — ``info`` has
|
| 182 |
+
a ``stage`` name (one of :data:`ANALYSIS_STEPS`) and the running ``messages``
|
| 183 |
+
count, then a final ``("result", (AnalysisResult, str, list[NarrativeMessage]))``. Callers that
|
| 184 |
+
don't care about progress can just drain it for the tuple.
|
| 185 |
+
|
| 186 |
+
``model_redact`` is an optional ``(list[str]) -> list[RedactionResult]``
|
| 187 |
+
callable applied on top of regex redaction; the Server injects a GPU- or
|
| 188 |
+
CPU-bound ``openai/privacy-filter`` pass. It is absent locally and in tests,
|
| 189 |
+
so redaction falls back to regex only. ``profiler`` collects per-stage
|
| 190 |
+
timings; one is created if not supplied.
|
| 191 |
"""
|
| 192 |
|
| 193 |
+
prof = profiler or Profiler("deterministic")
|
| 194 |
+
|
| 195 |
+
_started = time.perf_counter()
|
| 196 |
parsed_messages, agent_type = parse_trace(
|
| 197 |
path,
|
| 198 |
include_user_context=include_user_context,
|
| 199 |
ignore_tool_calls=ignore_tool_calls,
|
| 200 |
)
|
| 201 |
+
prof.record("extract", time.perf_counter() - _started)
|
| 202 |
+
message_count = len(parsed_messages)
|
| 203 |
+
prof.mark(messages=message_count, agent=agent_type)
|
| 204 |
+
logger.info("parsed %d narrative messages (agent=%s)", message_count, agent_type)
|
| 205 |
+
yield ("progress", {"stage": "extract", "messages": message_count})
|
| 206 |
|
| 207 |
redaction_count = 0
|
| 208 |
privacy_notes = [
|
|
|
|
| 212 |
if ignore_tool_calls:
|
| 213 |
privacy_notes.append("Tool-call contents were ignored before analysis.")
|
| 214 |
|
| 215 |
+
_redact_started = time.perf_counter()
|
| 216 |
messages = parsed_messages
|
| 217 |
if redact_secrets:
|
|
|
|
| 218 |
all_notes: Counter[str] = Counter()
|
| 219 |
+
redacted_messages: list[NarrativeMessage] = []
|
| 220 |
+
model_used = False
|
| 221 |
+
model_failed = False
|
| 222 |
+
|
| 223 |
+
# Process in chunks so slow (CPU) runs can stream per-message progress.
|
| 224 |
+
# Without streaming (ZeroGPU) it is a single chunk = one GPU allocation;
|
| 225 |
+
# with streaming the update count is capped at ~30 regardless of size.
|
| 226 |
+
if stream_redact_progress and message_count:
|
| 227 |
+
chunk = max(1, (message_count + 29) // 30)
|
| 228 |
+
else:
|
| 229 |
+
chunk = message_count or 1
|
| 230 |
+
|
| 231 |
+
for start in range(0, message_count, chunk):
|
| 232 |
+
chunk_messages = parsed_messages[start : start + chunk]
|
| 233 |
+
|
| 234 |
+
# Pass 1: deterministic regex redaction (always available).
|
| 235 |
+
regex_results = [redact_text(message.text) for message in chunk_messages]
|
| 236 |
+
texts = [red.text for red in regex_results]
|
| 237 |
+
|
| 238 |
+
# Pass 2: optional model PII pass on top. The Server injects a GPU- or
|
| 239 |
+
# CPU-bound openai/privacy-filter pass; it is absent locally and in
|
| 240 |
+
# tests, so regex-only redaction is used. Once it is unavailable we
|
| 241 |
+
# stop retrying it for the rest of the trace.
|
| 242 |
+
model_results = None
|
| 243 |
+
if model_redact is not None and not model_failed:
|
| 244 |
+
try:
|
| 245 |
+
model_results = model_redact(texts)
|
| 246 |
+
model_used = True
|
| 247 |
+
except Exception as exc: # noqa: BLE001 - graceful degradation
|
| 248 |
+
privacy_notes.append(
|
| 249 |
+
"AI privacy filter was unavailable "
|
| 250 |
+
f"({type(exc).__name__}); regex redaction was applied."
|
| 251 |
+
)
|
| 252 |
+
model_failed = True
|
| 253 |
+
model_results = None
|
| 254 |
+
|
| 255 |
+
for i, message in enumerate(chunk_messages):
|
| 256 |
+
text = texts[i]
|
| 257 |
+
redaction_count += regex_results[i].count
|
| 258 |
+
_accumulate_notes(all_notes, regex_results[i].notes)
|
| 259 |
+
if model_results is not None:
|
| 260 |
+
text = model_results[i].text
|
| 261 |
+
redaction_count += model_results[i].count
|
| 262 |
+
_accumulate_notes(all_notes, model_results[i].notes)
|
| 263 |
+
redacted_messages.append(
|
| 264 |
+
NarrativeMessage(
|
| 265 |
+
index=message.index,
|
| 266 |
+
role=message.role,
|
| 267 |
+
text=text,
|
| 268 |
+
timestamp=message.timestamp,
|
| 269 |
+
source=message.source,
|
| 270 |
+
)
|
| 271 |
)
|
| 272 |
+
yield (
|
| 273 |
+
"progress",
|
| 274 |
+
{
|
| 275 |
+
"stage": "redact",
|
| 276 |
+
"processed": min(start + chunk, message_count),
|
| 277 |
+
"total": message_count,
|
| 278 |
+
},
|
| 279 |
)
|
| 280 |
+
|
| 281 |
messages = redacted_messages
|
| 282 |
+
|
| 283 |
+
if model_used:
|
| 284 |
+
privacy_notes.append(
|
| 285 |
+
"AI privacy filter (openai/privacy-filter) screened for names, "
|
| 286 |
+
"contacts, and other personal data."
|
| 287 |
+
)
|
| 288 |
if all_notes:
|
| 289 |
privacy_notes.append(
|
| 290 |
"Redactions applied: "
|
|
|
|
| 292 |
+ "."
|
| 293 |
)
|
| 294 |
else:
|
| 295 |
+
privacy_notes.append("No likely secrets matched the redaction patterns.")
|
| 296 |
else:
|
| 297 |
privacy_notes.append("Secret redaction was disabled by the user.")
|
| 298 |
+
prof.record("redact", time.perf_counter() - _redact_started)
|
| 299 |
+
prof.mark(redactions=redaction_count)
|
| 300 |
+
if not redact_secrets or message_count == 0:
|
| 301 |
+
# No chunk loop ran (redaction disabled or empty trace) — still advance.
|
| 302 |
+
yield ("progress", {"stage": "redact", "processed": message_count, "total": message_count})
|
| 303 |
|
| 304 |
+
_chart_started = time.perf_counter()
|
| 305 |
episodes = identify_episodes(messages)
|
| 306 |
+
prof.record("chart", time.perf_counter() - _chart_started)
|
| 307 |
+
prof.mark(episodes=len(episodes))
|
| 308 |
+
yield ("progress", {"stage": "chart", "messages": message_count})
|
| 309 |
|
| 310 |
+
_classify_started = time.perf_counter()
|
| 311 |
result = AnalysisResult(
|
| 312 |
trace_title=derive_trace_title(path, agent_type),
|
| 313 |
agent_type_guess=agent_type,
|
|
|
|
| 319 |
redaction_count=redaction_count,
|
| 320 |
engine="deterministic-codebook",
|
| 321 |
)
|
| 322 |
+
prof.record("classify", time.perf_counter() - _classify_started)
|
| 323 |
+
yield ("progress", {"stage": "classify", "messages": message_count})
|
| 324 |
|
| 325 |
+
_synth_started = time.perf_counter()
|
| 326 |
narrative_text = render_redacted_narrative(messages)
|
| 327 |
+
prof.record("synthesize", time.perf_counter() - _synth_started)
|
| 328 |
+
yield ("progress", {"stage": "synthesize", "messages": message_count})
|
| 329 |
+
|
| 330 |
+
yield ("result", (result, narrative_text, messages))
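The chunk sizing used for redaction progress in this function can be checked in isolation. The sketch below restates the same arithmetic with hypothetical helper names (`chunk_size`, `update_count`) to show that streaming mode never emits more than about 30 progress updates, and that non-streaming mode processes everything as one chunk.

```python
def chunk_size(message_count: int, stream: bool, max_updates: int = 30) -> int:
    """Ceiling-divide the messages into at most ``max_updates`` chunks."""
    if stream and message_count:
        return max(1, (message_count + max_updates - 1) // max_updates)
    # Not streaming (or empty trace): one chunk, i.e. one model call.
    return message_count or 1


def update_count(message_count: int, stream: bool) -> int:
    """Number of loop iterations, hence progress events, for a trace."""
    size = chunk_size(message_count, stream)
    return len(range(0, message_count, size))


small = update_count(4, stream=True)       # one update per message
large = update_count(1000, stream=True)    # capped by the chunk size
single = update_count(1000, stream=False)  # single chunk
```

An empty trace runs the loop zero times, which is why the function emits a separate `redact` progress event in that case.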
|
| 331 |
+
|
| 332 |
+
|
| 333 |
+
_PRODUCTIVE_VALUES = {"yes", "no", "mixed", "unknown"}
|
| 334 |
+
_VALID_TONES = {"stable", "iterative", "detour", "partial", "risk", "unknown"}
|
| 335 |
+
_VALID_HONESTY = {"candid", "mixed", "overclaimed"}
|
| 336 |
+
|
| 337 |
+
|
| 338 |
+
def build_numbered_narrative(
|
| 339 |
+
messages: list[NarrativeMessage], *, char_budget: int = 16000, per_message: int = 320
|
| 340 |
+
) -> str:
|
| 341 |
+
"""Number the (redacted) messages by real index for the model.
|
| 342 |
+
|
| 343 |
+
Long traces are sampled evenly across the session (keeping the first and last)
|
| 344 |
+
so the model sees the whole timeline within its context budget; each line keeps
|
| 345 |
+
the message's real index and timestamp so the model can cite spans.
|
| 346 |
+
"""
|
| 347 |
+
|
| 348 |
+
if not messages:
|
| 349 |
+
return ""
|
| 350 |
+
max_messages = max(1, char_budget // per_message)
|
| 351 |
+
if len(messages) <= max_messages:
|
| 352 |
+
chosen = messages
|
| 353 |
+
else:
|
| 354 |
+
stride = len(messages) / max_messages
|
| 355 |
+
picks = sorted({0, len(messages) - 1, *(int(i * stride) for i in range(max_messages))})
|
| 356 |
+
chosen = [messages[i] for i in picks if 0 <= i < len(messages)]
|
| 357 |
+
lines = []
|
| 358 |
+
for message in chosen:
|
| 359 |
+
snippet = " ".join(message.text.split())[:per_message]
|
| 360 |
+
lines.append(f"[{message.index}] {message.role} {message.timestamp or ''}: {snippet}")
|
| 361 |
+
return "\n".join(lines)
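The even sampling in `build_numbered_narrative` can be isolated to see which indices survive. With the default budget, `16000 // 320` allows 50 messages. The sketch below uses a hypothetical `sample_indices` helper and a much smaller limit so the result is easy to read. Because the first and last indices are always included, the result can hold one more item than `max_items`.

```python
def sample_indices(total: int, max_items: int) -> list[int]:
    """Pick evenly spaced indices, always keeping the first and last."""
    if total <= max_items:
        return list(range(total))
    stride = total / max_items
    picks = {0, total - 1, *(int(i * stride) for i in range(max_items))}
    return sorted(picks)


short_trace = sample_indices(3, 5)   # under the limit: everything is kept
long_trace = sample_indices(10, 4)   # stride 2.5 -> 0, 2, 5, 7, plus the last
```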
|
| 362 |
+
|
| 363 |
+
|
| 364 |
+
def build_codebook_hint(episodes: list[DifficultyEpisode]) -> str:
|
| 365 |
+
if not episodes:
|
| 366 |
+
return "(none)"
|
| 367 |
+
return "; ".join(
|
| 368 |
+
f"{ep.episode_id} msgs {ep.message_span.start_index}-{ep.message_span.end_index}"
|
| 369 |
+
for ep in episodes[:12]
|
| 370 |
+
)
|
| 371 |
+
|
| 372 |
+
|
| 373 |
+
def _coerce_code(value: object, vocab: dict[str, str]) -> str:
|
| 374 |
+
code = str(value or "").strip()
|
| 375 |
+
return code if code in vocab else "unknown"
|
| 376 |
+
|
| 377 |
+
|
| 378 |
+
# Weak models sometimes echo the schema placeholders verbatim; drop those.
|
| 379 |
+
_PLACEHOLDER_RE = re.compile(
|
| 380 |
+
r"^\s*(<.*>|<=.*|\d+(\s*-\s*\d+)?\s+sentences?.*|one key.*|short verbatim.*|up to \d+.*|a message index.*)\s*$",
|
| 381 |
+
re.IGNORECASE,
|
| 382 |
+
)
|
| 383 |
+
|
| 384 |
+
|
| 385 |
+
def _clean_text(value: object) -> str:
|
| 386 |
+
text = str(value or "").strip()
|
| 387 |
+
if not text or _PLACEHOLDER_RE.match(text):
|
| 388 |
+
return ""
|
| 389 |
+
return text
|
| 390 |
+
|
| 391 |
+
|
| 392 |
+
def _clean_verdict(verdict: dict) -> dict[str, str]:
|
| 393 |
+
tone = str(verdict.get("tone", "")).strip().lower()
|
| 394 |
+
honesty = str(verdict.get("honesty", "")).strip().lower()
|
| 395 |
+
return {
|
| 396 |
+
"tone": tone if tone in _VALID_TONES else "unknown",
|
| 397 |
+
"headline": _clean_text(verdict.get("headline")) or "Session analyzed by the model.",
|
| 398 |
+
"detail": _clean_text(verdict.get("detail")),
|
| 399 |
+
"honesty": honesty if honesty in _VALID_HONESTY else "mixed",
|
| 400 |
+
}
|
| 401 |
|
|
|
|
| 402 |
|
| 403 |
+
def _episode_from_model(
|
| 404 |
+
raw: dict, ordinal: int, index_to_timestamp: dict[int, str | None], max_index: int
|
| 405 |
+
) -> DifficultyEpisode:
|
| 406 |
+
def clamp(value: object) -> int:
|
| 407 |
+
try:
|
| 408 |
+
return max(0, min(int(value), max_index))
|
| 409 |
+
except (TypeError, ValueError):
|
| 410 |
+
return 0
|
| 411 |
+
|
| 412 |
+
start = clamp(raw.get("start_index", 0))
|
| 413 |
+
end = clamp(raw.get("end_index", start))
|
| 414 |
+
if end < start:
|
| 415 |
+
start, end = end, start
|
| 416 |
+
start_time = index_to_timestamp.get(start)
|
| 417 |
+
end_time = index_to_timestamp.get(end)
|
| 418 |
+
span = MessageSpan(
|
| 419 |
+
start_index=start,
|
| 420 |
+
end_index=end,
|
| 421 |
+
start_time=start_time,
|
| 422 |
+
end_time=end_time,
|
| 423 |
+
duration_label=duration_label(start_time, end_time) if start_time and end_time else "unknown",
|
| 424 |
+
)
|
| 425 |
+
productive = str(raw.get("productive_detour", "unknown")).strip().lower()
|
| 426 |
+
quotes = [cleaned for q in (raw.get("evidence_quotes") or []) if (cleaned := _clean_text(q))][:3]
|
| 427 |
+
difficulty = _clean_text(raw.get("reported_difficulty"))
|
| 428 |
+
title = _clean_text(raw.get("title")) or (difficulty[:60] if difficulty else "Difficulty episode")
|
| 429 |
+
return DifficultyEpisode(
|
| 430 |
+
episode_id=f"E{ordinal:02d}",
|
| 431 |
+
title=title,
|
| 432 |
+
message_span=span,
|
| 433 |
+
initial_intention=_clean_text(raw.get("initial_intention")),
|
| 434 |
+
reported_difficulty=difficulty,
|
| 435 |
+
difficulty_type=_coerce_code(raw.get("difficulty_type"), DIFFICULTY_TYPES),
|
| 436 |
+
appraisal=_coerce_code(raw.get("appraisal"), APPRAISALS),
|
| 437 |
+
strategy_before=_clean_text(raw.get("strategy_before")),
|
| 438 |
+
strategy_after=_clean_text(raw.get("strategy_after")),
|
| 439 |
+
detour_type=_coerce_code(raw.get("detour_type"), DETOUR_TYPES),
|
| 440 |
+
resolution_mode=_coerce_code(raw.get("resolution_mode"), RESOLUTION_MODES),
|
| 441 |
+
recovery_pattern=_coerce_code(raw.get("recovery_pattern"), RECOVERY_PATTERNS),
|
| 442 |
+
outcome_claim=_coerce_code(raw.get("outcome_claim"), OUTCOME_CLAIMS),
|
| 443 |
+
productive_detour=productive if productive in _PRODUCTIVE_VALUES else "unknown",
|
| 444 |
+
evidence_quotes=quotes,
|
| 445 |
+
analyst_memo=_clean_text(raw.get("analyst_memo")),
|
| 446 |
+
)
|
| 447 |
|
| 448 |
+
|
| 449 |
+
def apply_model_analysis(
|
| 450 |
result: AnalysisResult,
|
| 451 |
+
messages: list[NarrativeMessage],
|
| 452 |
analysis_engine: str,
|
| 453 |
*,
|
| 454 |
run=None,
|
| 455 |
) -> None:
|
| 456 |
+
"""Replace the deterministic analysis with a model-produced one (codebook is the fallback).
|
| 457 |
|
| 458 |
+
``run`` defaults to :func:`run_model_analysis` (resolved at call time so tests
|
| 459 |
+
can monkeypatch it); the Server passes a GPU- or CPU-bound runner. On success
|
| 460 |
+
the model's episodes, overall patterns, and verdict replace the rule-based
|
| 461 |
+
ones. On any failure the deterministic codebook result is kept and the reason
|
| 462 |
+
recorded in ``model_notes``.
|
| 463 |
"""
|
| 464 |
|
| 465 |
if analysis_engine == "deterministic":
|
| 466 |
return
|
| 467 |
if analysis_engine not in MODEL_CHOICES:
|
| 468 |
result.model_notes.append(
|
| 469 |
+
f"Unknown analysis engine {analysis_engine!r}; rule-based analysis was returned."
|
| 470 |
)
|
| 471 |
return
|
| 472 |
+
|
| 473 |
+
runner = run or run_model_analysis
|
| 474 |
+
numbered_narrative = build_numbered_narrative(messages)
|
| 475 |
+
codebook_hint = build_codebook_hint(result.episodes)
|
| 476 |
try:
|
| 477 |
+
produced = runner(
|
| 478 |
engine=analysis_engine,
|
| 479 |
+
numbered_narrative=numbered_narrative,
|
| 480 |
+
agent_type=result.agent_type_guess,
|
| 481 |
+
codebook_hint=codebook_hint,
|
| 482 |
)
|
| 483 |
except Exception as exc:
|
| 484 |
error_message = str(exc).strip().rstrip(".")
|
| 485 |
result.model_notes.append(
|
| 486 |
+
"Model analysis was requested but unavailable: "
|
| 487 |
f"{type(exc).__name__}: {error_message}. "
|
| 488 |
+
"Rule-based analysis was returned."
|
| 489 |
)
|
| 490 |
+
return
|
| 491 |
+
|
| 492 |
+
analysis = produced.analysis
|
| 493 |
+
index_to_timestamp = {message.index: message.timestamp for message in messages}
|
| 494 |
+
max_index = (len(messages) - 1) if messages else 0
|
| 495 |
+
episodes = [
|
| 496 |
+
_episode_from_model(raw, ordinal + 1, index_to_timestamp, max_index)
|
| 497 |
+
for ordinal, raw in enumerate(analysis.get("episodes", []))
|
| 498 |
+
]
|
| 499 |
+
result.episodes = episodes
|
| 500 |
+
patterns = analysis.get("overall_patterns")
|
| 501 |
+
if isinstance(patterns, dict) and patterns:
|
| 502 |
+
result.overall_patterns = {key: str(value) for key, value in patterns.items()}
|
| 503 |
else:
|
| 504 |
+
result.overall_patterns = summarize_patterns(episodes, messages)
|
| 505 |
+
verdict = analysis.get("verdict")
|
| 506 |
+
if isinstance(verdict, dict) and verdict:
|
| 507 |
+
result.session_verdict = _clean_verdict(verdict)
|
| 508 |
+
result.engine = produced.model_id
|
| 509 |
+
result.model_notes.append(produced.note)
|
| 510 |
|
| 511 |
|
| 512 |
def analyze_trace_file(
|
|
|
|
| 522 |
|
| 523 |
result: AnalysisResult | None = None
|
| 524 |
narrative_text = ""
|
| 525 |
+
messages: list[NarrativeMessage] = []
|
| 526 |
for kind, payload in stream_deterministic_analysis(
|
| 527 |
path,
|
| 528 |
include_user_context=include_user_context,
|
|
|
|
| 530 |
ignore_tool_calls=ignore_tool_calls,
|
| 531 |
):
|
| 532 |
if kind == "result":
|
| 533 |
+
result, narrative_text, messages = payload
|
| 534 |
assert result is not None
|
| 535 |
+
apply_model_analysis(result, messages, analysis_engine)
|
| 536 |
return result, narrative_text
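The consumption pattern used here, reading `("progress", ...)` events and keeping the final `("result", ...)` payload, can be sketched against a stand-in generator. `fake_stream` and `drain` are hypothetical names, and the payload values are placeholders, not real `AnalysisResult` objects.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[tuple[str, object]]:
    """Stand-in for the analysis generator, with placeholder payloads."""
    yield ("progress", {"stage": "extract", "messages": 2})
    yield ("progress", {"stage": "redact", "processed": 2, "total": 2})
    yield ("result", ("<result>", "narrative text", ["m0", "m1"]))


def drain(stream: Iterable[tuple[str, object]]) -> tuple[list[str], tuple]:
    """Collect the stage names seen and return them with the final payload."""
    stages: list[str] = []
    final = None
    for kind, payload in stream:
        if kind == "progress":
            stages.append(payload["stage"])
        elif kind == "result":
            final = payload
    if final is None:
        raise RuntimeError("stream ended without a result event")
    return stages, final


stages, (result, narrative, messages) = drain(fake_stream())
```

Raising when no result arrives is a design choice of this sketch; the code above relies on an `assert` instead.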
|
| 537 |
|
| 538 |
|
app.py
CHANGED
|
@@ -9,6 +9,7 @@ returns the frontend-ready view model.
|
|
| 9 |
from __future__ import annotations
|
| 10 |
|
| 11 |
import os
|
|
|
|
| 12 |
from pathlib import Path
|
| 13 |
|
| 14 |
import spaces
|
|
@@ -17,10 +18,13 @@ from fastapi.staticfiles import StaticFiles
|
|
| 17 |
from gradio import Server
|
| 18 |
from gradio.data_classes import FileData
|
| 19 |
|
| 20 |
-
from analyzer import
|
| 21 |
from parser import TraceParseError
|
|
|
|
| 22 |
from view_model import build_view_model
|
| 23 |
|
|
|
|
|
|
|
| 24 |
|
| 25 |
HERE = Path(__file__).resolve().parent
|
| 26 |
FRONTEND = HERE / "frontend"
|
|
@@ -51,8 +55,9 @@ messages and ignores raw tool telemetry.
|
|
| 51 |
|
| 52 |
- `trace_file` (file): the session log
|
| 53 |
- `include_user_context` (bool): include user prompts as framing
|
| 54 |
-
- `redact_secrets` (bool):
|
| 55 |
-
- `analysis_engine` (str): `
|
|
|
|
| 56 |
|
| 57 |
Returns a JSON view model: a whole-session `verdict`, per-episode difficulty
|
| 58 |
`episodes`, and redacted export text.
|
|
@@ -74,18 +79,101 @@ def agents_md() -> str:
|
|
| 74 |
|
| 75 |
|
| 76 |
@spaces.GPU(size="xlarge", duration=180)
|
| 77 |
-
def
|
| 78 |
-
"""Run model
|
| 79 |
|
| 80 |
-
|
|
|
|
| 81 |
|
| 82 |
-
|
| 83 |
|
|
|
|
| 84 |
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
|
| 90 |
|
| 91 |
def _file_fields(trace_file: object) -> tuple[str | None, str | None]:
|
|
@@ -101,42 +189,88 @@ def analyze_trace(
|
|
| 101 |
trace_file: FileData,
|
| 102 |
include_user_context: bool = True,
|
| 103 |
redact_secrets: bool = True,
|
| 104 |
-
analysis_engine: str = "
|
|
|
|
| 105 |
) -> dict:
|
| 106 |
"""Stream real progress, then the frontend view model, for one trace.
|
| 107 |
|
| 108 |
-
Yields ``{"step"
|
| 109 |
-
|
| 110 |
"""
|
| 111 |
|
| 112 |
path, orig_name = _file_fields(trace_file)
|
| 113 |
if not path:
|
| 114 |
raise ValueError("No uploaded file was received.")
|
| 115 |
|
| 116 |
result = None
|
| 117 |
narrative = ""
|
|
|
|
|
|
|
| 118 |
try:
|
| 119 |
for kind, payload in stream_deterministic_analysis(
|
| 120 |
path,
|
| 121 |
include_user_context=include_user_context,
|
| 122 |
redact_secrets=redact_secrets,
|
| 123 |
ignore_tool_calls=True,
|
| 124 |
):
|
| 125 |
-
if kind == "
|
| 126 |
-
|
| 127 |
elif kind == "result":
|
| 128 |
-
result, narrative = payload
|
| 129 |
except TraceParseError as exc:
|
| 130 |
raise ValueError(str(exc)) from exc
|
| 131 |
|
| 132 |
if analysis_engine != "deterministic":
|
| 133 |
-
|
| 134 |
|
| 135 |
if orig_name:
|
| 136 |
agent = READABLE_AGENT.get(result.agent_type_guess, "Agent")
|
| 137 |
result.trace_title = f"{agent} · {orig_name}"
|
| 138 |
|
| 139 |
-
|
| 140 |
|
| 141 |
|
| 142 |
if __name__ == "__main__":
|
|
|
|
| 9 |
from __future__ import annotations
|
| 10 |
|
| 11 |
import os
|
| 12 |
+
import time
|
| 13 |
from pathlib import Path
|
| 14 |
|
| 15 |
import spaces
|
|
|
|
| 18 |
from gradio import Server
|
| 19 |
from gradio.data_classes import FileData
|
| 20 |
|
| 21 |
+
from analyzer import apply_model_analysis, stream_deterministic_analysis
|
| 22 |
from parser import TraceParseError
|
| 23 |
+
from profiling import Profiler, get_logger
|
| 24 |
from view_model import build_view_model
|
| 25 |
|
| 26 |
+
logger = get_logger()
|
| 27 |
+
|
| 28 |
|
| 29 |
HERE = Path(__file__).resolve().parent
|
| 30 |
FRONTEND = HERE / "frontend"
|
|
|
|
| 55 |
|
| 56 |
- `trace_file` (file): the session log
|
| 57 |
- `include_user_context` (bool): include user prompts as framing
|
| 58 |
+
- `redact_secrets` (bool): regex + AI (`openai/privacy-filter`) PII redaction before analysis
|
| 59 |
+
- `analysis_engine` (str): `minicpm` | `nemotron` | `deterministic`
|
| 60 |
+
- `execution_mode` (str): `zerogpu` (default, uses the Space GPU) | `cpu` (no GPU quota, slower)
|
| 61 |
|
| 62 |
Returns a JSON view model: a whole-session `verdict`, per-episode difficulty
|
| 63 |
`episodes`, and redacted export text.
|
|
|
|
| 79 |
|
| 80 |
|
| 81 |
@spaces.GPU(size="xlarge", duration=180)
|
| 82 |
+
def _model_analysis_gpu(*, engine, numbered_narrative, agent_type, codebook_hint):
|
| 83 |
+
"""Run the primary model analysis inside a ZeroGPU allocation."""
|
| 84 |
+
|
| 85 |
+
from model_runtime import run_model_analysis
|
| 86 |
+
|
| 87 |
+
return run_model_analysis(
|
| 88 |
+
engine=engine,
|
| 89 |
+
numbered_narrative=numbered_narrative,
|
| 90 |
+
agent_type=agent_type,
|
| 91 |
+
codebook_hint=codebook_hint,
|
| 92 |
+
)
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
@spaces.GPU(size="xlarge", duration=120)
|
| 96 |
+
def _privacy_filter_gpu(texts):
|
| 97 |
+
"""Run the openai/privacy-filter PII pass inside a ZeroGPU allocation."""
|
| 98 |
+
|
| 99 |
+
from privacy_filter import redact_texts
|
| 100 |
+
|
| 101 |
+
return redact_texts(texts)
|
| 102 |
+
|
| 103 |
|
| 104 |
+
def _cpu_privacy_filter(texts):
|
| 105 |
+
"""Run the openai/privacy-filter PII pass on the local CPU (no GPU quota)."""
|
| 106 |
|
| 107 |
+
from privacy_filter import redact_texts
|
| 108 |
|
| 109 |
+
return redact_texts(texts, device="cpu")
|
| 110 |
|
| 111 |
+
|
| 112 |
+
def _cpu_model_analysis(*, engine, numbered_narrative, agent_type, codebook_hint):
|
| 113 |
+
"""Run the primary model analysis on the local CPU (no GPU quota)."""
|
| 114 |
+
|
| 115 |
+
from model_runtime import run_model_analysis
|
| 116 |
+
|
| 117 |
+
return run_model_analysis(
|
| 118 |
+
engine=engine,
|
| 119 |
+
numbered_narrative=numbered_narrative,
|
| 120 |
+
agent_type=agent_type,
|
| 121 |
+
codebook_hint=codebook_hint,
|
| 122 |
+
device="cpu",
|
| 123 |
+
)
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
# Per stage: (frontend checklist index, cumulative %, label). The 6-item
|
| 127 |
+
# checklist is: 0 upload, 1 extract, 2 redact, 3 chart, 4 classify, 5 synthesize.
|
| 128 |
+
# Indices below are "rows completed" so the matching row shows as active.
|
| 129 |
+
_STAGE_PLAN = {
|
| 130 |
+
"extract": (2, 12, "Extracting narrative messages"),
|
| 131 |
+
"chart": (4, 55, "Charting difficulty episodes"),
|
| 132 |
+
"classify": (5, 62, "Classifying with the codebook"),
|
| 133 |
+
"synthesize": (5, 70, "Synthesizing field notes"),
|
| 134 |
+
}
|
| 135 |
+
|
| 136 |
+
# Redaction streams per-chunk progress; its % ramps across this band.
|
| 137 |
+
_REDACT_PCT = (12, 40)
|
| 138 |
+
|
| 139 |
+
|
| 140 |
+
def _progress_event(*, step, pct, label, elapsed, processed=None, total=None):
|
| 141 |
+
"""Build one streamed progress payload (with a best-effort ETA)."""
|
| 142 |
+
|
| 143 |
+
event = {"step": step, "pct": pct, "stage": label, "elapsed": round(elapsed, 1)}
|
| 144 |
+
if 0 < pct < 100:
|
| 145 |
+
event["eta"] = round(elapsed * (100 - pct) / pct, 1)
|
| 146 |
+
if total is not None:
|
| 147 |
+
event["total"] = total
|
| 148 |
+
event["processed"] = processed if processed is not None else total
|
| 149 |
+
return event
|
| 150 |
+
|
| 151 |
+
|
| 152 |
+
def _stage_event(payload, *, elapsed, message_total):
|
| 153 |
+
"""Translate a stream progress payload into a frontend event + running total."""
|
| 154 |
+
|
| 155 |
+
stage = payload["stage"]
|
| 156 |
+
if stage == "redact":
|
| 157 |
+
total = payload.get("total") or message_total or 0
|
| 158 |
+
processed = payload.get("processed", total)
|
| 159 |
+
frac = (processed / total) if total else 1.0
|
| 160 |
+
low, high = _REDACT_PCT
|
| 161 |
+
pct = round(low + (high - low) * frac)
|
| 162 |
+
step = 2 if (total and processed < total) else 3
|
| 163 |
+
event = _progress_event(
|
| 164 |
+
step=step,
|
| 165 |
+
pct=pct,
|
| 166 |
+
label="Redacting likely secrets",
|
| 167 |
+
elapsed=elapsed,
|
| 168 |
+
processed=processed,
|
| 169 |
+
total=total or None,
|
| 170 |
+
)
|
| 171 |
+
return event, (total or message_total)
|
| 172 |
+
|
| 173 |
+
step, pct, label = _STAGE_PLAN[stage]
|
| 174 |
+
total = payload.get("messages", message_total)
|
| 175 |
+
event = _progress_event(step=step, pct=pct, label=label, elapsed=elapsed, total=total)
|
| 176 |
+
return event, total
|
| 177 |
|
| 178 |
|
| 179 |
def _file_fields(trace_file: object) -> tuple[str | None, str | None]:
|
|
|
|
| 189 |
trace_file: FileData,
|
| 190 |
include_user_context: bool = True,
|
| 191 |
redact_secrets: bool = True,
|
| 192 |
+
analysis_engine: str = "minicpm",
|
| 193 |
+
execution_mode: str = "zerogpu",
|
| 194 |
) -> dict:
|
| 195 |
"""Stream real progress, then the frontend view model, for one trace.
|
| 196 |
|
| 197 |
+
Yields ``{"step", "pct", "stage", "elapsed", "eta", "total"}`` after each
|
| 198 |
+
real pipeline stage (so the UI shows true progress), then a final
|
| 199 |
+
``{"step": 6, "pct": 100, "result": <view model>}``.
|
| 200 |
+
|
| 201 |
+
``execution_mode`` is ``zerogpu`` (default; models run inside ``@spaces.GPU``)
|
| 202 |
+
or ``cpu`` (models run on the Space/local CPU, no GPU quota — slower).
|
| 203 |
"""
|
| 204 |
|
| 205 |
path, orig_name = _file_fields(trace_file)
|
| 206 |
if not path:
|
| 207 |
raise ValueError("No uploaded file was received.")
|
| 208 |
|
| 209 |
+
use_cpu = execution_mode == "cpu"
|
| 210 |
+
redactor = _cpu_privacy_filter if use_cpu else _privacy_filter_gpu
|
| 211 |
+
analysis_runner = _cpu_model_analysis if use_cpu else _model_analysis_gpu
|
| 212 |
+
|
| 213 |
+
prof = Profiler(f"analyze[{execution_mode}/{analysis_engine}]")
|
| 214 |
+
logger.info(
|
| 215 |
+
"analyze_trace start: file=%r engine=%s mode=%s redact=%s",
|
| 216 |
+
orig_name,
|
| 217 |
+
analysis_engine,
|
| 218 |
+
execution_mode,
|
| 219 |
+
redact_secrets,
|
| 220 |
+
)
|
| 221 |
+
|
| 222 |
result = None
|
| 223 |
narrative = ""
|
| 224 |
+
messages = []
|
| 225 |
+
message_total = None
|
| 226 |
try:
|
| 227 |
for kind, payload in stream_deterministic_analysis(
|
| 228 |
path,
|
| 229 |
include_user_context=include_user_context,
|
| 230 |
redact_secrets=redact_secrets,
|
| 231 |
ignore_tool_calls=True,
|
| 232 |
+
model_redact=redactor,
|
| 233 |
+
profiler=prof,
|
| 234 |
+
stream_redact_progress=use_cpu,
|
| 235 |
):
|
| 236 |
+
if kind == "progress":
|
| 237 |
+
event, message_total = _stage_event(
|
| 238 |
+
payload, elapsed=prof.elapsed(), message_total=message_total
|
| 239 |
+
)
|
| 240 |
+
yield event
|
| 241 |
elif kind == "result":
|
| 242 |
+
result, narrative, messages = payload
|
| 243 |
except TraceParseError as exc:
|
| 244 |
raise ValueError(str(exc)) from exc
|
| 245 |
|
| 246 |
if analysis_engine != "deterministic":
|
| 247 |
+
yield _progress_event(
|
| 248 |
+
step=5,
|
| 249 |
+
pct=78,
|
| 250 |
+
label=f"Reading the trace with {analysis_engine}",
|
| 251 |
+
elapsed=prof.elapsed(),
|
| 252 |
+
total=message_total,
|
| 253 |
+
)
|
| 254 |
+
analysis_started = time.perf_counter()
|
| 255 |
+
apply_model_analysis(result, messages, analysis_engine, run=analysis_runner)
|
| 256 |
+
prof.record("model_analysis", time.perf_counter() - analysis_started)
|
| 257 |
|
| 258 |
if orig_name:
|
| 259 |
agent = READABLE_AGENT.get(result.agent_type_guess, "Agent")
|
| 260 |
result.trace_title = f"{agent} · {orig_name}"
|
| 261 |
|
| 262 |
+
view = build_view_model(result, narrative)
|
| 263 |
+
prof.mark(engine=result.engine, mode=execution_mode)
|
| 264 |
+
prof.summary()
|
| 265 |
+
yield {
|
| 266 |
+
"step": 6,
|
| 267 |
+
"pct": 100,
|
| 268 |
+
"stage": "Field notes ready",
|
| 269 |
+
"elapsed": round(prof.elapsed(), 1),
|
| 270 |
+
"total": message_total,
|
| 271 |
+
"processed": message_total,
|
| 272 |
+
"result": view,
|
| 273 |
+
}
|
| 274 |
|
| 275 |
|
| 276 |
if __name__ == "__main__":
|
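The progress arithmetic added to `app.py` above can be restated as a small standalone sketch. It assumes the same linear ETA extrapolation as `_progress_event` and the same 12-40 % redaction band as `_REDACT_PCT`; the names `eta_seconds`, `redact_pct`, and `REDACT_BAND` are hypothetical stand-ins for illustration, not helpers from the commit.

```python
from __future__ import annotations

# Illustrative re-statement of the ETA and redaction-band arithmetic from the
# app.py diff. REDACT_BAND mirrors _REDACT_PCT; the function names here are
# hypothetical and do not exist in the repository.
REDACT_BAND = (12, 40)


def eta_seconds(elapsed: float, pct: float) -> float | None:
    """Linear extrapolation: remaining = elapsed * (100 - pct) / pct."""
    if not 0 < pct < 100:
        return None  # no estimate before any progress, or once finished
    return round(elapsed * (100 - pct) / pct, 1)


def redact_pct(processed: int, total: int) -> int:
    """Map per-message redaction progress onto the redaction band."""
    low, high = REDACT_BAND
    frac = (processed / total) if total else 1.0  # an empty trace counts as done
    return round(low + (high - low) * frac)


if __name__ == "__main__":
    print(eta_seconds(10.0, 25))  # 30.0
    print(redact_pct(5, 10))      # 26
    print(redact_pct(0, 0))       # 40
```

Because the estimate is a straight-line extrapolation over stage percentages that are fixed by hand, it is best read as a rough guide: a slow model stage late in the run can make the remaining time jump rather than count down smoothly.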
frontend/static/app.jsx
CHANGED
|
@@ -31,19 +31,23 @@ function TopBar() {
|
|
| 31 |
</div>
|
| 32 |
</div>
|
| 33 |
<div className="topbar__right mono">
|
| 34 |
-
<span className="topbar__pill">
|
| 35 |
-
<span className="topbar__pill">privacy-first</span>
|
| 36 |
</div>
|
| 37 |
</header>
|
| 38 |
);
|
| 39 |
}
|
| 40 |
|
| 41 |
const ENGINES = [
|
| 42 |
-
["
|
| 43 |
["nemotron", "Deeper analysis", "Nemotron 3 Nano 30B-A3B"],
|
| 44 |
["deterministic", "Rule-based", "no model, always on"],
|
| 45 |
];
|
| 46 |
|
| 47 |
function Toggle({ on, set, label, sub, locked }) {
|
| 48 |
return (
|
| 49 |
<button className={"toggle" + (on ? " toggle--on" : "") + (locked ? " toggle--locked" : "")}
|
|
@@ -61,7 +65,8 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 61 |
const [staged, setStaged] = React.useState(null); // { name, file }
|
| 62 |
const [redact, setRedact] = React.useState(true);
|
| 63 |
const [userCtx, setUserCtx] = React.useState(true);
|
| 64 |
-
const [engine, setEngine] = React.useState("
|
|
|
|
| 65 |
const [dragOver, setDragOver] = React.useState(false);
|
| 66 |
const [copied, setCopied] = React.useState(false);
|
| 67 |
const fileRef = React.useRef(null);
|
|
@@ -76,7 +81,7 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 76 |
function pick() { if (fileRef.current) fileRef.current.click(); }
|
| 77 |
function run() {
|
| 78 |
if (!staged) return;
|
| 79 |
-
onAnalyze({ file: staged.file, include_user_context: userCtx, redact_secrets: redact, analysis_engine: engine, engineLabel });
|
| 80 |
}
|
| 81 |
|
| 82 |
const AGENT_PROMPT = `Use this Space as a tool.
|
|
@@ -92,7 +97,7 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 92 |
<TopBar />
|
| 93 |
|
| 94 |
<section className="hero">
|
| 95 |
-
<h1 className="hero__title">See how your coding agent<br /> got stuck, detoured, recovered<span className="hero__amp"> & </span>claimed success.</h1>
|
| 96 |
<p className="hero__sub">
|
| 97 |
Upload a Codex, Claude Code, or Pi Agent session log. Trace Field Notes reads only the agent's
|
| 98 |
<em> narrated</em> messages — what it planned, where it snagged, how it rerouted, and how honestly it called it done —
|
|
@@ -104,7 +109,7 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 104 |
<span className="privacy__mark">!</span>
|
| 105 |
<p>
|
| 106 |
Agent traces can carry prompts, command output, local paths, screenshots, secrets, and private code.
|
| 107 |
-
<b> Review and redact before uploading or sharing.</b> This app analyzes only visible narrative messages
|
| 108 |
</p>
|
| 109 |
</div>
|
| 110 |
|
|
@@ -145,7 +150,7 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 145 |
</div>
|
| 146 |
|
| 147 |
<div className="opts">
|
| 148 |
-
<Toggle on={redact} set={setRedact} label="Redact
|
| 149 |
<Toggle on={userCtx} set={setUserCtx} label="Include user context" sub="user prompts as framing" />
|
| 150 |
<Toggle on={true} set={() => {}} locked label="Ignore tool contents" sub="locked for this release" />
|
| 151 |
</div>
|
|
@@ -162,7 +167,22 @@ function LandingView({ onAnalyze, onSample, error }) {
|
|
| 162 |
</button>
|
| 163 |
))}
|
| 164 |
</div>
|
| 165 |
-
<p className="engine__note muted">Quick uses
|
| 166 |
</div>
|
| 167 |
|
| 168 |
<div className="panel__actions">
|
|
@@ -235,7 +255,14 @@ const PIPELINE = [
|
|
| 235 |
"Synthesizing field notes",
|
| 236 |
];
|
| 237 |
|
| 238 |
-
function
|
| 239 |
return (
|
| 240 |
<div className="analyzing">
|
| 241 |
<div className="analyzing__card card card--raised">
|
|
@@ -247,6 +274,25 @@ function Analyzing({ label, step }) {
|
|
| 247 |
<circle className="analyzing__dot" r="4.5" fill="var(--accent)" />
|
| 248 |
</svg>
|
| 249 |
<Kicker>Surveying the trace · {label}</Kicker>
|
| 250 |
<ul className="analyzing__steps">
|
| 251 |
{PIPELINE.map((s, i) => (
|
| 252 |
<li key={s} className={i < step ? "done" : i === step ? "active" : ""}>
|
|
@@ -286,11 +332,13 @@ function App() {
|
|
| 286 |
const [engineLabel, setEngineLabel] = React.useState("");
|
| 287 |
const [error, setError] = React.useState("");
|
| 288 |
const [step, setStep] = React.useState(0);
|
|
|
|
| 289 |
|
| 290 |
-
async function analyze({ file, include_user_context, redact_secrets, analysis_engine, engineLabel }) {
|
| 291 |
setError("");
|
| 292 |
setEngineLabel(engineLabel || analysis_engine);
|
| 293 |
setStep(0);
|
|
|
|
| 294 |
setStage("analyzing");
|
| 295 |
window.scrollTo({ top: 0 });
|
| 296 |
try {
|
|
@@ -302,6 +350,7 @@ function App() {
|
|
| 302 |
include_user_context: !!include_user_context,
|
| 303 |
redact_secrets: !!redact_secrets,
|
| 304 |
analysis_engine,
|
|
|
|
| 305 |
});
|
| 306 |
let result = null;
|
| 307 |
for await (const msg of sub) {
|
|
@@ -313,6 +362,16 @@ function App() {
|
|
| 313 |
} else if (typeof p.step === "number") {
|
| 314 |
setStep(Math.min(p.step, PIPELINE.length - 1));
|
| 315 |
}
|
| 316 |
}
|
| 317 |
} else if (msg.type === "status") {
|
| 318 |
if (msg.stage === "error") throw new Error(msg.message || "The analyzer failed on the server.");
|
|
@@ -349,7 +408,7 @@ function App() {
|
|
| 349 |
<div className="backdrop"><div className="grain" /><TopoBackground /></div>
|
| 350 |
<div className="page">
|
| 351 |
{stage === "landing" && <LandingView onAnalyze={analyze} onSample={loadSample} error={error} />}
|
| 352 |
-
{stage === "analyzing" && <Analyzing label={engineLabel} step={step} />}
|
| 353 |
{stage === "report" && (
|
| 354 |
<div className="report-wrap">
|
| 355 |
<button className="report-back btn btn--sm btn--ghost" onClick={reset}>← New trace</button>
|
|
|
|
| 31 |
</div>
|
| 32 |
</div>
|
| 33 |
<div className="topbar__right mono">
|
| 34 |
+
<span className="topbar__pill">build small</span>
|
|
|
|
| 35 |
</div>
|
| 36 |
</header>
|
| 37 |
);
|
| 38 |
}
|
| 39 |
|
| 40 |
const ENGINES = [
|
| 41 |
+
["minicpm", "Quick analysis", "MiniCPM5 1B"],
|
| 42 |
["nemotron", "Deeper analysis", "Nemotron 3 Nano 30B-A3B"],
|
| 43 |
["deterministic", "Rule-based", "no model, always on"],
|
| 44 |
];
|
| 45 |
|
| 46 |
+
const EXEC_MODES = [
|
| 47 |
+
["zerogpu", "GPU", "Space GPU · faster"],
|
| 48 |
+
["cpu", "CPU", "no GPU quota · slower"],
|
| 49 |
+
];
|
| 50 |
+
|
| 51 |
function Toggle({ on, set, label, sub, locked }) {
|
| 52 |
return (
|
| 53 |
<button className={"toggle" + (on ? " toggle--on" : "") + (locked ? " toggle--locked" : "")}
|
|
|
|
| 65 |
const [staged, setStaged] = React.useState(null); // { name, file }
|
| 66 |
const [redact, setRedact] = React.useState(true);
|
| 67 |
const [userCtx, setUserCtx] = React.useState(true);
|
| 68 |
+
const [engine, setEngine] = React.useState("minicpm");
|
| 69 |
+
const [execMode, setExecMode] = React.useState("zerogpu");
|
| 70 |
const [dragOver, setDragOver] = React.useState(false);
|
| 71 |
const [copied, setCopied] = React.useState(false);
|
| 72 |
const fileRef = React.useRef(null);
|
|
|
|
| 81 |
function pick() { if (fileRef.current) fileRef.current.click(); }
|
| 82 |
function run() {
|
| 83 |
if (!staged) return;
|
| 84 |
+
onAnalyze({ file: staged.file, include_user_context: userCtx, redact_secrets: redact, analysis_engine: engine, execution_mode: execMode, engineLabel });
|
| 85 |
}
|
| 86 |
|
| 87 |
const AGENT_PROMPT = `Use this Space as a tool.
|
|
|
|
| 97 |
<TopBar />
|
| 98 |
|
| 99 |
<section className="hero">
|
| 100 |
+
<h1 className="hero__title">See how your coding agent<br /> got stuck, detoured, recovered<span className="hero__amp"> & </span><br />claimed success.</h1>
|
| 101 |
<p className="hero__sub">
|
| 102 |
Upload a Codex, Claude Code, or Pi Agent session log. Trace Field Notes reads only the agent's
|
| 103 |
<em> narrated</em> messages — what it planned, where it snagged, how it rerouted, and how honestly it called it done —
|
|
|
|
| 109 |
<span className="privacy__mark">!</span>
|
| 110 |
<p>
|
| 111 |
Agent traces can carry prompts, command output, local paths, screenshots, secrets, and private code.
|
| 112 |
+
<b> Review and redact before uploading or sharing.</b> This app analyzes only visible narrative messages, ignores raw tool telemetry by default, and scrubs secrets and personal data with pattern rules plus OpenAI's privacy-filter model.
|
| 113 |
</p>
|
| 114 |
</div>
|
| 115 |
|
|
|
|
| 150 |
</div>
|
| 151 |
|
| 152 |
<div className="opts">
|
| 153 |
+
<Toggle on={redact} set={setRedact} label="Redact secrets & personal data" sub="regex + AI: names, contacts, tokens, keys, paths" />
|
| 154 |
<Toggle on={userCtx} set={setUserCtx} label="Include user context" sub="user prompts as framing" />
|
| 155 |
<Toggle on={true} set={() => {}} locked label="Ignore tool contents" sub="locked for this release" />
|
| 156 |
</div>
|
|
|
|
| 167 |
</button>
|
| 168 |
))}
|
| 169 |
</div>
|
| 170 |
+
<p className="engine__note muted">Quick uses MiniCPM5 1B on the Space GPU. Deeper uses Nemotron 3 Nano 30B-A3B. Rule-based needs no model and never fails.</p>
|
| 171 |
+
</div>
|
| 172 |
+
|
| 173 |
+
<div className="engine">
|
| 174 |
+
<Label>Run on</Label>
|
| 175 |
+
<div className="engine__opts">
|
| 176 |
+
{EXEC_MODES.map(([key, name, detail]) => (
|
| 177 |
+
<button key={key}
|
| 178 |
+
className={"engine__opt" + (execMode === key ? " engine__opt--on" : "")}
|
| 179 |
+
onClick={() => setExecMode(key)}>
|
| 180 |
+
<span className="engine__name">{name}</span>
|
| 181 |
+
<span className="engine__detail mono">{detail}</span>
|
| 182 |
+
</button>
|
| 183 |
+
))}
|
| 184 |
+
</div>
|
| 185 |
+
<p className="engine__note muted">ZeroGPU is fast but spends your Space GPU quota. CPU needs no quota and still works if you've run out — just slower, so the progress bar will move more gradually.</p>
|
| 186 |
</div>
|
| 187 |
|
| 188 |
<div className="panel__actions">
|
|
|
|
| 255 |
"Synthesizing field notes",
|
| 256 |
];
|
| 257 |
|
| 258 |
+
function fmtSeconds(s) {
|
| 259 |
+
if (s == null || isNaN(s)) return "—";
|
| 260 |
+
const m = Math.floor(s / 60), sec = Math.round(s % 60);
|
| 261 |
+
return m > 0 ? `${m}m ${sec}s` : `${sec}s`;
|
| 262 |
+
}
|
| 263 |
+
|
| 264 |
+
function Analyzing({ label, step, progress }) {
|
| 265 |
+
const pct = progress && typeof progress.pct === "number" ? Math.max(0, Math.min(100, progress.pct)) : null;
|
| 266 |
return (
|
| 267 |
<div className="analyzing">
|
| 268 |
<div className="analyzing__card card card--raised">
|
|
|
|
| 274 |
<circle className="analyzing__dot" r="4.5" fill="var(--accent)" />
|
| 275 |
</svg>
|
| 276 |
<Kicker>Surveying the trace · {label}</Kicker>
|
| 277 |
+
{pct != null && (
|
| 278 |
+
<div style={{ margin: "12px 0 2px" }}>
|
| 279 |
+
<div style={{ height: 6, borderRadius: 4, background: "var(--rule)", overflow: "hidden" }}>
|
| 280 |
+
<div style={{ width: pct + "%", height: "100%", background: "var(--accent)", transition: "width .45s ease" }} />
|
| 281 |
+
</div>
|
| 282 |
+
<div className="mono muted" style={{ display: "flex", justifyContent: "space-between", gap: 12, fontSize: 12, marginTop: 7 }}>
|
| 283 |
+
<span>{pct}%{progress.stage ? " · " + progress.stage : ""}</span>
|
| 284 |
+
<span>
|
| 285 |
+
{progress.total != null
|
| 286 |
+
? (progress.processed != null && progress.processed < progress.total
|
| 287 |
+
? progress.processed + "/" + progress.total
|
| 288 |
+
: progress.total) + " msgs · "
|
| 289 |
+
: ""}
|
| 290 |
+
{fmtSeconds(progress.elapsed)} elapsed
|
| 291 |
+
{progress.eta != null && pct < 100 ? " · ~" + fmtSeconds(progress.eta) + " left" : ""}
|
| 292 |
+
</span>
|
| 293 |
+
</div>
|
| 294 |
+
</div>
|
| 295 |
+
)}
|
| 296 |
<ul className="analyzing__steps">
|
| 297 |
{PIPELINE.map((s, i) => (
|
| 298 |
<li key={s} className={i < step ? "done" : i === step ? "active" : ""}>
|
|
|
|
| 332 |
const [engineLabel, setEngineLabel] = React.useState("");
|
| 333 |
const [error, setError] = React.useState("");
|
| 334 |
const [step, setStep] = React.useState(0);
|
| 335 |
+
const [progress, setProgress] = React.useState(null);
|
| 336 |
|
| 337 |
+
async function analyze({ file, include_user_context, redact_secrets, analysis_engine, execution_mode, engineLabel }) {
|
| 338 |
setError("");
|
| 339 |
setEngineLabel(engineLabel || analysis_engine);
|
| 340 |
setStep(0);
|
| 341 |
+
setProgress(null);
|
| 342 |
setStage("analyzing");
|
| 343 |
window.scrollTo({ top: 0 });
|
| 344 |
try {
|
|
|
|
| 350 |
include_user_context: !!include_user_context,
|
| 351 |
redact_secrets: !!redact_secrets,
|
| 352 |
analysis_engine,
|
| 353 |
+
execution_mode,
|
| 354 |
});
|
| 355 |
let result = null;
|
| 356 |
for await (const msg of sub) {
|
|
|
|
| 362 |
} else if (typeof p.step === "number") {
|
| 363 |
setStep(Math.min(p.step, PIPELINE.length - 1));
|
| 364 |
}
|
| 365 |
+
if (typeof p.pct === "number") {
|
| 366 |
+
setProgress({
|
| 367 |
+
pct: p.pct,
|
| 368 |
+
elapsed: p.elapsed,
|
| 369 |
+
eta: p.eta,
|
| 370 |
+
total: p.total,
|
| 371 |
+
processed: p.processed,
|
| 372 |
+
stage: p.stage,
|
| 373 |
+
});
|
| 374 |
+
}
|
| 375 |
}
|
| 376 |
} else if (msg.type === "status") {
|
| 377 |
if (msg.stage === "error") throw new Error(msg.message || "The analyzer failed on the server.");
|
|
|
|
| 408 |
<div className="backdrop"><div className="grain" /><TopoBackground /></div>
|
| 409 |
<div className="page">
|
| 410 |
{stage === "landing" && <LandingView onAnalyze={analyze} onSample={loadSample} error={error} />}
|
| 411 |
+
{stage === "analyzing" && <Analyzing label={engineLabel} step={step} progress={progress} />}
|
| 412 |
{stage === "report" && (
|
| 413 |
<div className="report-wrap">
|
| 414 |
<button className="report-back btn btn--sm btn--ghost" onClick={reset}>← New trace</button>
|
frontend/static/components.jsx
CHANGED
|
@@ -414,12 +414,22 @@ function ReportHeader({ data }) {
|
|
| 414 |
}
|
| 415 |
|
| 416 |
function ModelStatus({ data }) {
|
| 417 |
-
const notes = (data.privacy_notes || []).filter((note) =>
|
|
|
|
|
|
|
| 418 |
if (!notes.length) return null;
|
|
|
|
|
|
|
|
|
|
| 419 |
return (
|
| 420 |
<div className="privacy model-status">
|
| 421 |
-
<span className="privacy__mark">!</span>
|
| 422 |
-
<p>
|
| 423 |
</div>
|
| 424 |
);
|
| 425 |
}
|
|
|
|
| 414 |
}
|
| 415 |
|
| 416 |
function ModelStatus({ data }) {
|
| 417 |
+
const notes = (data.privacy_notes || []).filter((note) =>
|
| 418 |
+
/^(Analysis produced|Model analysis|Model assist|Unknown analysis engine)/.test(String(note))
|
| 419 |
+
);
|
| 420 |
if (!notes.length) return null;
|
| 421 |
+
const fellBack = notes.some((note) =>
|
| 422 |
+
/unavailable|rule-based analysis was returned|deterministic analysis was returned|unknown analysis engine/i.test(note)
|
| 423 |
+
);
|
| 424 |
return (
|
| 425 |
<div className="privacy model-status">
|
| 426 |
+
<span className="privacy__mark">{fellBack ? "!" : "✓"}</span>
|
| 427 |
+
<p>
|
| 428 |
+
<b>{fellBack
|
| 429 |
+
? "Model unavailable — showing the rule-based analysis instead."
|
| 430 |
+
: "This report was written by the model."}</b>{" "}
|
| 431 |
+
{notes.join(" ")}
|
| 432 |
+
</p>
|
| 433 |
</div>
|
| 434 |
);
|
| 435 |
}
|
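The `ModelStatus` change above decides between the "!" and "✓" marks by pattern-matching the note strings. A Python transcription of that logic, offered only as a sketch, might look like the following. The two regular expressions are copied from the JSX; the function name `model_status` and the sample note strings in the usage lines are hypothetical, since the real note wording is produced elsewhere in the commit.

```python
import re

# Patterns copied from the ModelStatus component in the diff above.
NOTE_PREFIX = re.compile(
    r"^(Analysis produced|Model analysis|Model assist|Unknown analysis engine)"
)
FALLBACK = re.compile(
    r"unavailable|rule-based analysis was returned"
    r"|deterministic analysis was returned|unknown analysis engine",
    re.IGNORECASE,
)


def model_status(privacy_notes):
    """Return None when no model note is present, else a small status dict.

    Hypothetical helper: mirrors the JSX filter and fallback test, nothing more.
    """
    notes = [str(n) for n in (privacy_notes or []) if NOTE_PREFIX.search(str(n))]
    if not notes:
        return None
    fell_back = any(FALLBACK.search(n) for n in notes)
    return {"fell_back": fell_back, "text": " ".join(notes)}


if __name__ == "__main__":
    # Sample strings are invented for illustration.
    print(model_status(["Redacted 3 secrets."]))
    print(model_status(["Model analysis unavailable: quota exceeded."]))
```

Matching on message text keeps the view model unchanged, at the cost of coupling the UI to the exact wording of backend notes; an explicit boolean field would be a sturdier alternative if the schema can change.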
model_runtime.py
CHANGED
|
@@ -12,20 +12,31 @@ from __future__ import annotations
|
|
| 12 |
|
| 13 |
import json
|
| 14 |
import re
|
|
|
|
| 15 |
from collections.abc import Mapping
|
| 16 |
from dataclasses import dataclass
|
| 17 |
from typing import Any, Callable
|
| 18 |
|
| 19 |
-
from
|
| 20 |
|
| 21 |
|
| 22 |
PRIMARY_MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
|
| 23 |
-
QUICK_MODEL_ID = "
|
| 24 |
MODEL_MAX_NEW_TOKENS = 8192
|
| 25 |
|
| 26 |
MODEL_CHOICES = {
|
| 27 |
-
"
|
| 28 |
-
"label": "
|
| 29 |
"model_id": QUICK_MODEL_ID,
|
| 30 |
},
|
| 31 |
"nemotron": {
|
|
@@ -45,9 +56,9 @@ _MODEL_CACHE: dict[str, Any] = {}
|
|
| 45 |
|
| 46 |
|
| 47 |
@dataclass(slots=True)
|
| 48 |
-
class
|
| 49 |
model_id: str
|
| 50 |
-
|
| 51 |
note: str
|
| 52 |
|
| 53 |
|
|
@@ -59,38 +70,82 @@ def model_id_for_engine(engine: str) -> str | None:
|
|
| 59 |
return str(model_id) if model_id else None
|
| 60 |
|
| 61 |
|
| 62 |
-
def
|
| 63 |
*,
|
| 64 |
engine: str,
|
| 65 |
-
|
| 66 |
-
|
|
|
|
| 67 |
generate: GenerateFn | None = None,
|
| 68 |
-
|
| 69 |
-
|
| 70 |
|
| 71 |
model_id = model_id_for_engine(engine)
|
| 72 |
if not model_id:
|
| 73 |
raise ValueError(f"No model is configured for analysis engine {engine!r}.")
|
| 74 |
|
| 75 |
-
prompt =
|
|
|
|
|
|
|
| 76 |
messages = [
|
| 77 |
{
|
| 78 |
"role": "system",
|
| 79 |
"content": (
|
| 80 |
-
"You
|
| 81 |
-
"
|
|
|
|
| 82 |
),
|
| 83 |
},
|
| 84 |
{"role": "user", "content": prompt},
|
| 85 |
]
|
| 86 |
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
model_id=model_id,
|
| 92 |
-
|
| 93 |
-
note=f"
|
| 94 |
)
|
| 95 |
|
| 96 |
|
|
@@ -99,16 +154,18 @@ def _local_generator(
|
|
| 99 |
*,
|
| 100 |
model_id: str,
|
| 101 |
max_new_tokens: int,
|
|
|
|
| 102 |
) -> str:
|
| 103 |
-
"""Generate text with a locally loaded model on the
|
| 104 |
|
| 105 |
-
Imported lazily: ``torch`` only needs to exist on the GPU Space
|
| 106 |
-
the deterministic path, tests, or
|
|
|
|
| 107 |
"""
|
| 108 |
|
| 109 |
import torch
|
| 110 |
|
| 111 |
-
tokenizer, model = _load_model(model_id)
|
| 112 |
chat_inputs = tokenizer.apply_chat_template(
|
| 113 |
messages,
|
| 114 |
add_generation_prompt=True,
|
|
@@ -163,78 +220,146 @@ def _move_to_device(value: Any, device: Any) -> Any:
|
|
| 163 |
def _chat_template_kwargs(model_id: str) -> dict[str, Any]:
|
| 164 |
"""Model-specific chat-template controls."""
|
| 165 |
|
| 166 |
-
if model_id.startswith("
|
| 167 |
-
|
|
|
|
|
|
|
| 168 |
return {}
|
| 169 |
|
| 170 |
|
| 171 |
-
def _load_model(model_id: str) -> Any:
|
| 172 |
-
"""Lazily load and cache a (tokenizer, model) pair on the
|
| 173 |
|
| 174 |
The cache keeps weights resident across requests so only the first call per
|
| 175 |
-
model pays the load cost. ZeroGPU exposes CUDA inside the
|
| 176 |
-
context
|
|
|
|
| 177 |
"""
|
| 178 |
|
| 179 |
-
|
| 180 |
if cached is not None:
|
| 181 |
return cached
|
| 182 |
|
| 183 |
-
import torch
|
| 184 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 185 |
|
|
|
|
| 186 |
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
model.eval()
|
| 194 |
-
|
|
|
|
| 195 |
return tokenizer, model
|
| 196 |
|
| 197 |
|
| 198 |
-
def
|
| 199 |
-
|
| 200 |
-
narrative_excerpt = narrative_text[:12000]
|
| 201 |
-
return f"""Use the deterministic codebook analysis and redacted visible narrative below.
|
| 202 |
-
|
| 203 |
-
Return JSON with exactly these keys:
|
| 204 |
-
- executive_memo: 4-6 sentences for a developer
|
| 205 |
-
- detour_memo: 2-4 sentences about productive detours vs wandering
|
| 206 |
-
- outcome_audit_memo: 2-4 sentences about completion claims and caveats
|
| 207 |
-
- caveats: array of short strings
|
| 208 |
|
| 209 |
-
Rules:
|
| 210 |
-
- Return one valid JSON object and nothing else.
|
| 211 |
-
- The first non-whitespace character must be {{ and the last must be }}.
|
| 212 |
-
- Analyze only visible narrative messages.
|
| 213 |
-
- Do not claim to know hidden reasoning.
|
| 214 |
-
- Cite episode IDs where useful.
|
| 215 |
-
- Do not include raw secrets, tool outputs, or long quotes.
|
| 216 |
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
"""
|
| 223 |
|
| 224 |
|
| 225 |
-
def
|
| 226 |
-
|
| 227 |
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
"
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
parsed["caveats"] = [str(item) for item in parsed["caveats"][:6]]
|
| 238 |
return parsed
|
| 239 |
|
| 240 |
|
|
|
|
| 12 |
|
| 13 |
import json
|
| 14 |
import re
|
| 15 |
+
import time
|
| 16 |
from collections.abc import Mapping
|
| 17 |
from dataclasses import dataclass
|
| 18 |
from typing import Any, Callable
|
| 19 |
|
| 20 |
+
from profiling import get_logger
|
| 21 |
+
from schemas import (
|
| 22 |
+
APPRAISALS,
|
| 23 |
+
DETOUR_TYPES,
|
| 24 |
+
DIFFICULTY_TYPES,
|
| 25 |
+
OUTCOME_CLAIMS,
|
| 26 |
+
RECOVERY_PATTERNS,
|
| 27 |
+
RESOLUTION_MODES,
|
| 28 |
+
)
|
| 29 |
+
|
| 30 |
+
logger = get_logger()
|
| 31 |
|
| 32 |
|
| 33 |
PRIMARY_MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
|
| 34 |
+
QUICK_MODEL_ID = "openbmb/MiniCPM5-1B"
|
| 35 |
MODEL_MAX_NEW_TOKENS = 8192
|
| 36 |
|
| 37 |
MODEL_CHOICES = {
|
| 38 |
+
"minicpm": {
|
| 39 |
+
"label": "MiniCPM5 1B — quick analysis",
|
| 40 |
"model_id": QUICK_MODEL_ID,
|
| 41 |
},
|
| 42 |
"nemotron": {
|
|
|
|
| 56 |
|
| 57 |
|
| 58 |
@dataclass(slots=True)
|
| 59 |
+
class ModelAnalysisResult:
|
| 60 |
model_id: str
|
| 61 |
+
analysis: dict[str, Any]
|
| 62 |
note: str
|
| 63 |
|
| 64 |
|
|
|
|
| 70 |
return str(model_id) if model_id else None
|
| 71 |
|
| 72 |
|
| 73 |
+
def resolve_device(device: str | None = None) -> str:
|
| 74 |
+
"""Pick the compute device: explicit override, else cuda -> mps -> cpu."""
|
| 75 |
+
|
| 76 |
+
if device:
|
| 77 |
+
return device
|
| 78 |
+
import torch
|
| 79 |
+
|
| 80 |
+
if torch.cuda.is_available():
|
| 81 |
+
return "cuda"
|
| 82 |
+
mps = getattr(torch.backends, "mps", None)
|
| 83 |
+
if mps is not None and mps.is_available():
|
| 84 |
+
return "mps"
|
| 85 |
+
return "cpu"
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
def run_model_analysis(
|
| 89 |
*,
|
| 90 |
engine: str,
|
| 91 |
+
numbered_narrative: str,
|
| 92 |
+
agent_type: str = "unknown",
|
| 93 |
+
codebook_hint: str = "",
|
| 94 |
generate: GenerateFn | None = None,
|
| 95 |
+
device: str | None = None,
|
| 96 |
+
) -> ModelAnalysisResult:
|
| 97 |
+
"""Run the selected model as the primary analyst and return a field report.
|
| 98 |
+
|
| 99 |
+
The model identifies and classifies the difficulty episodes and writes the
|
| 100 |
+
session verdict directly from the visible narrative; the deterministic codebook
|
| 101 |
+
is only a fallback (used by the caller if this raises). ``device`` forces the
|
| 102 |
+
compute device for the default local generator; an injected ``generate`` is
|
| 103 |
+
used as-is.
|
| 104 |
+
"""
|
| 105 |
|
| 106 |
model_id = model_id_for_engine(engine)
|
| 107 |
if not model_id:
|
| 108 |
raise ValueError(f"No model is configured for analysis engine {engine!r}.")
|
| 109 |
|
| 110 |
+
prompt = build_analysis_prompt(
|
| 111 |
+
numbered_narrative, agent_type=agent_type, codebook_hint=codebook_hint
|
| 112 |
+
)
|
| 113 |
messages = [
|
| 114 |
{
|
| 115 |
"role": "system",
|
| 116 |
"content": (
|
| 117 |
+
"You are an expert analyst of coding-agent session traces. "
|
| 118 |
+
"Judge only the visible narrative; never invent hidden reasoning. "
|
| 119 |
+
"Return one JSON object and nothing else."
|
| 120 |
),
|
| 121 |
},
|
| 122 |
{"role": "user", "content": prompt},
|
| 123 |
]
|
| 124 |
|
| 125 |
+
started = time.perf_counter()
|
| 126 |
+
if generate is not None:
|
| 127 |
+
content = generate(messages, model_id=model_id, max_new_tokens=MODEL_MAX_NEW_TOKENS)
|
| 128 |
+
device_label = "injected"
|
| 129 |
+
else:
|
| 130 |
+
device_label = resolve_device(device)
|
| 131 |
+
content = _local_generator(
|
| 132 |
+
messages,
|
| 133 |
+
model_id=model_id,
|
| 134 |
+
max_new_tokens=MODEL_MAX_NEW_TOKENS,
|
| 135 |
+
device=device_label,
|
| 136 |
+
)
|
| 137 |
+
logger.info(
|
| 138 |
+
"model analysis: %s on %s in %.2fs (%d chars in)",
|
| 139 |
+
model_id,
|
| 140 |
+
device_label,
|
| 141 |
+
time.perf_counter() - started,
|
| 142 |
+
len(numbered_narrative),
|
| 143 |
+
)
|
| 144 |
+
analysis = parse_analysis_json(content)
|
| 145 |
+
return ModelAnalysisResult(
|
| 146 |
model_id=model_id,
|
| 147 |
+
analysis=analysis,
|
| 148 |
+
note=f"Analysis produced by {model_id}.",
|
| 149 |
)
|
| 150 |
|
| 151 |
|
|
|
|
| 154 |
*,
|
| 155 |
model_id: str,
|
| 156 |
max_new_tokens: int,
|
| 157 |
+
device: str | None = None,
|
| 158 |
) -> str:
|
| 159 |
+
"""Generate text with a locally loaded model on the chosen device.
|
| 160 |
|
| 161 |
+
Imported lazily: ``torch`` only needs to exist on the GPU Space (or a local
|
| 162 |
+
machine running the model), never for the deterministic path, tests, or
|
| 163 |
+
light local development.
|
| 164 |
"""
|
| 165 |
|
| 166 |
import torch
|
| 167 |
|
| 168 |
+
tokenizer, model = _load_model(model_id, device=device)
|
| 169 |
chat_inputs = tokenizer.apply_chat_template(
|
| 170 |
messages,
|
| 171 |
add_generation_prompt=True,
|
|
|
|
| 220 |
def _chat_template_kwargs(model_id: str) -> dict[str, Any]:
|
| 221 |
"""Model-specific chat-template controls."""
|
| 222 |
|
| 223 |
+
if model_id.startswith("openbmb/"):
|
| 224 |
+
# MiniCPM5 supports hybrid reasoning; the quick engine keeps thinking
|
| 225 |
+
# off for fast, reliably parseable JSON memos.
|
| 226 |
+
return {"enable_thinking": False}
|
| 227 |
return {}
|
| 228 |
|
| 229 |
|
| 230 |
+
def _load_model(model_id: str, device: str | None = None) -> Any:
|
| 231 |
+
"""Lazily load and cache a (tokenizer, model) pair on the chosen device.
|
| 232 |
|
| 233 |
The cache keeps weights resident across requests so only the first call per
|
| 234 |
+
(model, device) pays the load cost. ZeroGPU exposes CUDA inside the
|
| 235 |
+
``@spaces.GPU`` context; CPU/MPS support lets the app run off-Space (e.g. for
|
| 236 |
+
users without GPU quota, or local development).
|
| 237 |
"""
|
| 238 |
|
| 239 |
+
import torch
|
| 240 |
+
|
| 241 |
+
resolved = resolve_device(device)
|
| 242 |
+
cache_key = f"{model_id}@{resolved}"
|
| 243 |
+
cached = _MODEL_CACHE.get(cache_key)
|
| 244 |
if cached is not None:
|
| 245 |
return cached
|
| 246 |
|
|
|
|
| 247 |
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 248 |
|
| 249 |
+
started = time.perf_counter()
|
| 250 |
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
|
| 251 |
+
if resolved == "cuda":
|
| 252 |
+
# The ZeroGPU Space path: load straight onto the GPU in bfloat16.
|
| 253 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 254 |
+
model_id,
|
| 255 |
+
dtype=torch.bfloat16,
|
| 256 |
+
device_map="cuda",
|
| 257 |
+
trust_remote_code=True,
|
| 258 |
+
)
|
| 259 |
+
else:
|
| 260 |
+
# CPU / Apple MPS: fp16 on MPS, fp32 on CPU for numerical stability.
|
| 261 |
+
dtype = torch.float16 if resolved == "mps" else torch.float32
|
| 262 |
+
model = AutoModelForCausalLM.from_pretrained(
|
| 263 |
+
model_id,
|
| 264 |
+
dtype=dtype,
|
| 265 |
+
trust_remote_code=True,
|
| 266 |
+
).to(resolved)
|
| 267 |
model.eval()
|
| 268 |
+
logger.info("loaded %s on %s in %.1fs", model_id, resolved, time.perf_counter() - started)
|
| 269 |
+
_MODEL_CACHE[cache_key] = (tokenizer, model)
|
| 270 |
return tokenizer, model
|
| 271 |
|
| 272 |
|
| 273 |
+
def _vocab_block(name: str, vocab: dict[str, str]) -> str:
|
| 274 |
+
return f"{name}:\n" + "\n".join(f"- {key}: {meaning}" for key, meaning in vocab.items())
|
| 275 |
|
| 276 |
|
| 277 |
+
def build_analysis_prompt(
|
| 278 |
+
numbered_narrative: str, *, agent_type: str = "unknown", codebook_hint: str = ""
|
| 279 |
+
) -> str:
|
| 280 |
+
narrative = numbered_narrative[:16000]
|
| 281 |
+
vocab = "\n\n".join(
|
| 282 |
+
[
|
| 283 |
+
_vocab_block("difficulty_type", DIFFICULTY_TYPES),
|
| 284 |
+
_vocab_block("appraisal", APPRAISALS),
|
| 285 |
+
_vocab_block("detour_type", DETOUR_TYPES),
|
| 286 |
+
_vocab_block("resolution_mode", RESOLUTION_MODES),
|
| 287 |
+
_vocab_block("recovery_pattern", RECOVERY_PATTERNS),
|
| 288 |
+
_vocab_block("outcome_claim", OUTCOME_CLAIMS),
|
| 289 |
+
]
|
| 290 |
+
)
|
| 291 |
+
return f"""Read the agent's visible narrative and produce a structured field report as JSON.
|
| 292 |
+
|
| 293 |
+
Identify the real DIFFICULTY EPISODES — moments where the agent hit a snag, reassessed,
|
| 294 |
+
detoured, recovered, or claimed completion. Ignore instructions, skill files, prompts,
|
| 295 |
+
or boilerplate the agent merely read or quoted; those are NOT difficulties. Merge
|
| 296 |
+
duplicates. Prefer 1-8 substantive episodes; if there is genuinely no difficulty,
|
| 297 |
+
return an empty episodes list.
|
| 298 |
+
|
| 299 |
+
Return ONE JSON object (first character {{ and last character }}), no prose, EXACTLY:
|
| 300 |
+
{{
|
| 301 |
+
"verdict": {{
|
| 302 |
+
"tone": one of ["stable","iterative","detour","partial","risk","unknown"],
|
| 303 |
+
"headline": "<= 12 words, plain language",
|
| 304 |
+
"detail": "2-4 sentences a developer can act on",
|
| 305 |
+
"honesty": one of ["candid","mixed","overclaimed"]
|
| 306 |
+
}},
|
| 307 |
+
"overall_patterns": {{
|
| 308 |
+
"difficulty_style": "1 sentence", "detour_style": "1 sentence",
|
| 309 |
+
"recovery_style": "1 sentence", "risk_or_caveat": "1 sentence"
|
| 310 |
+
}},
|
| 311 |
+
"episodes": [
|
| 312 |
+
{{
|
| 313 |
+
"start_index": <a message index shown below>,
|
| 314 |
+
"end_index": <a message index shown below>,
|
| 315 |
+
"title": "<= 10 words",
|
| 316 |
+
"initial_intention": "1 sentence", "reported_difficulty": "1-2 sentences",
|
| 317 |
+
"difficulty_type": "<one key below>", "appraisal": "<one key below>",
|
| 318 |
+
"strategy_before": "1 sentence", "strategy_after": "1 sentence",
|
| 319 |
+
"detour_type": "<one key below>", "resolution_mode": "<one key below>",
|
| 320 |
+
"recovery_pattern": "<one key below>", "outcome_claim": "<one key below>",
|
| 321 |
+
"productive_detour": one of ["yes","no","mixed","unknown"],
|
| 322 |
+
"evidence_quotes": ["short verbatim quote", "up to 3"],
|
| 323 |
+
"analyst_memo": "1-3 sentences of real insight, NOT a restatement of the codes"
|
| 324 |
+
}}
|
| 325 |
+
]
|
| 326 |
+
}}
|
| 327 |
+
|
| 328 |
+
Controlled vocabulary (use these keys exactly):
|
| 329 |
+
{vocab}
|
| 330 |
+
|
| 331 |
+
Guidance:
|
| 332 |
+
- Every field must contain real content drawn from the trace. NEVER output a
|
| 333 |
+
placeholder such as "<= 10 words", "1 sentence", or "<one key below>" literally.
|
| 334 |
+
- difficulty_type, appraisal, detour_type, resolution_mode, recovery_pattern, and
|
| 335 |
+
outcome_claim must each be EXACTLY one key from the vocabulary above (lowercase,
|
| 336 |
+
with underscores). If unsure, use "unknown".
|
| 337 |
+
- Be accurate, not generous. If the agent ended unresolved or overclaimed, say so in tone/honesty.
|
| 338 |
+
- honesty = "overclaimed" when a success claim outruns the visible evidence.
|
| 339 |
+
- start_index / end_index must be message indices that appear below.
|
| 340 |
+
- Quote the agent's own words; keep the original language of the quote.
|
| 341 |
+
- Do not include secrets or long tool dumps.
|
| 342 |
+
|
| 343 |
+
Agent type: {agent_type}
|
| 344 |
+
Rule-based pre-scan candidate spans (hints only — keep, drop, merge, or add freely): {codebook_hint or "(none)"}
|
| 345 |
+
|
| 346 |
+
Numbered visible messages:
|
| 347 |
+
{narrative}
|
| 348 |
"""
|
| 349 |
|
| 350 |
|
| 351 |
+
def parse_analysis_json(content: str) -> dict[str, Any]:
|
| 352 |
+
"""Validate the structural shape of the model's field report (codes coerced later)."""
|
| 353 |
|
| 354 |
+
parsed = _loads_lenient(content)
|
| 355 |
+
episodes = parsed.get("episodes")
|
| 356 |
+
if not isinstance(episodes, list):
|
| 357 |
+
raise ValueError("Model response did not include an 'episodes' list.")
|
| 358 |
+
parsed["episodes"] = [episode for episode in episodes if isinstance(episode, dict)]
|
| 359 |
+
if not isinstance(parsed.get("overall_patterns"), dict):
|
| 360 |
+
parsed["overall_patterns"] = {}
|
| 361 |
+
if not isinstance(parsed.get("verdict"), dict):
|
| 362 |
+
parsed["verdict"] = {}
|
|
|
|
| 363 |
return parsed
|
| 364 |
|
| 365 |
|
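The structural check in `parse_analysis_json` can be sketched in isolation. This is a minimal illustration, not the committed code: plain `json.loads` stands in for the `_loads_lenient` helper (whose body is not shown in this diff), the top-level `dict` check is an added assumption, and the name `parse_analysis_sketch` is hypothetical.

```python
import json
from typing import Any


def parse_analysis_sketch(content: str) -> dict[str, Any]:
    """Validate only the outer shape of a field report; codes are not checked here."""
    # Assumption: json.loads replaces the diff's _loads_lenient, which is not shown.
    parsed = json.loads(content)
    if not isinstance(parsed, dict):
        # Added guard for this sketch; the diff relies on its lenient loader instead.
        raise ValueError("Model response was not a JSON object.")
    episodes = parsed.get("episodes")
    if not isinstance(episodes, list):
        raise ValueError("Model response did not include an 'episodes' list.")
    # Drop non-dict episode entries rather than failing the whole report.
    parsed["episodes"] = [episode for episode in episodes if isinstance(episode, dict)]
    # Missing or malformed sections default to empty dicts.
    for key in ("overall_patterns", "verdict"):
        if not isinstance(parsed.get(key), dict):
            parsed[key] = {}
    return parsed


report = parse_analysis_sketch('{"episodes": [{"title": "x"}, "junk"], "verdict": "n/a"}')
```

Only a missing `episodes` list is treated as fatal. Malformed `verdict` or `overall_patterns` sections are replaced with empty dicts, so a partially valid model response can still be used.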
privacy_filter.py
ADDED
|
@@ -0,0 +1,180 @@
|
| 1 |
+
"""Optional model-based PII redaction using ``openai/privacy-filter``.
|
| 2 |
+
|
| 3 |
+
The deterministic pipeline always runs regex redaction (:mod:`redaction`). On the
|
| 4 |
+
Hugging Face Space GPU this module adds a second pass: a token-classification
|
| 5 |
+
model (``openai/privacy-filter``) flags personal or sensitive spans that regex
|
| 6 |
+
patterns miss — names, phone numbers, postal addresses, and the like — and masks
|
| 7 |
+
them with typed placeholders.
|
| 8 |
+
|
| 9 |
+
Heavy imports (``torch``/``transformers``) load lazily so the deterministic
|
| 10 |
+
analyzer, the test suite, and local development keep working without GPU
|
| 11 |
+
dependencies. If the model cannot be loaded, the caller falls back to regex-only
|
| 12 |
+
redaction and records the reason in the privacy notes.
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
from __future__ import annotations
|
| 16 |
+
|
| 17 |
+
import functools
|
| 18 |
+
import time
|
| 19 |
+
from collections import Counter
|
| 20 |
+
from typing import Any, Callable
|
| 21 |
+
|
| 22 |
+
from model_runtime import resolve_device
|
| 23 |
+
from profiling import get_logger
|
| 24 |
+
from redaction import RedactionResult
|
| 25 |
+
|
| 26 |
+
logger = get_logger()
|
| 27 |
+
|
| 28 |
+
|
| 29 |
+
PRIVACY_MODEL_ID = "openai/privacy-filter"
|
| 30 |
+
|
| 31 |
+
# Only mask spans the model is reasonably confident about.
|
| 32 |
+
PRIVACY_MIN_SCORE = 0.5
|
| 33 |
+
|
| 34 |
+
# Model entity group -> (placeholder written into the text, human label for notes).
|
| 35 |
+
PII_TYPES: dict[str, tuple[str, str]] = {
|
| 36 |
+
"private_person": ("[REDACTED_NAME]", "personal name"),
|
| 37 |
+
"private_email": ("[REDACTED_EMAIL]", "email address"),
|
| 38 |
+
"private_phone": ("[REDACTED_PHONE]", "phone number"),
|
| 39 |
+
"private_address": ("[REDACTED_ADDRESS]", "postal address"),
|
| 40 |
+
"private_url": ("[REDACTED_URL]", "personal URL"),
|
| 41 |
+
"private_date": ("[REDACTED_DATE]", "personal date"),
|
| 42 |
+
"account_number": ("[REDACTED_ACCOUNT]", "account number"),
|
| 43 |
+
"secret": ("[REDACTED_SECRET]", "secret"),
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
# (texts) -> per-text list of {"start", "end", "label"} spans.
|
| 47 |
+
DetectFn = Callable[[list[str]], list[list[dict[str, Any]]]]
|
| 48 |
+
|
| 49 |
+
_PIPELINE_CACHE: dict[str, Any] = {}
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
def redact_texts(
|
| 53 |
+
texts: list[str],
|
| 54 |
+
*,
|
| 55 |
+
detect: DetectFn | None = None,
|
| 56 |
+
device: str | None = None,
|
| 57 |
+
) -> list[RedactionResult]:
|
| 58 |
+
"""Detect and mask PII in each text, returning one result per input.
|
| 59 |
+
|
| 60 |
+
``detect`` defaults to :func:`_local_detect` (the lazy model); tests inject a
|
| 61 |
+
stand-in so the masking logic runs without ``torch``. ``device`` forces the
|
| 62 |
+
compute device for the default detector (``cuda`` / ``mps`` / ``cpu``).
|
| 63 |
+
"""
|
| 64 |
+
|
| 65 |
+
detector = detect or functools.partial(_local_detect, device=device)
|
| 66 |
+
spans_per_text = detector(texts)
|
| 67 |
+
return [_apply_spans(text, spans) for text, spans in zip(texts, spans_per_text)]
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
def _merge_spans(text: str, spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
|
| 71 |
+
"""Drop malformed spans and merge same-label runs into clean, disjoint spans.
|
| 72 |
+
|
| 73 |
+
``openai/privacy-filter`` uses BIOES tags, which the pipeline's IOB-oriented
|
| 74 |
+
"simple" aggregation can split into adjacent fragments of one entity (and a
|
| 75 |
+
leading separator can leave a one-character gap). Merging same-label spans
|
| 76 |
+
that overlap or sit within one character keeps each entity to a single
|
| 77 |
+
placeholder; a remaining different-label overlap is clipped to stay disjoint.
|
| 78 |
+
"""
|
| 79 |
+
|
| 80 |
+
valid = [
|
| 81 |
+
span
|
| 82 |
+
for span in spans
|
| 83 |
+
if span.get("label") in PII_TYPES
|
| 84 |
+
and 0 <= int(span["start"]) < int(span["end"]) <= len(text)
|
| 85 |
+
]
|
| 86 |
+
valid.sort(key=lambda span: (int(span["start"]), int(span["end"])))
|
| 87 |
+
|
| 88 |
+
merged: list[dict[str, Any]] = []
|
| 89 |
+
for span in valid:
|
| 90 |
+
start, end, label = int(span["start"]), int(span["end"]), span["label"]
|
| 91 |
+
if merged:
|
| 92 |
+
prev = merged[-1]
|
| 93 |
+
if label == prev["label"] and start <= prev["end"] + 1:
|
| 94 |
+
prev["end"] = max(prev["end"], end)
|
| 95 |
+
continue
|
| 96 |
+
if start < prev["end"]: # different-label overlap: keep them disjoint
|
| 97 |
+
start = prev["end"]
|
| 98 |
+
if start >= end:
|
| 99 |
+
continue
|
| 100 |
+
merged.append({"start": start, "end": end, "label": label})
|
| 101 |
+
return merged
|
| 102 |
+
|
| 103 |
+
|
| 104 |
+
def _apply_spans(text: str, spans: list[dict[str, Any]]) -> RedactionResult:
|
| 105 |
+
"""Replace detected spans with typed placeholders, right-to-left."""
|
| 106 |
+
|
| 107 |
+
counts: Counter[str] = Counter()
|
| 108 |
+
redacted = text
|
| 109 |
+
for span in sorted(_merge_spans(text, spans), key=lambda span: span["start"], reverse=True):
|
| 110 |
+
placeholder, label = PII_TYPES[span["label"]]
|
| 111 |
+
redacted = redacted[: span["start"]] + placeholder + redacted[span["end"] :]
|
| 112 |
+
counts[label] += 1
|
| 113 |
+
|
| 114 |
+
notes = [f"{label}: {count}" for label, count in sorted(counts.items())]
|
| 115 |
+
return RedactionResult(text=redacted, notes=notes, count=sum(counts.values()))
|
| 116 |
+
|
| 117 |
+
|
| 118 |
+
def _local_detect(texts: list[str], device: str | None = None) -> list[list[dict[str, Any]]]:
|
| 119 |
+
"""Run ``openai/privacy-filter`` and return confident PII spans per text.
|
| 120 |
+
|
| 121 |
+
Imported lazily: ``transformers``/``torch`` only need to exist where the
|
| 122 |
+
model actually runs, never for the deterministic path, tests, or light local
|
| 123 |
+
development.
|
| 124 |
+
"""
|
| 125 |
+
|
| 126 |
+
pipe = _load_pipeline(device=device)
|
| 127 |
+
started = time.perf_counter()
|
| 128 |
+
results: list[list[dict[str, Any]]] = []
|
| 129 |
+
for text in texts:
|
| 130 |
+
if not text.strip():
|
| 131 |
+
results.append([])
|
| 132 |
+
continue
|
| 133 |
+
entities = pipe(text)
|
| 134 |
+
spans = [
|
| 135 |
+
{
|
| 136 |
+
"start": int(entity["start"]),
|
| 137 |
+
"end": int(entity["end"]),
|
| 138 |
+
"label": entity["entity_group"],
|
| 139 |
+
}
|
| 140 |
+
for entity in entities
|
| 141 |
+
if entity.get("entity_group") in PII_TYPES
|
| 142 |
+
and entity.get("start") is not None
|
| 143 |
+
and entity.get("end") is not None
|
| 144 |
+
and float(entity.get("score", 1.0)) >= PRIVACY_MIN_SCORE
|
| 145 |
+
]
|
| 146 |
+
results.append(spans)
|
| 147 |
+
detected = sum(len(spans) for spans in results)
|
| 148 |
+
logger.debug(
|
| 149 |
+
"privacy-filter scanned %d messages, %d raw spans in %.2fs",
|
| 150 |
+
len(texts),
|
| 151 |
+
detected,
|
| 152 |
+
time.perf_counter() - started,
|
| 153 |
+
)
|
| 154 |
+
return results
|
| 155 |
+
|
| 156 |
+
|
| 157 |
+
def _load_pipeline(device: str | None = None) -> Any:
|
| 158 |
+
"""Lazily build and cache the token-classification pipeline per device."""
|
| 159 |
+
|
| 160 |
+
resolved = resolve_device(device)
|
| 161 |
+
cached = _PIPELINE_CACHE.get(resolved)
|
| 162 |
+
if cached is not None:
|
| 163 |
+
return cached
|
| 164 |
+
|
| 165 |
+
from transformers import pipeline
|
| 166 |
+
|
| 167 |
+
# transformers pipeline device: 0 for cuda, "mps"/"cpu" otherwise.
|
| 168 |
+
pipe_device = 0 if resolved == "cuda" else resolved
|
| 169 |
+
started = time.perf_counter()
|
| 170 |
+
pipe = pipeline(
|
| 171 |
+
"token-classification",
|
| 172 |
+
model=PRIVACY_MODEL_ID,
|
| 173 |
+
aggregation_strategy="simple",
|
| 174 |
+
device=pipe_device,
|
| 175 |
+
)
|
| 176 |
+
logger.info(
|
| 177 |
+
"loaded %s on %s in %.1fs", PRIVACY_MODEL_ID, resolved, time.perf_counter() - started
|
| 178 |
+
)
|
| 179 |
+
_PIPELINE_CACHE[resolved] = pipe
|
| 180 |
+
return pipe
|
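The merge-then-mask flow in `_merge_spans` and `_apply_spans` can be tried without `torch` or the model. The sketch below follows the same steps under stated assumptions: `PLACEHOLDERS` is a hypothetical two-entry subset of `PII_TYPES`, the spans are hand-written rather than model output, and a plain string is returned instead of a `RedactionResult`.

```python
from typing import Any

# Hypothetical subset of the diff's PII_TYPES mapping (placeholders only).
PLACEHOLDERS = {
    "private_person": "[REDACTED_NAME]",
    "private_email": "[REDACTED_EMAIL]",
}


def merge_spans(text: str, spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop malformed spans, then merge same-label spans that overlap or sit one char apart."""
    valid = sorted(
        (
            span
            for span in spans
            if span.get("label") in PLACEHOLDERS
            and 0 <= span["start"] < span["end"] <= len(text)
        ),
        key=lambda span: (span["start"], span["end"]),
    )
    merged: list[dict[str, Any]] = []
    for span in valid:
        start, end, label = span["start"], span["end"], span["label"]
        if merged:
            prev = merged[-1]
            if label == prev["label"] and start <= prev["end"] + 1:
                prev["end"] = max(prev["end"], end)  # extend the previous entity
                continue
            if start < prev["end"]:  # different-label overlap: clip to stay disjoint
                start = prev["end"]
            if start >= end:
                continue
        merged.append({"start": start, "end": end, "label": label})
    return merged


def mask(text: str, spans: list[dict[str, Any]]) -> str:
    """Replace spans right-to-left so earlier offsets stay valid while the text changes."""
    ordered = sorted(merge_spans(text, spans), key=lambda span: span["start"], reverse=True)
    for span in ordered:
        text = text[: span["start"]] + PLACEHOLDERS[span["label"]] + text[span["end"] :]
    return text


text = "Ask Ada Lovelace at ada@example.com"
spans = [
    {"start": 4, "end": 7, "label": "private_person"},   # "Ada"
    {"start": 8, "end": 16, "label": "private_person"},  # "Lovelace", one char after "Ada"
    {"start": 20, "end": 35, "label": "private_email"},  # "ada@example.com"
]
masked = mask(text, spans)
# masked == "Ask [REDACTED_NAME] at [REDACTED_EMAIL]"
```

Replacing right-to-left matters because each placeholder usually differs in length from the text it replaces. Working from the end of the string keeps the `start`/`end` offsets of the spans still to be processed valid.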
profiling.py
ADDED
|
@@ -0,0 +1,125 @@
|
| 1 |
+
"""Lightweight logging + profiling for the Trace Field Notes pipeline.
|
| 2 |
+
|
| 3 |
+
Everything here writes to the standard logging system, never the UI. Set the log
|
| 4 |
+
level with the ``TFN_LOG_LEVEL`` env var (default ``INFO``); use ``DEBUG`` for
|
| 5 |
+
per-stage detail. Resource probes (process RSS, system memory, CPU, and
|
| 6 |
+
GPU/MPS memory) are best-effort and degrade silently if a dependency is missing
|
| 7 |
+
— so the deterministic path, the test suite, and local development never need
|
| 8 |
+
``psutil`` or ``torch`` installed.
|
| 9 |
+
"""
|
| 10 |
+
|
| 11 |
+
from __future__ import annotations
|
| 12 |
+
|
| 13 |
+
import logging
|
| 14 |
+
import os
|
| 15 |
+
import time
|
| 16 |
+
from contextlib import contextmanager
|
| 17 |
+
from typing import Any, Iterator
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
def get_logger(name: str = "trace_field_notes") -> logging.Logger:
|
| 21 |
+
logger = logging.getLogger(name)
|
| 22 |
+
if not logger.handlers:
|
| 23 |
+
handler = logging.StreamHandler()
|
| 24 |
+
handler.setFormatter(
|
| 25 |
+
logging.Formatter("%(asctime)s [%(name)s] %(levelname)s %(message)s")
|
| 26 |
+
)
|
| 27 |
+
logger.addHandler(handler)
|
| 28 |
+
logger.setLevel(os.getenv("TFN_LOG_LEVEL", "INFO").upper())
|
| 29 |
+
logger.propagate = False
|
| 30 |
+
return logger
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
logger = get_logger()
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
def resource_snapshot() -> dict[str, Any]:
|
| 37 |
+
"""Best-effort process + system resource probe. Never raises."""
|
| 38 |
+
|
| 39 |
+
snap: dict[str, Any] = {}
|
| 40 |
+
try:
|
| 41 |
+
import psutil
|
| 42 |
+
|
| 43 |
+
proc = psutil.Process()
|
| 44 |
+
snap["rss_mb"] = round(proc.memory_info().rss / 1024 / 1024, 1)
|
| 45 |
+
vm = psutil.virtual_memory()
|
| 46 |
+
snap["sys_mem_pct"] = vm.percent
|
| 47 |
+
snap["sys_mem_avail_mb"] = round(vm.available / 1024 / 1024, 1)
|
| 48 |
+
snap["cpu_pct"] = psutil.cpu_percent(interval=None)
|
| 49 |
+
except Exception: # noqa: BLE001 - profiling must never break the request
|
| 50 |
+
pass
|
| 51 |
+
try:
|
| 52 |
+
import torch
|
| 53 |
+
|
| 54 |
+
if torch.cuda.is_available():
|
| 55 |
+
snap["accel"] = "cuda"
|
| 56 |
+
snap["accel_mem_mb"] = round(torch.cuda.memory_allocated() / 1024 / 1024, 1)
|
| 57 |
+
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
|
| 58 |
+
snap["accel"] = "mps"
|
| 59 |
+
snap["accel_mem_mb"] = round(
|
| 60 |
+
torch.mps.current_allocated_memory() / 1024 / 1024, 1
|
| 61 |
+
)
|
| 62 |
+
except Exception: # noqa: BLE001
|
| 63 |
+
pass
|
| 64 |
+
return snap
|
| 65 |
+
|
| 66 |
+
|
| 67 |
+
def format_snapshot(snap: dict[str, Any]) -> str:
|
| 68 |
+
parts = []
|
| 69 |
+
if "rss_mb" in snap:
|
| 70 |
+
parts.append(f"rss={snap['rss_mb']}MB")
|
| 71 |
+
if "sys_mem_pct" in snap:
|
| 72 |
+
parts.append(f"sysmem={snap['sys_mem_pct']}%")
|
| 73 |
+
if "cpu_pct" in snap:
|
| 74 |
+
parts.append(f"cpu={snap['cpu_pct']}%")
|
| 75 |
+
if "accel_mem_mb" in snap:
|
| 76 |
+
parts.append(f"{snap.get('accel', 'accel')}={snap['accel_mem_mb']}MB")
|
| 77 |
+
return " ".join(parts) or "n/a"
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
class Profiler:
|
| 81 |
+
"""Accumulates per-stage timings + counts for one request and logs a summary."""
|
| 82 |
+
|
| 83 |
+
def __init__(self, label: str = "analyze") -> None:
|
| 84 |
+
self.label = label
|
| 85 |
+
self._t0 = time.perf_counter()
|
| 86 |
+
self.stages: list[tuple[str, float]] = []
|
| 87 |
+
self.meta: dict[str, Any] = {}
|
| 88 |
+
|
| 89 |
+
@contextmanager
|
| 90 |
+
def stage(self, name: str) -> Iterator[None]:
|
| 91 |
+
start = time.perf_counter()
|
| 92 |
+
logger.debug(
|
| 93 |
+
"%s: stage %r start | %s", self.label, name, format_snapshot(resource_snapshot())
|
| 94 |
+
)
|
| 95 |
+
try:
|
| 96 |
+
yield
|
| 97 |
+
finally:
|
| 98 |
+
dt = time.perf_counter() - start
|
| 99 |
+
self.stages.append((name, dt))
|
| 100 |
+
logger.debug("%s: stage %r done in %.3fs", self.label, name, dt)
|
| 101 |
+
|
| 102 |
+
def record(self, name: str, seconds: float) -> None:
|
| 103 |
+
"""Record a stage duration measured by the caller (no context manager)."""
|
| 104 |
+
|
| 105 |
+
self.stages.append((name, seconds))
|
| 106 |
+
logger.debug("%s: stage %r done in %.3fs", self.label, name, seconds)
|
| 107 |
+
|
| 108 |
+
def mark(self, **kwargs: Any) -> None:
|
| 109 |
+
self.meta.update(kwargs)
|
| 110 |
+
|
| 111 |
+
def elapsed(self) -> float:
|
| 112 |
+
return time.perf_counter() - self._t0
|
| 113 |
+
|
| 114 |
+
def summary(self) -> None:
|
| 115 |
+
total = self.elapsed()
|
| 116 |
+
stage_str = ", ".join(f"{name}={dt * 1000:.0f}ms" for name, dt in self.stages)
|
| 117 |
+
meta_str = " ".join(f"{key}={value}" for key, value in self.meta.items())
|
| 118 |
+
logger.info(
|
| 119 |
+
"%s done in %.3fs | %s | stages: %s | %s",
|
| 120 |
+
self.label,
|
| 121 |
+
total,
|
| 122 |
+
meta_str or "-",
|
| 123 |
+
stage_str or "-",
|
| 124 |
+
format_snapshot(resource_snapshot()),
|
| 125 |
+
)
|
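The per-stage timing pattern used by `Profiler.stage` can be sketched on its own. This is a stripped-down, hypothetical analogue (`StageTimer` is not part of the commit): it omits logging and resource probes and returns the summary string instead of logging it.

```python
import time
from contextlib import contextmanager
from typing import Iterator


class StageTimer:
    """Hypothetical minimal analogue of the diff's Profiler: timings only."""

    def __init__(self) -> None:
        self.stages: list[tuple[str, float]] = []

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # The finally block records the duration even when the stage raises.
            self.stages.append((name, time.perf_counter() - start))

    def summary(self) -> str:
        return ", ".join(f"{name}={dt * 1000:.0f}ms" for name, dt in self.stages) or "-"


timer = StageTimer()
with timer.stage("parse"):
    sum(range(1000))
try:
    with timer.stage("model"):
        raise RuntimeError("model unavailable")
except RuntimeError:
    pass  # the failed stage is still recorded

stage_names = [name for name, _ in timer.stages]
```

Because the duration is appended in `finally`, a stage that fails (for example, a model load error that triggers a fallback) still appears in the summary, which is the behaviour the `Profiler.stage` context manager in the diff also has.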
requirements.txt
CHANGED
|
@@ -2,6 +2,7 @@ gradio>=6.16,<7
|
|
| 2 |
huggingface_hub>=0.30
|
| 3 |
spaces>=0.50
|
| 4 |
torch>=2.4
|
| 5 |
-
transformers>=
|
| 6 |
accelerate>=1.0
|
| 7 |
einops>=0.8
|
|
|
|
|
|
| 2 |
huggingface_hub>=0.30
|
| 3 |
spaces>=0.50
|
| 4 |
torch>=2.4
|
| 5 |
+
transformers>=5.6
|
| 6 |
accelerate>=1.0
|
| 7 |
einops>=0.8
|
| 8 |
+
psutil>=5.9
|
schemas.py
CHANGED
|
@@ -149,6 +149,7 @@ class AnalysisResult:
|
|
| 149 |
engine: str = "deterministic-codebook"
|
| 150 |
model_notes: list[str] = field(default_factory=list)
|
| 151 |
model_memo: dict[str, Any] = field(default_factory=dict)
|
|
|
|
| 152 |
|
| 153 |
def to_dict(self) -> dict[str, Any]:
|
| 154 |
return {
|
|
@@ -163,4 +164,5 @@ class AnalysisResult:
|
|
| 163 |
"engine": self.engine,
|
| 164 |
"model_notes": self.model_notes,
|
| 165 |
"model_memo": self.model_memo,
|
|
|
|
| 166 |
}
|
|
|
|
| 149 |
engine: str = "deterministic-codebook"
|
| 150 |
model_notes: list[str] = field(default_factory=list)
|
| 151 |
model_memo: dict[str, Any] = field(default_factory=dict)
|
| 152 |
+
session_verdict: dict[str, Any] = field(default_factory=dict)
|
| 153 |
|
| 154 |
def to_dict(self) -> dict[str, Any]:
|
| 155 |
return {
|
|
|
|
| 164 |
"engine": self.engine,
|
| 165 |
"model_notes": self.model_notes,
|
| 166 |
"model_memo": self.model_memo,
|
| 167 |
+
"session_verdict": self.session_verdict,
|
| 168 |
}
|
tests/test_model_runtime.py
CHANGED
|
@@ -14,16 +14,45 @@ from model_runtime import (
|
|
| 14 |
QUICK_MODEL_ID,
|
| 15 |
_chat_template_kwargs,
|
| 16 |
_prepare_generation_inputs,
|
| 17 |
-
|
| 18 |
-
|
|
|
|
| 19 |
)
|
| 20 |
|
| 21 |
|
| 22 |
-
|
| 23 |
-
"
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
}
|
| 28 |
|
| 29 |
|
|
@@ -37,7 +66,7 @@ class RecordingGenerator:
|
|
| 37 |
self.calls.append(
|
| 38 |
{"messages": messages, "model_id": model_id, "max_new_tokens": max_new_tokens}
|
| 39 |
)
|
| 40 |
-
return json.dumps(
|
| 41 |
|
| 42 |
|
| 43 |
class FakeTensor:
|
|
@@ -57,46 +86,62 @@ class ModelRuntimeTests(unittest.TestCase):
|
|
| 57 |
self.assertIn("NVIDIA Nemotron 3 Nano 30B-A3B", label)
|
| 58 |
self.assertNotIn("small", label.lower())
|
| 59 |
|
| 60 |
-
def
|
| 61 |
-
|
|
|
|
|
|
|
| 62 |
|
| 63 |
-
|
| 64 |
-
self.assertEqual(
|
| 65 |
|
| 66 |
-
|
| 67 |
-
|
| 68 |
|
| 69 |
-
|
|
|
|
| 70 |
|
| 71 |
-
|
| 72 |
-
raw = "Here is the analysis:\n" + json.dumps(MEMO_JSON) + "\nHope this helps."
|
| 73 |
-
memo = parse_model_json(raw)
|
| 74 |
|
| 75 |
-
|
|
|
|
|
|
|
| 76 |
|
| 77 |
-
|
|
|
|
|
|
|
| 78 |
raw = (
|
| 79 |
"<think>Draft {not json} and a scratch object "
|
| 80 |
'{"draft": "ignore this"} before the final answer.</think>\n'
|
| 81 |
-
+ json.dumps(
|
| 82 |
)
|
| 83 |
-
|
|
|
|
|
|
|
| 84 |
|
| 85 |
-
|
|
|
|
|
|
|
| 86 |
|
| 87 |
-
def
|
| 88 |
-
result, narrative = analyze_trace_file(Path("examples/sample_trace_redacted.jsonl"))
|
| 89 |
generate = RecordingGenerator()
|
| 90 |
|
| 91 |
-
|
| 92 |
engine="nemotron",
|
| 93 |
-
|
| 94 |
-
narrative_text=narrative,
|
| 95 |
generate=generate,
|
| 96 |
)
|
| 97 |
|
| 98 |
-
self.assertEqual(
|
| 99 |
-
self.
|
| 100 |
self.assertEqual(generate.calls[0]["model_id"], PRIMARY_MODEL_ID)
|
| 101 |
self.assertEqual(generate.calls[0]["max_new_tokens"], MODEL_MAX_NEW_TOKENS)
|
| 102 |
|
|
@@ -121,12 +166,6 @@ class ModelRuntimeTests(unittest.TestCase):
|
|
| 121 |
self.assertEqual(generation_inputs["input_ids"], input_ids)
|
| 122 |
self.assertEqual(generation_inputs["attention_mask"], attention_mask)
|
| 123 |
self.assertEqual(prompt_tokens, 21)
|
| 124 |
-
self.assertEqual(input_ids.device, "cuda")
|
| 125 |
-
self.assertEqual(attention_mask.device, "cuda")
|
| 126 |
-
|
| 127 |
-
def test_qwen_chat_template_enables_thinking(self) -> None:
|
| 128 |
-
self.assertEqual(_chat_template_kwargs(QUICK_MODEL_ID), {"enable_thinking": True})
|
| 129 |
-
self.assertEqual(_chat_template_kwargs(PRIMARY_MODEL_ID), {})
|
| 130 |
|
| 131 |
def test_analyzer_records_unknown_engine_note(self) -> None:
|
| 132 |
result, _ = analyze_trace_file(
|
|
@@ -138,30 +177,61 @@ class ModelRuntimeTests(unittest.TestCase):
|
|
| 138 |
self.assertIn("Unknown analysis engine", result.model_notes[0])
|
| 139 |
|
| 140 |
def test_analyzer_model_error_note_avoids_double_period(self) -> None:
|
| 141 |
-
with patch("analyzer.
|
| 142 |
result, _ = analyze_trace_file(
|
| 143 |
Path("examples/sample_trace_redacted.jsonl"),
|
| 144 |
-
analysis_engine="
|
| 145 |
)
|
| 146 |
|
| 147 |
self.assertTrue(result.model_notes)
|
| 148 |
self.assertNotIn("..", result.model_notes[0])
|
| 149 |
self.assertIn("ValueError: model unavailable.", result.model_notes[0])
|
| 150 |
|
| 151 |
-
def
|
| 152 |
-
with patch("analyzer.
|
| 153 |
-
|
| 154 |
model_id=PRIMARY_MODEL_ID,
|
| 155 |
-
|
| 156 |
-
note="
|
| 157 |
)
|
| 158 |
result, _ = analyze_trace_file(
|
| 159 |
Path("examples/sample_trace_redacted.jsonl"),
|
| 160 |
analysis_engine="nemotron",
|
| 161 |
)
|
| 162 |
|
| 163 |
-
self.
|
| 164 |
-
self.
|
|
| 165 |
|
| 166 |
|
| 167 |
if __name__ == "__main__":
|
|
|
|
| 14 |
QUICK_MODEL_ID,
|
| 15 |
_chat_template_kwargs,
|
| 16 |
_prepare_generation_inputs,
|
| 17 |
+
parse_analysis_json,
|
| 18 |
+
resolve_device,
|
| 19 |
+
run_model_analysis,
|
| 20 |
)
|
| 21 |
|
| 22 |
|
| 23 |
+
ANALYSIS_JSON = {
|
| 24 |
+
"verdict": {
|
| 25 |
+
"tone": "partial",
|
| 26 |
+
"headline": "Reroute landed with a caveat.",
|
| 27 |
+
"detail": "The agent caught a wrong assumption about the upload shape and narrowed the fix.",
|
| 28 |
+
"honesty": "candid",
|
| 29 |
+
},
|
| 30 |
+
"overall_patterns": {
|
| 31 |
+
"difficulty_style": "One localization snag.",
|
| 32 |
+
"detour_style": "A productive narrowing.",
|
| 33 |
+
"recovery_style": "Reflective.",
|
| 34 |
+
"risk_or_caveat": "Deployment path left unverified.",
|
| 35 |
+
},
|
| 36 |
+
"episodes": [
|
| 37 |
+
{
|
| 38 |
+
"start_index": 0,
|
| 39 |
+
"end_index": 3,
|
| 40 |
+
"title": "Upload boundary fix",
|
| 41 |
+
"initial_intention": "Inspect the failing upload path.",
|
| 42 |
+
"reported_difficulty": "The Gradio file object can arrive as a temporary path.",
|
| 43 |
+
"difficulty_type": "localization_difficulty",
|
| 44 |
+
"appraisal": "initial_hypothesis_wrong",
|
| 45 |
+
"strategy_before": "Fix the parser.",
|
| 46 |
+
"strategy_after": "Narrow the fix to the upload boundary.",
|
| 47 |
+
"detour_type": "scope_narrowing",
|
| 48 |
+
"resolution_mode": "defensive_handling",
|
| 49 |
+
"recovery_pattern": "reflective_recovery",
|
| 50 |
+
"outcome_claim": "resolved_with_caveat",
|
| 51 |
+
"productive_detour": "yes",
|
| 52 |
+
"evidence_quotes": ["my initial assumption about the upload shape was wrong"],
|
| 53 |
+
"analyst_memo": "The agent names the wrong assumption and picks the smaller change.",
|
| 54 |
+
}
|
| 55 |
+
],
|
| 56 |
}
|
| 57 |
|
| 58 |
|
|
|
|
| 66 |
self.calls.append(
|
| 67 |
{"messages": messages, "model_id": model_id, "max_new_tokens": max_new_tokens}
|
| 68 |
)
|
| 69 |
+
return json.dumps(ANALYSIS_JSON)
|
| 70 |
|
| 71 |
|
| 72 |
class FakeTensor:
|
|
|
|
| 86 |
self.assertIn("NVIDIA Nemotron 3 Nano 30B-A3B", label)
|
| 87 |
self.assertNotIn("small", label.lower())
|
| 88 |
|
| 89 |
+
def test_minicpm_is_the_quick_engine(self) -> None:
|
| 90 |
+
self.assertEqual(MODEL_CHOICES["minicpm"]["model_id"], QUICK_MODEL_ID)
|
| 91 |
+
self.assertIn("MiniCPM5 1B", str(MODEL_CHOICES["minicpm"]["label"]))
|
| 92 |
+
self.assertNotIn("qwen", MODEL_CHOICES)
|
| 93 |
|
| 94 |
+
def test_minicpm_chat_template_disables_thinking(self) -> None:
|
| 95 |
+
self.assertEqual(_chat_template_kwargs(QUICK_MODEL_ID), {"enable_thinking": False})
|
| 96 |
+
self.assertEqual(_chat_template_kwargs(PRIMARY_MODEL_ID), {})
|
| 97 |
+
|
| 98 |
+
def test_resolve_device_honors_explicit_override(self) -> None:
|
| 99 |
+
self.assertEqual(resolve_device("cpu"), "cpu")
|
| 100 |
+
self.assertEqual(resolve_device("cuda"), "cuda")
|
| 101 |
+
self.assertEqual(resolve_device("mps"), "mps")
|
| 102 |
+
|
| 103 |
+
def test_parse_analysis_json_validates_shape(self) -> None:
|
| 104 |
+
parsed = parse_analysis_json(json.dumps(ANALYSIS_JSON))
|
| 105 |
|
| 106 |
+
self.assertEqual(len(parsed["episodes"]), 1)
|
| 107 |
+
self.assertEqual(parsed["verdict"]["tone"], "partial")
|
| 108 |
|
| 109 |
+
def test_parse_analysis_json_recovers_from_code_fence(self) -> None:
|
| 110 |
+
parsed = parse_analysis_json("```json\n" + json.dumps(ANALYSIS_JSON) + "\n```")
|
| 111 |
|
| 112 |
+
self.assertEqual(parsed["episodes"][0]["difficulty_type"], "localization_difficulty")
|
| 113 |
|
| 114 |
+
def test_parse_analysis_json_extracts_object_from_prose(self) -> None:
|
| 115 |
+
raw = "Here is the report:\n" + json.dumps(ANALYSIS_JSON) + "\nDone."
|
| 116 |
+
parsed = parse_analysis_json(raw)
|
| 117 |
|
| 118 |
+
self.assertEqual(parsed["verdict"]["honesty"], "candid")
|
| 119 |
+
|
| 120 |
+
def test_parse_analysis_json_uses_final_object_after_thinking_braces(self) -> None:
|
| 121 |
raw = (
|
| 122 |
"<think>Draft {not json} and a scratch object "
|
| 123 |
'{"draft": "ignore this"} before the final answer.</think>\n'
|
| 124 |
+
+ json.dumps(ANALYSIS_JSON)
|
| 125 |
)
|
| 126 |
+
parsed = parse_analysis_json(raw)
|
| 127 |
+
|
| 128 |
+
self.assertEqual(len(parsed["episodes"]), 1)
|
| 129 |
|
| 130 |
+
def test_parse_analysis_json_requires_episodes_list(self) -> None:
|
| 131 |
+
with self.assertRaises(ValueError):
|
| 132 |
+
parse_analysis_json(json.dumps({"verdict": {}, "overall_patterns": {}}))
|
| 133 |
|
| 134 |
+
def test_run_model_analysis_uses_selected_model(self) -> None:
|
| 135 |
generate = RecordingGenerator()
|
| 136 |
|
| 137 |
+
produced = run_model_analysis(
|
| 138 |
engine="nemotron",
|
| 139 |
+
numbered_narrative="[0] assistant 10:00: hello",
|
| 140 |
generate=generate,
|
| 141 |
)
|
| 142 |
|
| 143 |
+
self.assertEqual(produced.model_id, PRIMARY_MODEL_ID)
|
| 144 |
+
self.assertEqual(len(produced.analysis["episodes"]), 1)
|
| 145 |
self.assertEqual(generate.calls[0]["model_id"], PRIMARY_MODEL_ID)
|
| 146 |
self.assertEqual(generate.calls[0]["max_new_tokens"], MODEL_MAX_NEW_TOKENS)
|
| 147 |
|
|
|
|
| 166 |
self.assertEqual(generation_inputs["input_ids"], input_ids)
|
| 167 |
self.assertEqual(generation_inputs["attention_mask"], attention_mask)
|
| 168 |
self.assertEqual(prompt_tokens, 21)
|
| 169 |
|
| 170 |
def test_analyzer_records_unknown_engine_note(self) -> None:
|
| 171 |
result, _ = analyze_trace_file(
|
|
|
|
| 177 |
self.assertIn("Unknown analysis engine", result.model_notes[0])
|
| 178 |
|
| 179 |
def test_analyzer_model_error_note_avoids_double_period(self) -> None:
|
| 180 |
+
with patch("analyzer.run_model_analysis", side_effect=ValueError("model unavailable.")):
|
| 181 |
result, _ = analyze_trace_file(
|
| 182 |
Path("examples/sample_trace_redacted.jsonl"),
|
| 183 |
+
analysis_engine="minicpm",
|
| 184 |
)
|
| 185 |
|
| 186 |
self.assertTrue(result.model_notes)
|
| 187 |
self.assertNotIn("..", result.model_notes[0])
|
| 188 |
self.assertIn("ValueError: model unavailable.", result.model_notes[0])
|
| 189 |
|
| 190 |
+
def test_analyzer_replaces_analysis_on_model_success(self) -> None:
|
| 191 |
+
with patch("analyzer.run_model_analysis") as run:
|
| 192 |
+
run.return_value = types.SimpleNamespace(
|
| 193 |
model_id=PRIMARY_MODEL_ID,
|
| 194 |
+
analysis=dict(ANALYSIS_JSON),
|
| 195 |
+
note=f"Analysis produced by {PRIMARY_MODEL_ID}.",
|
| 196 |
)
|
| 197 |
result, _ = analyze_trace_file(
|
| 198 |
Path("examples/sample_trace_redacted.jsonl"),
|
| 199 |
analysis_engine="nemotron",
|
| 200 |
)
|
| 201 |
|
| 202 |
+
self.assertEqual(result.engine, PRIMARY_MODEL_ID)
|
| 203 |
+
self.assertEqual(result.session_verdict["tone"], "partial")
|
| 204 |
+
self.assertEqual(result.episodes[0].episode_id, "E01")
|
| 205 |
+
self.assertEqual(result.episodes[0].difficulty_type, "localization_difficulty")
|
| 206 |
+
|
| 207 |
+
def test_analyzer_strips_placeholder_echoes(self) -> None:
|
| 208 |
+
bad = {
|
| 209 |
+
"verdict": {"tone": "stable", "headline": "<= 12 words", "detail": "2-4 sentences", "honesty": "candid"},
|
| 210 |
+
"overall_patterns": {},
|
| 211 |
+
"episodes": [
|
| 212 |
+
{
|
| 213 |
+
"start_index": 0,
|
| 214 |
+
"end_index": 0,
|
| 215 |
+
"title": "<= 10 words",
|
| 216 |
+
"reported_difficulty": "The build failed.",
|
| 217 |
+
"difficulty_type": "environment_blocker",
|
| 218 |
+
"analyst_memo": "1-3 sentences",
|
| 219 |
+
"evidence_quotes": ["short verbatim quote", "the build failed"],
|
| 220 |
+
"outcome_claim": "not_resolved",
|
| 221 |
+
}
|
| 222 |
+
],
|
| 223 |
+
}
|
| 224 |
+
with patch("analyzer.run_model_analysis") as run:
|
| 225 |
+
run.return_value = types.SimpleNamespace(model_id=QUICK_MODEL_ID, analysis=bad, note="ok")
|
| 226 |
+
result, _ = analyze_trace_file(
|
| 227 |
+
Path("examples/sample_trace_redacted.jsonl"), analysis_engine="minicpm"
|
| 228 |
+
)
|
| 229 |
+
|
| 230 |
+
episode = result.episodes[0]
|
| 231 |
+
self.assertEqual(episode.title, "The build failed.") # placeholder -> reported_difficulty
|
| 232 |
+
self.assertEqual(episode.analyst_memo, "") # "1-3 sentences" stripped
|
| 233 |
+
self.assertEqual(episode.evidence_quotes, ["the build failed"]) # placeholder quote dropped
|
| 234 |
+
self.assertNotIn("<", result.session_verdict["headline"])
|
| 235 |
|
| 236 |
|
| 237 |
if __name__ == "__main__":
|
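The `parse_analysis_json` tests above pin down a contract: accept a bare JSON object, one wrapped in a code fence, one embedded in prose, or one that follows scratch braces inside a `<think>` block, and raise `ValueError` when there is no `episodes` list. The implementation in `model_runtime.py` is not shown in this part of the diff, so the following is only a minimal sketch of one way to satisfy that contract. The name `parse_analysis_json_sketch` and the "last decodable object wins" rule are assumptions, not the repository's code.

```python
import json


def parse_analysis_json_sketch(raw: str) -> dict:
    """Hypothetical helper: return the last top-level JSON object in `raw`.

    Assumption: the final answer is the last decodable object, so draft
    objects emitted earlier (for example inside <think>...</think>) lose.
    """
    decoder = json.JSONDecoder()
    found = None
    index = 0
    while True:
        start = raw.find("{", index)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns the value plus the index just past it.
            value, end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            index = start + 1  # not JSON here, try the next brace
            continue
        if isinstance(value, dict):
            found = value
        index = end  # skip past the object so nested braces are not revisited
    if found is None or not isinstance(found.get("episodes"), list):
        raise ValueError("no JSON object with an 'episodes' list found")
    return found
```

Scanning with `raw_decode` avoids a regular expression for balanced braces, which cannot handle nesting or braces inside string values.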
tests/test_privacy_filter.py
ADDED
|
@@ -0,0 +1,179 @@
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import unittest
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
|
| 6 |
+
from analyzer import stream_deterministic_analysis
|
| 7 |
+
from privacy_filter import PII_TYPES, redact_texts
|
| 8 |
+
from redaction import RedactionResult
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def fake_detect(texts: list[str]) -> list[list[dict]]:
|
| 12 |
+
"""Stand-in detector: flags "Alice Smith" and "555-1234" without torch."""
|
| 13 |
+
|
| 14 |
+
results = []
|
| 15 |
+
for text in texts:
|
| 16 |
+
spans = []
|
| 17 |
+
person = text.find("Alice Smith")
|
| 18 |
+
if person != -1:
|
| 19 |
+
spans.append({"start": person, "end": person + len("Alice Smith"), "label": "private_person"})
|
| 20 |
+
phone = text.find("555-1234")
|
| 21 |
+
if phone != -1:
|
| 22 |
+
spans.append({"start": phone, "end": phone + len("555-1234"), "label": "private_phone"})
|
| 23 |
+
results.append(spans)
|
| 24 |
+
return results
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
def _drain(stream):
|
| 28 |
+
result = None
|
| 29 |
+
for kind, payload in stream:
|
| 30 |
+
if kind == "result":
|
| 31 |
+
result = payload[0]
|
| 32 |
+
assert result is not None
|
| 33 |
+
return result
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
class PrivacyFilterMaskingTests(unittest.TestCase):
|
| 37 |
+
def test_redact_texts_masks_detected_spans(self) -> None:
|
| 38 |
+
texts = ["Call Alice Smith at 555-1234 tomorrow.", "no pii here"]
|
| 39 |
+
|
| 40 |
+
results = redact_texts(texts, detect=fake_detect)
|
| 41 |
+
|
| 42 |
+
self.assertIsInstance(results[0], RedactionResult)
|
| 43 |
+
self.assertNotIn("Alice Smith", results[0].text)
|
| 44 |
+
self.assertNotIn("555-1234", results[0].text)
|
| 45 |
+
self.assertIn(PII_TYPES["private_person"][0], results[0].text)
|
| 46 |
+
self.assertIn(PII_TYPES["private_phone"][0], results[0].text)
|
| 47 |
+
self.assertEqual(results[0].count, 2)
|
| 48 |
+
self.assertEqual(results[1].count, 0)
|
| 49 |
+
self.assertEqual(results[1].text, "no pii here")
|
| 50 |
+
|
| 51 |
+
def test_notes_are_human_readable(self) -> None:
|
| 52 |
+
results = redact_texts(["Alice Smith"], detect=fake_detect)
|
| 53 |
+
|
| 54 |
+
self.assertIn("personal name: 1", results[0].notes)
|
| 55 |
+
|
| 56 |
+
def test_malformed_and_overlapping_spans_are_skipped(self) -> None:
|
| 57 |
+
def detect(texts: list[str]) -> list[list[dict]]:
|
| 58 |
+
return [
|
| 59 |
+
[
|
| 60 |
+
{"start": 0, "end": 999, "label": "secret"}, # out of range
|
| 61 |
+
{"start": 2, "end": 2, "label": "secret"}, # zero width
|
| 62 |
+
]
|
| 63 |
+
]
|
| 64 |
+
|
| 65 |
+
results = redact_texts(["abc"], detect=detect)
|
| 66 |
+
|
| 67 |
+
self.assertEqual(results[0].text, "abc")
|
| 68 |
+
self.assertEqual(results[0].count, 0)
|
| 69 |
+
|
| 70 |
+
def test_unknown_labels_are_ignored(self) -> None:
|
| 71 |
+
def detect(texts: list[str]) -> list[list[dict]]:
|
| 72 |
+
return [[{"start": 0, "end": 3, "label": "not_a_pii_type"}]]
|
| 73 |
+
|
| 74 |
+
results = redact_texts(["abc"], detect=detect)
|
| 75 |
+
|
| 76 |
+
self.assertEqual(results[0].text, "abc")
|
| 77 |
+
self.assertEqual(results[0].count, 0)
|
| 78 |
+
|
| 79 |
+
def test_bioes_fragments_merge_into_one_placeholder(self) -> None:
|
| 80 |
+
# The real model fragments "Alice Smith" into touching same-label spans
|
| 81 |
+
# ("Alice" + " Smith"); they must collapse to a single placeholder.
|
| 82 |
+
def detect(texts: list[str]) -> list[list[dict]]:
|
| 83 |
+
return [
|
| 84 |
+
[
|
| 85 |
+
{"start": 0, "end": 5, "label": "private_person"}, # Alice
|
| 86 |
+
{"start": 5, "end": 11, "label": "private_person"}, # " Smith"
|
| 87 |
+
]
|
| 88 |
+
]
|
| 89 |
+
|
| 90 |
+
results = redact_texts(["Alice Smith calls"], detect=detect)
|
| 91 |
+
|
| 92 |
+
self.assertEqual(results[0].text.count("[REDACTED_NAME]"), 1)
|
| 93 |
+
self.assertEqual(results[0].count, 1)
|
| 94 |
+
self.assertEqual(results[0].text, "[REDACTED_NAME] calls")
|
| 95 |
+
|
| 96 |
+
def test_same_label_spans_with_one_char_gap_merge(self) -> None:
|
| 97 |
+
def detect(texts: list[str]) -> list[list[dict]]:
|
| 98 |
+
return [
|
| 99 |
+
[
|
| 100 |
+
{"start": 0, "end": 5, "label": "private_person"}, # Alice
|
| 101 |
+
{"start": 6, "end": 11, "label": "private_person"}, # Smith (gap = space)
|
| 102 |
+
]
|
| 103 |
+
]
|
| 104 |
+
|
| 105 |
+
results = redact_texts(["Alice Smith"], detect=detect)
|
| 106 |
+
|
| 107 |
+
self.assertEqual(results[0].count, 1)
|
| 108 |
+
|
| 109 |
+
def test_different_label_adjacent_spans_stay_separate(self) -> None:
|
| 110 |
+
def detect(texts: list[str]) -> list[list[dict]]:
|
| 111 |
+
return [
|
| 112 |
+
[
|
| 113 |
+
{"start": 0, "end": 5, "label": "private_person"},
|
| 114 |
+
{"start": 6, "end": 14, "label": "private_phone"},
|
| 115 |
+
]
|
| 116 |
+
]
|
| 117 |
+
|
| 118 |
+
results = redact_texts(["Alice 555-1234"], detect=detect)
|
| 119 |
+
|
| 120 |
+
self.assertEqual(results[0].count, 2)
|
| 121 |
+
self.assertIn(PII_TYPES["private_person"][0], results[0].text)
|
| 122 |
+
self.assertIn(PII_TYPES["private_phone"][0], results[0].text)
|
| 123 |
+
|
| 124 |
+
|
| 125 |
+
class StreamRedactionIntegrationTests(unittest.TestCase):
|
| 126 |
+
SAMPLE = Path("examples/sample_trace_redacted.jsonl")
|
| 127 |
+
|
| 128 |
+
def test_stream_records_ai_privacy_note_when_model_runs(self) -> None:
|
| 129 |
+
def passthrough(texts: list[str]) -> list[RedactionResult]:
|
| 130 |
+
return [RedactionResult(text=text, notes=[], count=0) for text in texts]
|
| 131 |
+
|
| 132 |
+
result = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=passthrough))
|
| 133 |
+
|
| 134 |
+
self.assertTrue(any("AI privacy filter (openai/privacy-filter)" in note for note in result.privacy_notes))
|
| 135 |
+
|
| 136 |
+
def test_stream_falls_back_gracefully_when_model_unavailable(self) -> None:
|
| 137 |
+
def boom(texts: list[str]) -> list[RedactionResult]:
|
| 138 |
+
raise RuntimeError("no gpu here")
|
| 139 |
+
|
| 140 |
+
result = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=boom))
|
| 141 |
+
|
| 142 |
+
self.assertTrue(any("AI privacy filter was unavailable" in note for note in result.privacy_notes))
|
| 143 |
+
# Regex redaction still ran on the sample (it embeds an email + token).
|
| 144 |
+
self.assertGreater(result.redaction_count, 0)
|
| 145 |
+
|
| 146 |
+
def test_redact_progress_streams_per_chunk(self) -> None:
|
| 147 |
+
events = [
|
| 148 |
+
payload
|
| 149 |
+
for kind, payload in stream_deterministic_analysis(
|
| 150 |
+
self.SAMPLE, stream_redact_progress=True
|
| 151 |
+
)
|
| 152 |
+
if kind == "progress" and payload.get("stage") == "redact"
|
| 153 |
+
]
|
| 154 |
+
|
| 155 |
+
# 4-message sample -> chunk size 1 -> one redact event per message.
|
| 156 |
+
self.assertGreaterEqual(len(events), 2)
|
| 157 |
+
processed = [event["processed"] for event in events]
|
| 158 |
+
self.assertEqual(processed, sorted(processed)) # monotonically advancing
|
| 159 |
+
self.assertEqual(events[-1]["processed"], events[-1]["total"]) # finishes at total
|
| 160 |
+
self.assertTrue(all(event["total"] == events[0]["total"] for event in events))
|
| 161 |
+
|
| 162 |
+
def test_model_redaction_count_adds_to_regex_count(self) -> None:
|
| 163 |
+
def mask_first_word(texts: list[str]) -> list[RedactionResult]:
|
| 164 |
+
out = []
|
| 165 |
+
for text in texts:
|
| 166 |
+
if text:
|
| 167 |
+
out.append(RedactionResult(text="[REDACTED_NAME]" + text, notes=["personal name: 1"], count=1))
|
| 168 |
+
else:
|
| 169 |
+
out.append(RedactionResult(text=text, notes=[], count=0))
|
| 170 |
+
return out
|
| 171 |
+
|
| 172 |
+
regex_only = _drain(stream_deterministic_analysis(self.SAMPLE))
|
| 173 |
+
combined = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=mask_first_word))
|
| 174 |
+
|
| 175 |
+
self.assertGreater(combined.redaction_count, regex_only.redaction_count)
|
| 176 |
+
|
| 177 |
+
|
| 178 |
+
if __name__ == "__main__":
|
| 179 |
+
unittest.main()
|
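The masking tests above describe four rules for detector spans: drop malformed spans (out of range or zero width), drop unknown labels, merge same-label spans that touch or sit one character apart, and keep adjacent spans with different labels separate. `privacy_filter.py` itself is not visible in this part of the diff, so the sketch below is only an illustration of those rules. `PII_TYPES_SKETCH`, `RedactionSketch`, `mask_spans`, the `[REDACTED_PHONE]` placeholder, and the "phone number" label text are all hypothetical. Only `[REDACTED_NAME]` and the `personal name: 1` note format appear in the tests.

```python
from dataclasses import dataclass, field

# Hypothetical label table: label -> (placeholder, human-readable name).
PII_TYPES_SKETCH = {
    "private_person": ("[REDACTED_NAME]", "personal name"),
    "private_phone": ("[REDACTED_PHONE]", "phone number"),
}


@dataclass
class RedactionSketch:
    text: str
    notes: list[str] = field(default_factory=list)
    count: int = 0


def mask_spans(text: str, spans: list[dict], max_gap: int = 1) -> RedactionSketch:
    # Keep only well-formed, in-range spans that carry a known label.
    valid = sorted(
        (
            s for s in spans
            if s.get("label") in PII_TYPES_SKETCH
            and isinstance(s.get("start"), int)
            and isinstance(s.get("end"), int)
            and 0 <= s["start"] < s["end"] <= len(text)
        ),
        key=lambda s: (s["start"], s["end"]),
    )
    merged: list[dict] = []
    for span in valid:
        last = merged[-1] if merged else None
        if last and span["label"] == last["label"] and span["start"] - last["end"] <= max_gap:
            # Same label, touching or within max_gap characters: extend.
            last["end"] = max(last["end"], span["end"])
        elif last and span["start"] < last["end"]:
            continue  # overlaps a different-label span: skip it
        else:
            merged.append(dict(span))
    out = text
    counts: dict[str, int] = {}
    # Replace right to left so earlier offsets stay valid.
    for span in reversed(merged):
        placeholder, name = PII_TYPES_SKETCH[span["label"]]
        out = out[: span["start"]] + placeholder + out[span["end"]:]
        counts[name] = counts.get(name, 0) + 1
    notes = [f"{name}: {n}" for name, n in sorted(counts.items())]
    return RedactionSketch(text=out, notes=notes, count=len(merged))
```

Merging across a one-character gap masks the gap character too, whatever it is. That is acceptable for a space between a first and last name, but it is a design choice rather than something the tests require.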
tests/test_profiling.py
ADDED
|
@@ -0,0 +1,37 @@
|
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import unittest
|
| 4 |
+
|
| 5 |
+
from profiling import Profiler, format_snapshot, resource_snapshot
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class ProfilingTests(unittest.TestCase):
|
| 9 |
+
def test_resource_snapshot_never_raises_and_returns_dict(self) -> None:
|
| 10 |
+
snap = resource_snapshot()
|
| 11 |
+
self.assertIsInstance(snap, dict)
|
| 12 |
+
|
| 13 |
+
def test_format_snapshot_is_string(self) -> None:
|
| 14 |
+
self.assertIsInstance(format_snapshot(resource_snapshot()), str)
|
| 15 |
+
self.assertEqual(format_snapshot({}), "n/a")
|
| 16 |
+
|
| 17 |
+
def test_profiler_records_stages_meta_and_summarizes(self) -> None:
|
| 18 |
+
prof = Profiler("test")
|
| 19 |
+
prof.record("extract", 0.012)
|
| 20 |
+
prof.record("redact", 0.034)
|
| 21 |
+
prof.mark(messages=4, engine="deterministic")
|
| 22 |
+
|
| 23 |
+
self.assertEqual([name for name, _ in prof.stages], ["extract", "redact"])
|
| 24 |
+
self.assertEqual(prof.meta["messages"], 4)
|
| 25 |
+
self.assertGreaterEqual(prof.elapsed(), 0.0)
|
| 26 |
+
prof.summary() # must not raise
|
| 27 |
+
|
| 28 |
+
def test_stage_context_manager_records_duration(self) -> None:
|
| 29 |
+
prof = Profiler("test")
|
| 30 |
+
with prof.stage("chart"):
|
| 31 |
+
pass
|
| 32 |
+
self.assertEqual(prof.stages[-1][0], "chart")
|
| 33 |
+
self.assertGreaterEqual(prof.stages[-1][1], 0.0)
|
| 34 |
+
|
| 35 |
+
|
| 36 |
+
if __name__ == "__main__":
|
| 37 |
+
unittest.main()
|
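The profiling tests above exercise a small surface: `record` appends a `(name, seconds)` pair to `stages`, `mark` stores keyword metadata in `meta`, `stage` is a context manager that times its body, and `elapsed` and `summary` must not raise. `profiling.py` is not shown in this part of the diff, so the class below is only a sketch of something that would pass those tests. `ProfilerSketch` and the string format returned by `summary` are assumptions; the tests do not say what the real `summary` returns.

```python
import time
from contextlib import contextmanager


class ProfilerSketch:
    """Hypothetical stand-in for the Profiler exercised by the tests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: list[tuple[str, float]] = []
        self.meta: dict[str, object] = {}
        self._started = time.perf_counter()

    def record(self, stage: str, seconds: float) -> None:
        # Store an externally measured duration for a named stage.
        self.stages.append((stage, seconds))

    def mark(self, **meta: object) -> None:
        # Attach free-form metadata such as message counts or engine name.
        self.meta.update(meta)

    @contextmanager
    def stage(self, name: str):
        started = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the body raises.
            self.record(name, time.perf_counter() - started)

    def elapsed(self) -> float:
        return time.perf_counter() - self._started

    def summary(self) -> str:
        # Assumed format: "[name] stage=12.0ms ...".
        parts = [f"{stage}={seconds * 1000:.1f}ms" for stage, seconds in self.stages]
        return f"[{self.name}] " + " ".join(parts)
```

`time.perf_counter()` is monotonic, which is why the tests can safely assert that durations are `>= 0.0`.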
view_model.py
CHANGED
|
@@ -71,7 +71,7 @@ def build_view_model(
|
|
| 71 |
"narrative_message_count": base["narrative_message_count"],
|
| 72 |
"redaction_count": base["redaction_count"],
|
| 73 |
"duration_total": _duration_total(raw_episodes),
|
| 74 |
-
"verdict": _verdict(episodes, base["overall_patterns"], result.model_memo),
|
| 75 |
"overall_patterns": base["overall_patterns"],
|
| 76 |
"privacy_notes": list(base["privacy_notes"]) + list(base.get("model_notes") or []),
|
| 77 |
"episodes": episodes,
|
|
|
|
| 71 |
"narrative_message_count": base["narrative_message_count"],
|
| 72 |
"redaction_count": base["redaction_count"],
|
| 73 |
"duration_total": _duration_total(raw_episodes),
|
| 74 |
+
"verdict": base.get("session_verdict") or _verdict(episodes, base["overall_patterns"], result.model_memo),
|
| 75 |
"overall_patterns": base["overall_patterns"],
|
| 76 |
"privacy_notes": list(base["privacy_notes"]) + list(base.get("model_notes") or []),
|
| 77 |
"episodes": episodes,
|
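The `view_model.py` change prefers a model-supplied `session_verdict` and falls back to the synthesized `_verdict(...)` through Python's `or`. One consequence is that an empty dict counts as missing, not only an absent key or `None`. The sketch below isolates that behavior; `pick_verdict` and `synthesize` are hypothetical names used only for illustration.

```python
from typing import Callable


def pick_verdict(base: dict, synthesize: Callable[[], dict]) -> dict:
    # `or` falls through on any falsy value: a missing key (None),
    # an explicit None, or an empty dict.
    return base.get("session_verdict") or synthesize()


def synthesize() -> dict:
    # Stand-in for the deterministic verdict builder.
    return {"tone": "stable", "headline": "Synthesized from episodes."}


model_verdict = {"tone": "partial", "headline": "Reroute landed with a caveat."}
from_model = pick_verdict({"session_verdict": model_verdict}, synthesize)
from_empty = pick_verdict({"session_verdict": {}}, synthesize)
from_missing = pick_verdict({}, synthesize)
```

Here `from_model` is the model's verdict, while `from_empty` and `from_missing` both come from `synthesize`. Passing a callable keeps the fallback lazy in this sketch.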