Space: build-small-hackathon/trace-field-notes

JacobLinCool, Codex committed 25 days ago

Commit 8457788 · verified · 1 parent: 92537b9

feat: add privacy filtering and execution modes

Co-authored-by: Codex <noreply@openai.com>

Files changed (14)

README.md +46 -11
analyzer.py +287 -46
app.py +153 -19
frontend/static/app.jsx +71 -12
frontend/static/components.jsx +13 -3
model_runtime.py +198 -73
privacy_filter.py +180 -0
profiling.py +125 -0
requirements.txt +2 -1
schemas.py +2 -0
tests/test_model_runtime.py +115 -45
tests/test_privacy_filter.py +179 -0
tests/test_profiling.py +37 -0
view_model.py +1 -1

README.md CHANGED

@@ -20,11 +20,11 @@ it claimed completion.
 Built for the Build Small Hackathon. The frontend is a custom React field-notebook
 UI (a trail map of the session) served by `gradio.Server`; it calls the Python
-`analyze_trace` endpoint through `@gradio/client`. Both models run on the Space
-GPU through ZeroGPU: a quick `Qwen/Qwen3.5-9B` pass by default, and the larger
-`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` for deeper analysis. A verified
-deterministic codebook analyzer is the always-available recovery path and needs
-no model or GPU.
 ## Architecture
@@ -39,6 +39,8 @@ no model or GPU.
   renders (synthesizes the whole-session `verdict`, `captured`, `duration_total`).
 - `analyzer.py` / `parser.py` / `redaction.py` / `schemas.py` — the deterministic
   pipeline. `model_runtime.py` — the optional small-model assist on ZeroGPU.
 ## Run Locally
@@ -57,7 +59,7 @@ python3.11 -m unittest discover -s tests
 ## Analysis Engines
-- `Qwen3.5 9B — quick analysis`: default model pass on the Space GPU.
 - `NVIDIA Nemotron 3 Nano 30B-A3B — deeper analysis`: the larger model on the
   Space GPU for a richer memo.
 - `Rule-based — instant, no model`: local codebook analyzer, no model or GPU.
@@ -67,10 +69,41 @@ in model notes and returns the deterministic analysis instead of failing the
 whole Space.
 The model-backed analysis runs under `@spaces.GPU(size="xlarge")` so the weights
-load on Hugging Face ZeroGPU hardware; `Qwen/Qwen3.5-9B` and
 `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` are loaded with `transformers` and
-cached across requests. The rule-based engine runs on CPU and never requests a
-GPU slot, so it returns instantly.
 ## Agent Session Locations
@@ -89,5 +122,7 @@ ls ~/.pi/agent/sessions
 Agent traces can contain prompts, tool inputs, command outputs, local file paths,
 screenshots, secrets, private source code, and personal data. Review and redact
-before uploading or sharing publicly. The app defaults to basic regex redaction
-and exports only a redacted narrative text file.

 Built for the Build Small Hackathon. The frontend is a custom React field-notebook
 UI (a trail map of the session) served by `gradio.Server`; it calls the Python
+`analyze_trace` endpoint through `@gradio/client`. Both analysis models run on the
+Space GPU through ZeroGPU: a quick `openbmb/MiniCPM5-1B` pass by default, and the
+larger `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` for deeper analysis. Redaction
+adds a PII pass with `openai/privacy-filter`. A verified deterministic codebook
+analyzer is the always-available recovery path and needs no model or GPU.
 ## Architecture
   renders (synthesizes the whole-session `verdict`, `captured`, `duration_total`).
 - `analyzer.py` / `parser.py` / `redaction.py` / `schemas.py` — the deterministic
   pipeline. `model_runtime.py` — the optional small-model assist on ZeroGPU.
+  `privacy_filter.py` — the optional `openai/privacy-filter` PII redaction pass.
+  `profiling.py` — logging + per-request stage timing and resource probes.
 ## Run Locally
 ## Analysis Engines
+- `MiniCPM5 1B — quick analysis`: default model pass on the Space GPU.
 - `NVIDIA Nemotron 3 Nano 30B-A3B — deeper analysis`: the larger model on the
   Space GPU for a richer memo.
 - `Rule-based — instant, no model`: local codebook analyzer, no model or GPU.
 whole Space.
 The model-backed analysis runs under `@spaces.GPU(size="xlarge")` so the weights
+load on Hugging Face ZeroGPU hardware; `openbmb/MiniCPM5-1B` and
 `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` are loaded with `transformers` and
+cached across requests. The deterministic codebook analysis itself runs on CPU;
+only the model assist and the `openai/privacy-filter` redaction pass use the GPU,
+and both fall back gracefully (deterministic analysis / regex-only redaction)
+when no GPU model is available.
+## Execution modes
+Each `analyze_trace` call takes an `execution_mode`:
+- `zerogpu` (default): the model passes run inside `@spaces.GPU` on the Space GPU.
+- `cpu`: the model passes run on the Space (or local) CPU with **no GPU quota** —
+  slower, but it still works when ZeroGPU quota is exhausted. The frontend exposes
+  this as a **Run on** choice so users without quota can still use the app.
+Model loading is device-aware (CUDA → Apple MPS → CPU), so the app also runs
+locally for development; on a Mac the small models run on MPS, and the
+deterministic engine needs no model at all. Because of the slower paths, the
+frontend streams real progress — current stage, % complete, messages processed,
+elapsed time, and a best-effort ETA — so a long run never looks stuck.
+## Logging & profiling
+The pipeline writes diagnostics to the standard logger (never the UI): per-request
+message count, per-stage timing, total time, model load/inference time with the
+device used, and a resource snapshot (process RSS, system memory, CPU, and
+GPU/MPS memory). Set the level with `TFN_LOG_LEVEL` (default `INFO`; use `DEBUG`
+for per-stage detail). Example summary line:
+```
+analyze[zerogpu/minicpm] done in 19.4s | messages=4 redactions=2 episodes=1
+  | stages: extract=0ms, redact=9503ms, chart=4ms, classify=0ms, model_assist=9918ms
+  | rss=2180MB sysmem=68% mps=4732MB
+```
 ## Agent Session Locations
 Agent traces can contain prompts, tool inputs, command outputs, local file paths,
 screenshots, secrets, private source code, and personal data. Review and redact
+before uploading or sharing publicly. Redaction defaults to regex patterns plus a
+model pass (`openai/privacy-filter`) that flags names, contacts, and other
+personal data on the Space GPU; the regex pass is the always-available fallback
+when the model is not loaded. The app exports only a redacted narrative text file.
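
The device order described under "Execution modes" (CUDA, then Apple MPS, then CPU, with `cpu` mode forcing the CPU) can be illustrated with a small sketch. This is not the code in `model_runtime.py`, which this diff view does not show in full; `pick_device` and its keyword flags are hypothetical names, and in a real app the flags would presumably come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
from __future__ import annotations


def pick_device(execution_mode: str, *, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: choose a device string for one request.

    Assumes the two execution modes named in the README ("zerogpu" and "cpu").
    """
    if execution_mode == "cpu":
        # The user asked for the quota-free path, so ignore any accelerator.
        return "cpu"
    if execution_mode != "zerogpu":
        raise ValueError(f"unknown execution_mode: {execution_mode!r}")
    # Device-aware fallback in the order the README gives: CUDA, MPS, CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    # On a Mac without CUDA, the default mode lands on MPS.
    print(pick_device("zerogpu", cuda_available=False, mps_available=True))
```

Keeping the availability checks as plain booleans makes the ordering easy to test without importing `torch`.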

analyzer.py CHANGED

@@ -3,15 +3,30 @@
 from __future__ import annotations
 import re
 from collections import Counter
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Iterable
-from model_runtime import MODEL_CHOICES, run_model_assist
 from parser import parse_trace
 from redaction import redact_text
-from schemas import AnalysisResult, DifficultyEpisode, MessageSpan, NarrativeMessage
 ANALYSIS_SCOPE = (
@@ -143,26 +158,51 @@ PROBLEM_EVIDENCE_SIGNALS = {
 ANALYSIS_STEPS = ("extract", "redact", "chart", "classify", "synthesize")
 def stream_deterministic_analysis(
     path: str | Path,
     *,
     include_user_context: bool = True,
     redact_secrets: bool = True,
     ignore_tool_calls: bool = True,
 ):
     """Run the deterministic pipeline as a generator.
-    Yields ``("step", name)`` after each real stage completes (the names in
-    :data:`ANALYSIS_STEPS`), then a final ``("result", (AnalysisResult, str))``.
-    Callers that don't care about progress can just drain it for the tuple.
     """
     parsed_messages, agent_type = parse_trace(
         path,
         include_user_context=include_user_context,
         ignore_tool_calls=ignore_tool_calls,
     )
-    yield ("step", "extract")
     redaction_count = 0
     privacy_notes = [
@@ -172,26 +212,79 @@ def stream_deterministic_analysis(
     if ignore_tool_calls:
         privacy_notes.append("Tool-call contents were ignored before analysis.")
     messages = parsed_messages
     if redact_secrets:
-        redacted_messages: list[NarrativeMessage] = []
         all_notes: Counter[str] = Counter()
-        for message in parsed_messages:
-            red = redact_text(message.text)
-            redaction_count += red.count
-            for note in red.notes:
-                label, _, count = note.partition(": ")
-                all_notes[label] += int(count or 0)
-            redacted_messages.append(
-                NarrativeMessage(
-                    index=message.index,
-                    role=message.role,
-                    text=red.text,
-                    timestamp=message.timestamp,
-                    source=message.source,
                 )
             )
         messages = redacted_messages
         if all_notes:
             privacy_notes.append(
                 "Redactions applied: "
@@ -199,14 +292,22 @@ def stream_deterministic_analysis(
                 + "."
             )
         else:
-            privacy_notes.append("No likely secrets matched the built-in redaction patterns.")
     else:
         privacy_notes.append("Secret redaction was disabled by the user.")
-    yield ("step", "redact")
     episodes = identify_episodes(messages)
-    yield ("step", "chart")
     result = AnalysisResult(
         trace_title=derive_trace_title(path, agent_type),
         agent_type_guess=agent_type,
@@ -218,55 +319,194 @@ def stream_deterministic_analysis(
         redaction_count=redaction_count,
         engine="deterministic-codebook",
     )
-    yield ("step", "classify")
     narrative_text = render_redacted_narrative(messages)
-    yield ("step", "synthesize")
-    yield ("result", (result, narrative_text))
-def apply_model_assist(
     result: AnalysisResult,
-    narrative_text: str,
     analysis_engine: str,
     *,
     run=None,
 ) -> None:
-    """Augment a deterministic result with model assist, with graceful fallback.
-    ``run`` defaults to the module-level :func:`run_model_assist` (resolved at
-    call time so tests can monkeypatch it); the Server passes a GPU-wrapped
-    runner so model inference happens inside a ``@spaces.GPU`` allocation. The
-    result is mutated in place; any failure leaves the deterministic result and
-    records the reason in ``model_notes``.
     """
     if analysis_engine == "deterministic":
         return
     if analysis_engine not in MODEL_CHOICES:
         result.model_notes.append(
-            f"Unknown analysis engine {analysis_engine!r}; deterministic analysis was returned."
         )
         return
-    runner = run or run_model_assist
     try:
-        assist = runner(
             engine=analysis_engine,
-            result=result,
-            narrative_text=narrative_text,
         )
     except Exception as exc:
         error_message = str(exc).strip().rstrip(".")
         result.model_notes.append(
-            "Model assist was requested but unavailable: "
             f"{type(exc).__name__}: {error_message}. "
-            "Deterministic analysis was returned."
         )
     else:
-        result.engine = f"deterministic-codebook + {assist.model_id}"
-        result.model_memo = assist.memo
-        result.model_notes.append(assist.note)
 def analyze_trace_file(
@@ -282,6 +522,7 @@ def analyze_trace_file(
     result: AnalysisResult | None = None
     narrative_text = ""
     for kind, payload in stream_deterministic_analysis(
         path,
         include_user_context=include_user_context,
@@ -289,9 +530,9 @@ def analyze_trace_file(
         ignore_tool_calls=ignore_tool_calls,
     ):
         if kind == "result":
-            result, narrative_text = payload
     assert result is not None
-    apply_model_assist(result, narrative_text, analysis_engine)
     return result, narrative_text

 from __future__ import annotations
 import re
+import time
 from collections import Counter
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import Iterable
+from model_runtime import MODEL_CHOICES, run_model_analysis
 from parser import parse_trace
+from profiling import Profiler, get_logger
 from redaction import redact_text
+from schemas import (
+    APPRAISALS,
+    DETOUR_TYPES,
+    DIFFICULTY_TYPES,
+    OUTCOME_CLAIMS,
+    RECOVERY_PATTERNS,
+    RESOLUTION_MODES,
+    AnalysisResult,
+    DifficultyEpisode,
+    MessageSpan,
+    NarrativeMessage,
+)
+logger = get_logger()
 ANALYSIS_SCOPE = (
 ANALYSIS_STEPS = ("extract", "redact", "chart", "classify", "synthesize")
+def _accumulate_notes(counter: Counter[str], notes: Iterable[str]) -> None:
+    """Fold ``"label: count"`` note strings into a running counter."""
+    for note in notes:
+        label, _, count = note.partition(": ")
+        counter[label] += int(count or 0)
 def stream_deterministic_analysis(
     path: str | Path,
     *,
     include_user_context: bool = True,
     redact_secrets: bool = True,
     ignore_tool_calls: bool = True,
+    model_redact=None,
+    profiler: Profiler | None = None,
+    stream_redact_progress: bool = False,
 ):
     """Run the deterministic pipeline as a generator.
+    Yields ``("progress", info)`` after each real stage completes — ``info`` has
+    a ``stage`` name (one of :data:`ANALYSIS_STEPS`) and the running ``messages``
+    count — then a final ``("result", (AnalysisResult, str))``. Callers that
+    don't care about progress can just drain it for the tuple.
+    ``model_redact`` is an optional ``(list[str]) -> list[RedactionResult]``
+    callable applied on top of regex redaction; the Server injects a GPU- or
+    CPU-bound ``openai/privacy-filter`` pass. It is absent locally and in tests,
+    so redaction falls back to regex only. ``profiler`` collects per-stage
+    timings; one is created if not supplied.
     """
+    prof = profiler or Profiler("deterministic")
+    _started = time.perf_counter()
     parsed_messages, agent_type = parse_trace(
         path,
         include_user_context=include_user_context,
         ignore_tool_calls=ignore_tool_calls,
     )
+    prof.record("extract", time.perf_counter() - _started)
+    message_count = len(parsed_messages)
+    prof.mark(messages=message_count, agent=agent_type)
+    logger.info("parsed %d narrative messages (agent=%s)", message_count, agent_type)
+    yield ("progress", {"stage": "extract", "messages": message_count})
     redaction_count = 0
     privacy_notes = [
     if ignore_tool_calls:
         privacy_notes.append("Tool-call contents were ignored before analysis.")
+    _redact_started = time.perf_counter()
     messages = parsed_messages
     if redact_secrets:
         all_notes: Counter[str] = Counter()
+        redacted_messages: list[NarrativeMessage] = []
+        model_used = False
+        model_failed = False
+        # Process in chunks so slow (CPU) runs can stream per-message progress.
+        # Without streaming (ZeroGPU) it is a single chunk = one GPU allocation;
+        # with streaming the update count is capped at ~30 regardless of size.
+        if stream_redact_progress and message_count:
+            chunk = max(1, (message_count + 29) // 30)
+        else:
+            chunk = message_count or 1
+        for start in range(0, message_count, chunk):
+            chunk_messages = parsed_messages[start : start + chunk]
+            # Pass 1: deterministic regex redaction (always available).
+            regex_results = [redact_text(message.text) for message in chunk_messages]
+            texts = [red.text for red in regex_results]
+            # Pass 2: optional model PII pass on top. The Server injects a GPU- or
+            # CPU-bound openai/privacy-filter pass; it is absent locally and in
+            # tests, so regex-only redaction is used. Once it is unavailable we
+            # stop retrying it for the rest of the trace.
+            model_results = None
+            if model_redact is not None and not model_failed:
+                try:
+                    model_results = model_redact(texts)
+                    model_used = True
+                except Exception as exc:  # noqa: BLE001 - graceful degradation
+                    privacy_notes.append(
+                        "AI privacy filter was unavailable "
+                        f"({type(exc).__name__}); regex redaction was applied."
+                    )
+                    model_failed = True
+                    model_results = None
+            for i, message in enumerate(chunk_messages):
+                text = texts[i]
+                redaction_count += regex_results[i].count
+                _accumulate_notes(all_notes, regex_results[i].notes)
+                if model_results is not None:
+                    text = model_results[i].text
+                    redaction_count += model_results[i].count
+                    _accumulate_notes(all_notes, model_results[i].notes)
+                redacted_messages.append(
+                    NarrativeMessage(
+                        index=message.index,
+                        role=message.role,
+                        text=text,
+                        timestamp=message.timestamp,
+                        source=message.source,
+                    )
                 )
+            yield (
+                "progress",
+                {
+                    "stage": "redact",
+                    "processed": min(start + chunk, message_count),
+                    "total": message_count,
+                },
             )
         messages = redacted_messages
+        if model_used:
+            privacy_notes.append(
+                "AI privacy filter (openai/privacy-filter) screened for names, "
+                "contacts, and other personal data."
+            )
         if all_notes:
             privacy_notes.append(
                 "Redactions applied: "
                 + "."
             )
         else:
+            privacy_notes.append("No likely secrets matched the redaction patterns.")
     else:
         privacy_notes.append("Secret redaction was disabled by the user.")
+    prof.record("redact", time.perf_counter() - _redact_started)
+    prof.mark(redactions=redaction_count)
+    if not redact_secrets or message_count == 0:
+        # No chunk loop ran (redaction disabled or empty trace) — still advance.
+        yield ("progress", {"stage": "redact", "processed": message_count, "total": message_count})
+    _chart_started = time.perf_counter()
     episodes = identify_episodes(messages)
+    prof.record("chart", time.perf_counter() - _chart_started)
+    prof.mark(episodes=len(episodes))
+    yield ("progress", {"stage": "chart", "messages": message_count})
+    _classify_started = time.perf_counter()
     result = AnalysisResult(
         trace_title=derive_trace_title(path, agent_type),
         agent_type_guess=agent_type,
         redaction_count=redaction_count,
         engine="deterministic-codebook",
     )
+    prof.record("classify", time.perf_counter() - _classify_started)
+    yield ("progress", {"stage": "classify", "messages": message_count})
+    _synth_started = time.perf_counter()
     narrative_text = render_redacted_narrative(messages)
+    prof.record("synthesize", time.perf_counter() - _synth_started)
+    yield ("progress", {"stage": "synthesize", "messages": message_count})
+    yield ("result", (result, narrative_text, messages))
+_PRODUCTIVE_VALUES = {"yes", "no", "mixed", "unknown"}
+_VALID_TONES = {"stable", "iterative", "detour", "partial", "risk", "unknown"}
+_VALID_HONESTY = {"candid", "mixed", "overclaimed"}
+def build_numbered_narrative(
+    messages: list[NarrativeMessage], *, char_budget: int = 16000, per_message: int = 320
+) -> str:
+    """Number the (redacted) messages by real index for the model.
+    Long traces are sampled evenly across the session (keeping the first and last)
+    so the model sees the whole timeline within its context budget; each line keeps
+    the message's real index and timestamp so the model can cite spans.
+    """
+    if not messages:
+        return ""
+    max_messages = max(1, char_budget // per_message)
+    if len(messages) <= max_messages:
+        chosen = messages
+    else:
+        stride = len(messages) / max_messages
+        picks = sorted({0, len(messages) - 1, *(int(i * stride) for i in range(max_messages))})
+        chosen = [messages[i] for i in picks if 0 <= i < len(messages)]
+    lines = []
+    for message in chosen:
+        snippet = " ".join(message.text.split())[:per_message]
+        lines.append(f"[{message.index}] {message.role} {message.timestamp or ''}: {snippet}")
+    return "\n".join(lines)
+def build_codebook_hint(episodes: list[DifficultyEpisode]) -> str:
+    if not episodes:
+        return "(none)"
+    return "; ".join(
+        f"{ep.episode_id} msgs {ep.message_span.start_index}-{ep.message_span.end_index}"
+        for ep in episodes[:12]
+    )
+def _coerce_code(value: object, vocab: dict[str, str]) -> str:
+    code = str(value or "").strip()
+    return code if code in vocab else "unknown"
+# Weak models sometimes echo the schema placeholders verbatim; drop those.
+_PLACEHOLDER_RE = re.compile(
+    r"^\s*(<.*>|<=.*|\d+(\s*-\s*\d+)?\s+sentences?.*|one key.*|short verbatim.*|up to \d+.*|a message index.*)\s*$",
+    re.IGNORECASE,
+)
+def _clean_text(value: object) -> str:
+    text = str(value or "").strip()
+    if not text or _PLACEHOLDER_RE.match(text):
+        return ""
+    return text
+def _clean_verdict(verdict: dict) -> dict[str, str]:
+    tone = str(verdict.get("tone", "")).strip().lower()
+    honesty = str(verdict.get("honesty", "")).strip().lower()
+    return {
+        "tone": tone if tone in _VALID_TONES else "unknown",
+        "headline": _clean_text(verdict.get("headline")) or "Session analyzed by the model.",
+        "detail": _clean_text(verdict.get("detail")),
+        "honesty": honesty if honesty in _VALID_HONESTY else "mixed",
+    }
+def _episode_from_model(
+    raw: dict, ordinal: int, index_to_timestamp: dict[int, str | None], max_index: int
+) -> DifficultyEpisode:
+    def clamp(value: object) -> int:
+        try:
+            return max(0, min(int(value), max_index))
+        except (TypeError, ValueError):
+            return 0
+    start = clamp(raw.get("start_index", 0))
+    end = clamp(raw.get("end_index", start))
+    if end < start:
+        start, end = end, start
+    start_time = index_to_timestamp.get(start)
+    end_time = index_to_timestamp.get(end)
+    span = MessageSpan(
+        start_index=start,
+        end_index=end,
+        start_time=start_time,
+        end_time=end_time,
+        duration_label=duration_label(start_time, end_time) if start_time and end_time else "unknown",
+    )
+    productive = str(raw.get("productive_detour", "unknown")).strip().lower()
+    quotes = [cleaned for q in (raw.get("evidence_quotes") or []) if (cleaned := _clean_text(q))][:3]
+    difficulty = _clean_text(raw.get("reported_difficulty"))
+    title = _clean_text(raw.get("title")) or (difficulty[:60] if difficulty else "Difficulty episode")
+    return DifficultyEpisode(
+        episode_id=f"E{ordinal:02d}",
+        title=title,
+        message_span=span,
+        initial_intention=_clean_text(raw.get("initial_intention")),
+        reported_difficulty=difficulty,
+        difficulty_type=_coerce_code(raw.get("difficulty_type"), DIFFICULTY_TYPES),
+        appraisal=_coerce_code(raw.get("appraisal"), APPRAISALS),
+        strategy_before=_clean_text(raw.get("strategy_before")),
+        strategy_after=_clean_text(raw.get("strategy_after")),
+        detour_type=_coerce_code(raw.get("detour_type"), DETOUR_TYPES),
+        resolution_mode=_coerce_code(raw.get("resolution_mode"), RESOLUTION_MODES),
+        recovery_pattern=_coerce_code(raw.get("recovery_pattern"), RECOVERY_PATTERNS),
+        outcome_claim=_coerce_code(raw.get("outcome_claim"), OUTCOME_CLAIMS),
+        productive_detour=productive if productive in _PRODUCTIVE_VALUES else "unknown",
+        evidence_quotes=quotes,
+        analyst_memo=_clean_text(raw.get("analyst_memo")),
+    )
+def apply_model_analysis(
     result: AnalysisResult,
+    messages: list[NarrativeMessage],
     analysis_engine: str,
     *,
     run=None,
 ) -> None:
+    """Replace the deterministic analysis with a model-produced one (codebook is the fallback).
+    ``run`` defaults to :func:`run_model_analysis` (resolved at call time so tests
+    can monkeypatch it); the Server passes a GPU- or CPU-bound runner. On success
+    the model's episodes, overall patterns, and verdict replace the rule-based
+    ones. On any failure the deterministic codebook result is kept and the reason
+    recorded in ``model_notes``.
     """
     if analysis_engine == "deterministic":
         return
     if analysis_engine not in MODEL_CHOICES:
         result.model_notes.append(
+            f"Unknown analysis engine {analysis_engine!r}; rule-based analysis was returned."
         )
         return
+    runner = run or run_model_analysis
+    numbered_narrative = build_numbered_narrative(messages)
+    codebook_hint = build_codebook_hint(result.episodes)
     try:
+        produced = runner(
             engine=analysis_engine,
+            numbered_narrative=numbered_narrative,
+            agent_type=result.agent_type_guess,
+            codebook_hint=codebook_hint,
         )
     except Exception as exc:
         error_message = str(exc).strip().rstrip(".")
         result.model_notes.append(
+            "Model analysis was requested but unavailable: "
             f"{type(exc).__name__}: {error_message}. "
+            "Rule-based analysis was returned."
         )
+        return
+    analysis = produced.analysis
+    index_to_timestamp = {message.index: message.timestamp for message in messages}
+    max_index = (len(messages) - 1) if messages else 0
+    episodes = [
+        _episode_from_model(raw, ordinal + 1, index_to_timestamp, max_index)
+        for ordinal, raw in enumerate(analysis.get("episodes", []))
+    ]
+    result.episodes = episodes
+    patterns = analysis.get("overall_patterns")
+    if isinstance(patterns, dict) and patterns:
+        result.overall_patterns = {key: str(value) for key, value in patterns.items()}
     else:
+        result.overall_patterns = summarize_patterns(episodes, messages)
+    verdict = analysis.get("verdict")
+    if isinstance(verdict, dict) and verdict:
+        result.session_verdict = _clean_verdict(verdict)
+    result.engine = produced.model_id
+    result.model_notes.append(produced.note)
 def analyze_trace_file(
     result: AnalysisResult | None = None
     narrative_text = ""
+    messages: list[NarrativeMessage] = []
     for kind, payload in stream_deterministic_analysis(
         path,
         include_user_context=include_user_context,
         ignore_tool_calls=ignore_tool_calls,
     ):
         if kind == "result":
+            result, narrative_text, messages = payload
     assert result is not None
+    apply_model_analysis(result, messages, analysis_engine)
     return result, narrative_text
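
The redaction loop in `stream_deterministic_analysis` sizes its chunks so that a streaming run emits at most about 30 progress updates, while a non-streaming run uses a single chunk (one GPU allocation). The sketch below isolates that arithmetic so the cap can be checked; `chunk_size` and `chunk_starts` are hypothetical helper names, and `max_updates=30` mirrors the constant implied by `(message_count + 29) // 30` in the diff.

```python
def chunk_size(message_count: int, *, stream_progress: bool, max_updates: int = 30) -> int:
    """Return how many messages to redact per chunk.

    With streaming, use ceiling division so the number of chunks never
    exceeds max_updates; without it, process everything in one chunk.
    """
    if stream_progress and message_count:
        return max(1, (message_count + max_updates - 1) // max_updates)
    return message_count or 1


def chunk_starts(message_count: int, size: int) -> list[int]:
    """Start offsets of each chunk, matching range(0, message_count, size)."""
    return list(range(0, message_count, size))


if __name__ == "__main__":
    for n in (0, 5, 30, 31, 100, 1000):
        size = chunk_size(n, stream_progress=True)
        print(n, size, len(chunk_starts(n, size)))
```

For 100 messages this gives chunks of 4 and therefore 25 updates; for an empty trace the loop body never runs, which is why the diff emits a separate `redact` progress event in that case.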

app.py CHANGED

@@ -9,6 +9,7 @@ returns the frontend-ready view model.
 from __future__ import annotations
 import os
 from pathlib import Path
 import spaces
@@ -17,10 +18,13 @@ from fastapi.staticfiles import StaticFiles
 from gradio import Server
 from gradio.data_classes import FileData
-from analyzer import apply_model_assist, stream_deterministic_analysis
 from parser import TraceParseError
 from view_model import build_view_model
 HERE = Path(__file__).resolve().parent
 FRONTEND = HERE / "frontend"
@@ -51,8 +55,9 @@ messages and ignores raw tool telemetry.
 - `trace_file` (file): the session log
 - `include_user_context` (bool): include user prompts as framing
-- `redact_secrets` (bool): redact likely secrets before analysis
-- `analysis_engine` (str): `qwen` | `nemotron` | `deterministic`
 Returns a JSON view model: a whole-session `verdict`, per-episode difficulty
 `episodes`, and redacted export text.
@@ -74,18 +79,101 @@ def agents_md() -> str:
 @spaces.GPU(size="xlarge", duration=180)
-def _model_assist_gpu(*, engine, result, narrative_text):
-    """Run model assist inside a ZeroGPU allocation."""
-    from model_runtime import run_model_assist
-    return run_model_assist(engine=engine, result=result, narrative_text=narrative_text)
-# Completed-step count for the frontend's 6-item checklist. Keep the final
-# synthesis row active until the final payload is ready, because model assist
-# runs after deterministic synthesis on the ZeroGPU path.
-_STEP_COUNT = {"extract": 2, "redact": 3, "chart": 4, "classify": 5, "synthesize": 5}
 def _file_fields(trace_file: object) -> tuple[str | None, str | None]:
@@ -101,42 +189,88 @@ def analyze_trace(
     trace_file: FileData,
     include_user_context: bool = True,
     redact_secrets: bool = True,
-    analysis_engine: str = "qwen",
 ) -> dict:
     """Stream real progress, then the frontend view model, for one trace.
-    Yields ``{"step": n}`` after each real pipeline stage (so the UI checklist
-    tracks actual work), then a final ``{"step": 6, "result": <view model>}``.
     """
     path, orig_name = _file_fields(trace_file)
     if not path:
         raise ValueError("No uploaded file was received.")
     result = None
     narrative = ""
     try:
         for kind, payload in stream_deterministic_analysis(
             path,
             include_user_context=include_user_context,
             redact_secrets=redact_secrets,
             ignore_tool_calls=True,
         ):
-            if kind == "step":
-                yield {"step": _STEP_COUNT[payload]}
             elif kind == "result":
-                result, narrative = payload
     except TraceParseError as exc:
         raise ValueError(str(exc)) from exc
     if analysis_engine != "deterministic":
-        apply_model_assist(result, narrative, analysis_engine, run=_model_assist_gpu)
     if orig_name:
         agent = READABLE_AGENT.get(result.agent_type_guess, "Agent")
         result.trace_title = f"{agent} · {orig_name}"
-    yield {"step": 6, "result": build_view_model(result, narrative)}
 if __name__ == "__main__":

 from __future__ import annotations
 import os
+import time
 from pathlib import Path
 import spaces
 from gradio import Server
 from gradio.data_classes import FileData
+from analyzer import apply_model_analysis, stream_deterministic_analysis
 from parser import TraceParseError
+from profiling import Profiler, get_logger
 from view_model import build_view_model
+logger = get_logger()
 HERE = Path(__file__).resolve().parent
 FRONTEND = HERE / "frontend"
 - `trace_file` (file): the session log
 - `include_user_context` (bool): include user prompts as framing
+- `redact_secrets` (bool): regex + AI (`openai/privacy-filter`) PII redaction before analysis
+- `analysis_engine` (str): `minicpm` | `nemotron` | `deterministic`
+- `execution_mode` (str): `zerogpu` (default, uses the Space GPU) | `cpu` (no GPU quota, slower)
 Returns a JSON view model: a whole-session `verdict`, per-episode difficulty
 `episodes`, and redacted export text.
 @spaces.GPU(size="xlarge", duration=180)
+def _model_analysis_gpu(*, engine, numbered_narrative, agent_type, codebook_hint):
+    """Run the primary model analysis inside a ZeroGPU allocation."""
+    from model_runtime import run_model_analysis
+    return run_model_analysis(
+        engine=engine,
+        numbered_narrative=numbered_narrative,
+        agent_type=agent_type,
+        codebook_hint=codebook_hint,
+    )
+@spaces.GPU(size="xlarge", duration=120)
+def _privacy_filter_gpu(texts):
+    """Run the openai/privacy-filter PII pass inside a ZeroGPU allocation."""
+    from privacy_filter import redact_texts
+    return redact_texts(texts)
+def _cpu_privacy_filter(texts):
+    """Run the openai/privacy-filter PII pass on the local CPU (no GPU quota)."""
+    from privacy_filter import redact_texts
+    return redact_texts(texts, device="cpu")
+def _cpu_model_analysis(*, engine, numbered_narrative, agent_type, codebook_hint):
+    """Run the primary model analysis on the local CPU (no GPU quota)."""
+    from model_runtime import run_model_analysis
+    return run_model_analysis(
+        engine=engine,
+        numbered_narrative=numbered_narrative,
+        agent_type=agent_type,
+        codebook_hint=codebook_hint,
+        device="cpu",
+    )
+# Per stage: (frontend checklist index, cumulative %, label). The 6-item
+# checklist is: 0 upload, 1 extract, 2 redact, 3 chart, 4 classify, 5 synthesize.
+# Indices below are "rows completed" so the matching row shows as active.
+_STAGE_PLAN = {
+    "extract": (2, 12, "Extracting narrative messages"),
+    "chart": (4, 55, "Charting difficulty episodes"),
+    "classify": (5, 62, "Classifying with the codebook"),
+    "synthesize": (5, 70, "Synthesizing field notes"),
+}
+# Redaction streams per-chunk progress; its % ramps across this band.
+_REDACT_PCT = (12, 40)
+def _progress_event(*, step, pct, label, elapsed, processed=None, total=None):
+    """Build one streamed progress payload (with a best-effort ETA)."""
+    event = {"step": step, "pct": pct, "stage": label, "elapsed": round(elapsed, 1)}
+    if 0 < pct < 100:
+        event["eta"] = round(elapsed * (100 - pct) / pct, 1)
+    if total is not None:
+        event["total"] = total
+        event["processed"] = processed if processed is not None else total
+    return event
+def _stage_event(payload, *, elapsed, message_total):
+    """Translate a stream progress payload into a frontend event + running total."""
+    stage = payload["stage"]
+    if stage == "redact":
+        total = payload.get("total") or message_total or 0
+        processed = payload.get("processed", total)
+        frac = (processed / total) if total else 1.0
+        low, high = _REDACT_PCT
+        pct = round(low + (high - low) * frac)
+        step = 2 if (total and processed < total) else 3
+        event = _progress_event(
+            step=step,
+            pct=pct,
+            label="Redacting likely secrets",
+            elapsed=elapsed,
+            processed=processed,
+            total=total or None,
+        )
+        return event, (total or message_total)
+    step, pct, label = _STAGE_PLAN[stage]
+    total = payload.get("messages", message_total)
+    event = _progress_event(step=step, pct=pct, label=label, elapsed=elapsed, total=total)
+    return event, total
 def _file_fields(trace_file: object) -> tuple[str | None, str | None]:
     trace_file: FileData,
     include_user_context: bool = True,
     redact_secrets: bool = True,
+    analysis_engine: str = "minicpm",
+    execution_mode: str = "zerogpu",
 ) -> dict:
     """Stream real progress, then the frontend view model, for one trace.
+    Yields ``{"step", "pct", "stage", "elapsed", "eta", "total"}`` after each
+    real pipeline stage (so the UI shows true progress), then a final
+    ``{"step": 6, "pct": 100, "result": <view model>}``.
+    ``execution_mode`` is ``zerogpu`` (default; models run inside ``@spaces.GPU``)
+    or ``cpu`` (models run on the Space/local CPU, no GPU quota — slower).
     """
     path, orig_name = _file_fields(trace_file)
     if not path:
         raise ValueError("No uploaded file was received.")
+    use_cpu = execution_mode == "cpu"
+    redactor = _cpu_privacy_filter if use_cpu else _privacy_filter_gpu
+    analysis_runner = _cpu_model_analysis if use_cpu else _model_analysis_gpu
+    prof = Profiler(f"analyze[{execution_mode}/{analysis_engine}]")
+    logger.info(
+        "analyze_trace start: file=%r engine=%s mode=%s redact=%s",
+        orig_name,
+        analysis_engine,
+        execution_mode,
+        redact_secrets,
+    )
     result = None
     narrative = ""
+    messages = []
+    message_total = None
     try:
         for kind, payload in stream_deterministic_analysis(
             path,
             include_user_context=include_user_context,
             redact_secrets=redact_secrets,
             ignore_tool_calls=True,
+            model_redact=redactor,
+            profiler=prof,
+            stream_redact_progress=use_cpu,
         ):
+            if kind == "progress":
+                event, message_total = _stage_event(
+                    payload, elapsed=prof.elapsed(), message_total=message_total
+                )
+                yield event
             elif kind == "result":
+                result, narrative, messages = payload
     except TraceParseError as exc:
         raise ValueError(str(exc)) from exc
     if analysis_engine != "deterministic":
+        yield _progress_event(
+            step=5,
+            pct=78,
+            label=f"Reading the trace with {analysis_engine}",
+            elapsed=prof.elapsed(),
+            total=message_total,
+        )
+        analysis_started = time.perf_counter()
+        apply_model_analysis(result, messages, analysis_engine, run=analysis_runner)
+        prof.record("model_analysis", time.perf_counter() - analysis_started)
     if orig_name:
         agent = READABLE_AGENT.get(result.agent_type_guess, "Agent")
         result.trace_title = f"{agent} · {orig_name}"
+    view = build_view_model(result, narrative)
+    prof.mark(engine=result.engine, mode=execution_mode)
+    prof.summary()
+    yield {
+        "step": 6,
+        "pct": 100,
+        "stage": "Field notes ready",
+        "elapsed": round(prof.elapsed(), 1),
+        "total": message_total,
+        "processed": message_total,
+        "result": view,
+    }
 if __name__ == "__main__":
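
The progress payloads in this file carry a best-effort ETA and a redaction percentage that ramps across the 12 to 40 band. The arithmetic can be sketched on its own as below. The helper names `estimate_eta` and `redact_pct` are hypothetical and do not appear in the commit; only the formulas mirror `_progress_event` and `_stage_event`.

```python
from __future__ import annotations

# Hypothetical standalone helpers; only the formulas mirror the diff above.
REDACT_BAND = (12, 40)  # same band as _REDACT_PCT in the diff


def estimate_eta(elapsed: float, pct: float) -> float | None:
    """Linear extrapolation: if pct% took `elapsed` seconds, estimate the rest."""
    if not 0 < pct < 100:
        return None  # no ETA before any progress or once finished
    return round(elapsed * (100 - pct) / pct, 1)


def redact_pct(processed: int, total: int, band: tuple[int, int] = REDACT_BAND) -> int:
    """Map per-chunk redaction progress into the cumulative percent band."""
    frac = (processed / total) if total else 1.0  # unknown total counts as done
    low, high = band
    return round(low + (high - low) * frac)


print(estimate_eta(10.0, 25))  # 30.0
print(redact_pct(5, 10))       # 26
```

Because the estimate is linear in cumulative percent, it will jump whenever a stage takes longer or shorter than its share of the percent budget suggests, which is presumably why the docstring calls it best-effort.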

frontend/static/app.jsx CHANGED

@@ -31,19 +31,23 @@ function TopBar() {
         </div>
       </div>
       <div className="topbar__right mono">
-        <span className="topbar__pill">narrative-only</span>
-        <span className="topbar__pill">privacy-first</span>
       </div>
     </header>
   );
 }
 const ENGINES = [
-  ["qwen", "Quick analysis", "Qwen3.5 9B"],
   ["nemotron", "Deeper analysis", "Nemotron 3 Nano 30B-A3B"],
   ["deterministic", "Rule-based", "no model, always on"],
 ];
 function Toggle({ on, set, label, sub, locked }) {
   return (
     <button className={"toggle" + (on ? " toggle--on" : "") + (locked ? " toggle--locked" : "")}
@@ -61,7 +65,8 @@ function LandingView({ onAnalyze, onSample, error }) {
   const [staged, setStaged] = React.useState(null); // { name, file }
   const [redact, setRedact] = React.useState(true);
   const [userCtx, setUserCtx] = React.useState(true);
-  const [engine, setEngine] = React.useState("qwen");
   const [dragOver, setDragOver] = React.useState(false);
   const [copied, setCopied] = React.useState(false);
   const fileRef = React.useRef(null);
@@ -76,7 +81,7 @@ function LandingView({ onAnalyze, onSample, error }) {
   function pick() { if (fileRef.current) fileRef.current.click(); }
   function run() {
     if (!staged) return;
-    onAnalyze({ file: staged.file, include_user_context: userCtx, redact_secrets: redact, analysis_engine: engine, engineLabel });
   }
   const AGENT_PROMPT = `Use this Space as a tool.
@@ -92,7 +97,7 @@ function LandingView({ onAnalyze, onSample, error }) {
       <TopBar />
       <section className="hero">
-        <h1 className="hero__title">See how your coding agent<br /> got stuck, detoured, recovered<span className="hero__amp"> &amp; </span>claimed success.</h1>
         <p className="hero__sub">
           Upload a Codex, Claude Code, or Pi Agent session log. Trace Field Notes reads only the agent's
           <em> narrated</em> messages — what it planned, where it snagged, how it rerouted, and how honestly it called it done —
@@ -104,7 +109,7 @@ function LandingView({ onAnalyze, onSample, error }) {
         <span className="privacy__mark">!</span>
         <p>
           Agent traces can carry prompts, command output, local paths, screenshots, secrets, and private code.
-          <b> Review and redact before uploading or sharing.</b> This app analyzes only visible narrative messages and ignores raw tool telemetry by default.
         </p>
       </div>
@@ -145,7 +150,7 @@ function LandingView({ onAnalyze, onSample, error }) {
           </div>
           <div className="opts">
-            <Toggle on={redact} set={setRedact} label="Redact likely secrets" sub="emails, tokens, keys, paths" />
             <Toggle on={userCtx} set={setUserCtx} label="Include user context" sub="user prompts as framing" />
             <Toggle on={true} set={() => {}} locked label="Ignore tool contents" sub="locked for this release" />
           </div>
@@ -162,7 +167,22 @@ function LandingView({ onAnalyze, onSample, error }) {
                 </button>
               ))}
             </div>
-            <p className="engine__note muted">Quick uses Qwen3.5 9B on the Space GPU. Deeper uses Nemotron 3 Nano 30B-A3B. Rule-based needs no model and never fails.</p>
           </div>
           <div className="panel__actions">
@@ -235,7 +255,14 @@ const PIPELINE = [
   "Synthesizing field notes",
 ];
-function Analyzing({ label, step }) {
   return (
     <div className="analyzing">
       <div className="analyzing__card card card--raised">
@@ -247,6 +274,25 @@ function Analyzing({ label, step }) {
           <circle className="analyzing__dot" r="4.5" fill="var(--accent)" />
         </svg>
         <Kicker>Surveying the trace · {label}</Kicker>
         <ul className="analyzing__steps">
           {PIPELINE.map((s, i) => (
             <li key={s} className={i < step ? "done" : i === step ? "active" : ""}>
@@ -286,11 +332,13 @@ function App() {
   const [engineLabel, setEngineLabel] = React.useState("");
   const [error, setError] = React.useState("");
   const [step, setStep] = React.useState(0);
-  async function analyze({ file, include_user_context, redact_secrets, analysis_engine, engineLabel }) {
     setError("");
     setEngineLabel(engineLabel || analysis_engine);
     setStep(0);
     setStage("analyzing");
     window.scrollTo({ top: 0 });
     try {
@@ -302,6 +350,7 @@ function App() {
         include_user_context: !!include_user_context,
         redact_secrets: !!redact_secrets,
         analysis_engine,
       });
       let result = null;
       for await (const msg of sub) {
@@ -313,6 +362,16 @@ function App() {
             } else if (typeof p.step === "number") {
               setStep(Math.min(p.step, PIPELINE.length - 1));
             }
           }
         } else if (msg.type === "status") {
           if (msg.stage === "error") throw new Error(msg.message || "The analyzer failed on the server.");
@@ -349,7 +408,7 @@ function App() {
       <div className="backdrop"><div className="grain" /><TopoBackground /></div>
       <div className="page">
         {stage === "landing" && <LandingView onAnalyze={analyze} onSample={loadSample} error={error} />}
-        {stage === "analyzing" && <Analyzing label={engineLabel} step={step} />}
         {stage === "report" && (
           <div className="report-wrap">
             <button className="report-back btn btn--sm btn--ghost" onClick={reset}>← New trace</button>

         </div>
       </div>
       <div className="topbar__right mono">
+        <span className="topbar__pill">build small</span>
       </div>
     </header>
   );
 }
 const ENGINES = [
+  ["minicpm", "Quick analysis", "MiniCPM5 1B"],
   ["nemotron", "Deeper analysis", "Nemotron 3 Nano 30B-A3B"],
   ["deterministic", "Rule-based", "no model, always on"],
 ];
+const EXEC_MODES = [
+  ["zerogpu", "GPU", "Space GPU · faster"],
+  ["cpu", "CPU", "no GPU quota · slower"],
+];
 function Toggle({ on, set, label, sub, locked }) {
   return (
     <button className={"toggle" + (on ? " toggle--on" : "") + (locked ? " toggle--locked" : "")}
   const [staged, setStaged] = React.useState(null); // { name, file }
   const [redact, setRedact] = React.useState(true);
   const [userCtx, setUserCtx] = React.useState(true);
+  const [engine, setEngine] = React.useState("minicpm");
+  const [execMode, setExecMode] = React.useState("zerogpu");
   const [dragOver, setDragOver] = React.useState(false);
   const [copied, setCopied] = React.useState(false);
   const fileRef = React.useRef(null);
   function pick() { if (fileRef.current) fileRef.current.click(); }
   function run() {
     if (!staged) return;
+    onAnalyze({ file: staged.file, include_user_context: userCtx, redact_secrets: redact, analysis_engine: engine, execution_mode: execMode, engineLabel });
   }
   const AGENT_PROMPT = `Use this Space as a tool.
       <TopBar />
       <section className="hero">
+        <h1 className="hero__title">See how your coding agent<br /> got stuck, detoured, recovered<span className="hero__amp"> &amp; </span><br />claimed success.</h1>
         <p className="hero__sub">
           Upload a Codex, Claude Code, or Pi Agent session log. Trace Field Notes reads only the agent's
           <em> narrated</em> messages — what it planned, where it snagged, how it rerouted, and how honestly it called it done —
         <span className="privacy__mark">!</span>
         <p>
           Agent traces can carry prompts, command output, local paths, screenshots, secrets, and private code.
+          <b> Review and redact before uploading or sharing.</b> This app analyzes only visible narrative messages, ignores raw tool telemetry by default, and scrubs secrets and personal data with pattern rules plus OpenAI's privacy-filter model.
         </p>
       </div>
           </div>
           <div className="opts">
+            <Toggle on={redact} set={setRedact} label="Redact secrets & personal data" sub="regex + AI: names, contacts, tokens, keys, paths" />
             <Toggle on={userCtx} set={setUserCtx} label="Include user context" sub="user prompts as framing" />
             <Toggle on={true} set={() => {}} locked label="Ignore tool contents" sub="locked for this release" />
           </div>
                 </button>
               ))}
             </div>
+            <p className="engine__note muted">Quick uses MiniCPM5 1B on the Space GPU. Deeper uses Nemotron 3 Nano 30B-A3B. Rule-based needs no model and never fails.</p>
+          </div>
+          <div className="engine">
+            <Label>Run on</Label>
+            <div className="engine__opts">
+              {EXEC_MODES.map(([key, name, detail]) => (
+                <button key={key}
+                  className={"engine__opt" + (execMode === key ? " engine__opt--on" : "")}
+                  onClick={() => setExecMode(key)}>
+                  <span className="engine__name">{name}</span>
+                  <span className="engine__detail mono">{detail}</span>
+                </button>
+              ))}
+            </div>
+            <p className="engine__note muted">ZeroGPU is fast but spends your Space GPU quota. CPU needs no quota and still works if you've run out — just slower, so the progress bar will move more gradually.</p>
           </div>
           <div className="panel__actions">
   "Synthesizing field notes",
 ];
+function fmtSeconds(s) {
+  if (s == null || isNaN(s)) return "—";
+  const m = Math.floor(s / 60), sec = Math.round(s % 60);
+  return m > 0 ? `${m}m ${sec}s` : `${sec}s`;
+}
+function Analyzing({ label, step, progress }) {
+  const pct = progress && typeof progress.pct === "number" ? Math.max(0, Math.min(100, progress.pct)) : null;
   return (
     <div className="analyzing">
       <div className="analyzing__card card card--raised">
           <circle className="analyzing__dot" r="4.5" fill="var(--accent)" />
         </svg>
         <Kicker>Surveying the trace · {label}</Kicker>
+        {pct != null && (
+          <div style={{ margin: "12px 0 2px" }}>
+            <div style={{ height: 6, borderRadius: 4, background: "var(--rule)", overflow: "hidden" }}>
+              <div style={{ width: pct + "%", height: "100%", background: "var(--accent)", transition: "width .45s ease" }} />
+            </div>
+            <div className="mono muted" style={{ display: "flex", justifyContent: "space-between", gap: 12, fontSize: 12, marginTop: 7 }}>
+              <span>{pct}%{progress.stage ? " · " + progress.stage : ""}</span>
+              <span>
+                {progress.total != null
+                  ? (progress.processed != null && progress.processed < progress.total
+                      ? progress.processed + "/" + progress.total
+                      : progress.total) + " msgs · "
+                  : ""}
+                {fmtSeconds(progress.elapsed)} elapsed
+                {progress.eta != null && pct < 100 ? " · ~" + fmtSeconds(progress.eta) + " left" : ""}
+              </span>
+            </div>
+          </div>
+        )}
         <ul className="analyzing__steps">
           {PIPELINE.map((s, i) => (
             <li key={s} className={i < step ? "done" : i === step ? "active" : ""}>
   const [engineLabel, setEngineLabel] = React.useState("");
   const [error, setError] = React.useState("");
   const [step, setStep] = React.useState(0);
+  const [progress, setProgress] = React.useState(null);
+  async function analyze({ file, include_user_context, redact_secrets, analysis_engine, execution_mode, engineLabel }) {
     setError("");
     setEngineLabel(engineLabel || analysis_engine);
     setStep(0);
+    setProgress(null);
     setStage("analyzing");
     window.scrollTo({ top: 0 });
     try {
         include_user_context: !!include_user_context,
         redact_secrets: !!redact_secrets,
         analysis_engine,
+        execution_mode,
       });
       let result = null;
       for await (const msg of sub) {
             } else if (typeof p.step === "number") {
               setStep(Math.min(p.step, PIPELINE.length - 1));
             }
+            if (typeof p.pct === "number") {
+              setProgress({
+                pct: p.pct,
+                elapsed: p.elapsed,
+                eta: p.eta,
+                total: p.total,
+                processed: p.processed,
+                stage: p.stage,
+              });
+            }
           }
         } else if (msg.type === "status") {
           if (msg.stage === "error") throw new Error(msg.message || "The analyzer failed on the server.");
       <div className="backdrop"><div className="grain" /><TopoBackground /></div>
       <div className="page">
         {stage === "landing" && <LandingView onAnalyze={analyze} onSample={loadSample} error={error} />}
+        {stage === "analyzing" && <Analyzing label={engineLabel} step={step} progress={progress} />}
         {stage === "report" && (
           <div className="report-wrap">
             <button className="report-back btn btn--sm btn--ghost" onClick={reset}>← New trace</button>

frontend/static/components.jsx CHANGED

@@ -414,12 +414,22 @@ function ReportHeader({ data }) {
 }
 function ModelStatus({ data }) {
-  const notes = (data.privacy_notes || []).filter((note) => String(note).startsWith("Model assist"));
   if (!notes.length) return null;
   return (
     <div className="privacy model-status">
-      <span className="privacy__mark">!</span>
-      <p><b>Model assist fell back to the rule-based analyzer.</b> {notes.join(" ")}</p>
     </div>
   );
 }

 }
 function ModelStatus({ data }) {
+  const notes = (data.privacy_notes || []).filter((note) =>
+    /^(Analysis produced|Model analysis|Model assist|Unknown analysis engine)/.test(String(note))
+  );
   if (!notes.length) return null;
+  const fellBack = notes.some((note) =>
+    /unavailable|rule-based analysis was returned|deterministic analysis was returned|unknown analysis engine/i.test(note)
+  );
   return (
     <div className="privacy model-status">
+      <span className="privacy__mark">{fellBack ? "!" : "✓"}</span>
+      <p>
+        <b>{fellBack
+          ? "Model unavailable — showing the rule-based analysis instead."
+          : "This report was written by the model."}</b>{" "}
+        {notes.join(" ")}
+      </p>
     </div>
   );
 }
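
The new `ModelStatus` logic depends on the wording of the notes the backend emits, so the same classification could be mirrored in Python to test that contract. This is a minimal sketch: the function name `model_status` and the sample fallback note are assumptions for illustration, and only the two regular expressions are taken from the diff above.

```python
import re

# Patterns copied from the ModelStatus component in the diff above.
RELEVANT = re.compile(
    r"^(Analysis produced|Model analysis|Model assist|Unknown analysis engine)"
)
FALLBACK = re.compile(
    r"unavailable|rule-based analysis was returned|"
    r"deterministic analysis was returned|unknown analysis engine",
    re.IGNORECASE,
)


def model_status(privacy_notes):
    """Return None (no banner), "fallback", or "model" (hypothetical helper)."""
    notes = [str(note) for note in privacy_notes if RELEVANT.match(str(note))]
    if not notes:
        return None
    return "fallback" if any(FALLBACK.search(note) for note in notes) else "model"


# The first note follows the format used by run_model_analysis in this commit;
# the second is an invented example of a fallback note.
print(model_status(["Analysis produced by openbmb/MiniCPM5-1B."]))
print(model_status(["Model analysis unavailable; rule-based analysis was returned."]))
```

A check like this would flag a reworded backend note that silently stops matching the frontend patterns.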

model_runtime.py CHANGED

@@ -12,20 +12,31 @@ from __future__ import annotations
 import json
 import re
 from collections.abc import Mapping
 from dataclasses import dataclass
 from typing import Any, Callable
-from schemas import AnalysisResult
 PRIMARY_MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
-QUICK_MODEL_ID = "Qwen/Qwen3.5-9B"
 MODEL_MAX_NEW_TOKENS = 8192
 MODEL_CHOICES = {
-    "qwen": {
-        "label": "Qwen3.5 9B — quick analysis",
         "model_id": QUICK_MODEL_ID,
     },
     "nemotron": {
@@ -45,9 +56,9 @@ _MODEL_CACHE: dict[str, Any] = {}
 @dataclass(slots=True)
-class ModelAssistResult:
     model_id: str
-    memo: dict[str, Any]
     note: str
@@ -59,38 +70,82 @@ def model_id_for_engine(engine: str) -> str | None:
     return str(model_id) if model_id else None
-def run_model_assist(
     *,
     engine: str,
-    result: AnalysisResult,
-    narrative_text: str,
     generate: GenerateFn | None = None,
-) -> ModelAssistResult:
-    """Run the selected model on the GPU and return a concise grounded memo."""
     model_id = model_id_for_engine(engine)
     if not model_id:
         raise ValueError(f"No model is configured for analysis engine {engine!r}.")
-    prompt = build_model_prompt(result, narrative_text)
     messages = [
         {
             "role": "system",
             "content": (
-                "You analyze visible coding-agent narrative messages. "
-                "Do not infer hidden reasoning. Return JSON only."
             ),
         },
         {"role": "user", "content": prompt},
     ]
-    generator = generate or _local_generator
-    content = generator(messages, model_id=model_id, max_new_tokens=MODEL_MAX_NEW_TOKENS)
-    memo = parse_model_json(content)
-    return ModelAssistResult(
         model_id=model_id,
-        memo=memo,
-        note=f"Model assist completed on the Space GPU with {model_id}.",
     )
@@ -99,16 +154,18 @@ def _local_generator(
     *,
     model_id: str,
     max_new_tokens: int,
 ) -> str:
-    """Generate text with a locally loaded model on the ZeroGPU device.
-    Imported lazily: ``torch`` only needs to exist on the GPU Space, never for
-    the deterministic path, tests, or local development.
     """
     import torch
-    tokenizer, model = _load_model(model_id)
     chat_inputs = tokenizer.apply_chat_template(
         messages,
         add_generation_prompt=True,
@@ -163,78 +220,146 @@ def _move_to_device(value: Any, device: Any) -> Any:
 def _chat_template_kwargs(model_id: str) -> dict[str, Any]:
     """Model-specific chat-template controls."""
-    if model_id.startswith("Qwen/"):
-        return {"enable_thinking": True}
     return {}
-def _load_model(model_id: str) -> Any:
-    """Lazily load and cache a (tokenizer, model) pair on the GPU.
     The cache keeps weights resident across requests so only the first call per
-    model pays the load cost. ZeroGPU exposes CUDA inside the ``@spaces.GPU``
-    context, which is where this runs.
     """
-    cached = _MODEL_CACHE.get(model_id)
     if cached is not None:
         return cached
-    import torch
     from transformers import AutoModelForCausalLM, AutoTokenizer
     tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-    model = AutoModelForCausalLM.from_pretrained(
-        model_id,
-        torch_dtype=torch.bfloat16,
-        device_map="cuda",
-        trust_remote_code=True,
-    )
     model.eval()
-    _MODEL_CACHE[model_id] = (tokenizer, model)
     return tokenizer, model
-def build_model_prompt(result: AnalysisResult, narrative_text: str) -> str:
-    deterministic_json = json.dumps(result.to_dict(), ensure_ascii=False, indent=2)
-    narrative_excerpt = narrative_text[:12000]
-    return f"""Use the deterministic codebook analysis and redacted visible narrative below.
-Return JSON with exactly these keys:
-- executive_memo: 4-6 sentences for a developer
-- detour_memo: 2-4 sentences about productive detours vs wandering
-- outcome_audit_memo: 2-4 sentences about completion claims and caveats
-- caveats: array of short strings
-Rules:
-- Return one valid JSON object and nothing else.
-- The first non-whitespace character must be {{ and the last must be }}.
-- Analyze only visible narrative messages.
-- Do not claim to know hidden reasoning.
-- Cite episode IDs where useful.
-- Do not include raw secrets, tool outputs, or long quotes.
-Deterministic analysis:
-{deterministic_json}
-Redacted narrative excerpt:
-{narrative_excerpt}
 """
-def parse_model_json(content: str) -> dict[str, Any]:
-    parsed = _loads_lenient(content)
-    required = {
-        "executive_memo": str,
-        "detour_memo": str,
-        "outcome_audit_memo": str,
-        "caveats": list,
-    }
-    for key, expected_type in required.items():
-        if key not in parsed or not isinstance(parsed[key], expected_type):
-            raise ValueError(f"Model response missing {key!r} as {expected_type.__name__}.")
-    parsed["caveats"] = [str(item) for item in parsed["caveats"][:6]]
     return parsed

 import json
 import re
+import time
 from collections.abc import Mapping
 from dataclasses import dataclass
 from typing import Any, Callable
+from profiling import get_logger
+from schemas import (
+    APPRAISALS,
+    DETOUR_TYPES,
+    DIFFICULTY_TYPES,
+    OUTCOME_CLAIMS,
+    RECOVERY_PATTERNS,
+    RESOLUTION_MODES,
+)
+logger = get_logger()
 PRIMARY_MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
+QUICK_MODEL_ID = "openbmb/MiniCPM5-1B"
 MODEL_MAX_NEW_TOKENS = 8192
 MODEL_CHOICES = {
+    "minicpm": {
+        "label": "MiniCPM5 1B — quick analysis",
         "model_id": QUICK_MODEL_ID,
     },
     "nemotron": {
 @dataclass(slots=True)
+class ModelAnalysisResult:
     model_id: str
+    analysis: dict[str, Any]
     note: str
     return str(model_id) if model_id else None
+def resolve_device(device: str | None = None) -> str:
+    """Pick the compute device: explicit override, else cuda -> mps -> cpu."""
+    if device:
+        return device
+    import torch
+    if torch.cuda.is_available():
+        return "cuda"
+    mps = getattr(torch.backends, "mps", None)
+    if mps is not None and mps.is_available():
+        return "mps"
+    return "cpu"
+def run_model_analysis(
     *,
     engine: str,
+    numbered_narrative: str,
+    agent_type: str = "unknown",
+    codebook_hint: str = "",
     generate: GenerateFn | None = None,
+    device: str | None = None,
+) -> ModelAnalysisResult:
+    """Run the selected model as the primary analyst and return a field report.
+    The model identifies and classifies the difficulty episodes and writes the
+    session verdict directly from the visible narrative; the deterministic codebook
+    is only a fallback (used by the caller if this raises). ``device`` forces the
+    compute device for the default local generator; an injected ``generate`` is
+    used as-is.
+    """
     model_id = model_id_for_engine(engine)
     if not model_id:
         raise ValueError(f"No model is configured for analysis engine {engine!r}.")
+    prompt = build_analysis_prompt(
+        numbered_narrative, agent_type=agent_type, codebook_hint=codebook_hint
+    )
     messages = [
         {
             "role": "system",
             "content": (
+                "You are an expert analyst of coding-agent session traces. "
+                "Judge only the visible narrative; never invent hidden reasoning. "
+                "Return one JSON object and nothing else."
             ),
         },
         {"role": "user", "content": prompt},
     ]
+    started = time.perf_counter()
+    if generate is not None:
+        content = generate(messages, model_id=model_id, max_new_tokens=MODEL_MAX_NEW_TOKENS)
+        device_label = "injected"
+    else:
+        device_label = resolve_device(device)
+        content = _local_generator(
+            messages,
+            model_id=model_id,
+            max_new_tokens=MODEL_MAX_NEW_TOKENS,
+            device=device_label,
+        )
+    logger.info(
+        "model analysis: %s on %s in %.2fs (%d chars in)",
+        model_id,
+        device_label,
+        time.perf_counter() - started,
+        len(numbered_narrative),
+    )
+    analysis = parse_analysis_json(content)
+    return ModelAnalysisResult(
         model_id=model_id,
+        analysis=analysis,
+        note=f"Analysis produced by {model_id}.",
     )
     *,
     model_id: str,
     max_new_tokens: int,
+    device: str | None = None,
 ) -> str:
+    """Generate text with a locally loaded model on the chosen device.
+    Imported lazily: ``torch`` only needs to exist on the GPU Space (or a local
+    machine running the model), never for the deterministic path, tests, or
+    light local development.
     """
     import torch
+    tokenizer, model = _load_model(model_id, device=device)
     chat_inputs = tokenizer.apply_chat_template(
         messages,
         add_generation_prompt=True,
 def _chat_template_kwargs(model_id: str) -> dict[str, Any]:
     """Model-specific chat-template controls."""
+    if model_id.startswith("openbmb/"):
+        # MiniCPM5 supports hybrid reasoning; the quick engine keeps thinking
+        # off for fast, reliably parseable JSON memos.
+        return {"enable_thinking": False}
     return {}
+def _load_model(model_id: str, device: str | None = None) -> Any:
+    """Lazily load and cache a (tokenizer, model) pair on the chosen device.
     The cache keeps weights resident across requests so only the first call per
+    (model, device) pays the load cost. ZeroGPU exposes CUDA inside the
+    ``@spaces.GPU`` context; CPU/MPS support lets the app run off-Space (e.g. for
+    users without GPU quota, or local development).
     """
+    import torch
+    resolved = resolve_device(device)
+    cache_key = f"{model_id}@{resolved}"
+    cached = _MODEL_CACHE.get(cache_key)
     if cached is not None:
         return cached
     from transformers import AutoModelForCausalLM, AutoTokenizer
+    started = time.perf_counter()
     tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+    if resolved == "cuda":
+        # The ZeroGPU Space path: load straight onto the GPU in bfloat16.
+        model = AutoModelForCausalLM.from_pretrained(
+            model_id,
+            dtype=torch.bfloat16,
+            device_map="cuda",
+            trust_remote_code=True,
+        )
+    else:
+        # CPU / Apple MPS: fp16 on MPS, fp32 on CPU for numerical stability.
+        dtype = torch.float16 if resolved == "mps" else torch.float32
+        model = AutoModelForCausalLM.from_pretrained(
+            model_id,
+            dtype=dtype,
+            trust_remote_code=True,
+        ).to(resolved)
     model.eval()
+    logger.info("loaded %s on %s in %.1fs", model_id, resolved, time.perf_counter() - started)
+    _MODEL_CACHE[cache_key] = (tokenizer, model)
     return tokenizer, model
+def _vocab_block(name: str, vocab: dict[str, str]) -> str:
+    return f"{name}:\n" + "\n".join(f"- {key}: {meaning}" for key, meaning in vocab.items())
+def build_analysis_prompt(
+    numbered_narrative: str, *, agent_type: str = "unknown", codebook_hint: str = ""
+) -> str:
+    narrative = numbered_narrative[:16000]
+    vocab = "\n\n".join(
+        [
+            _vocab_block("difficulty_type", DIFFICULTY_TYPES),
+            _vocab_block("appraisal", APPRAISALS),
+            _vocab_block("detour_type", DETOUR_TYPES),
+            _vocab_block("resolution_mode", RESOLUTION_MODES),
+            _vocab_block("recovery_pattern", RECOVERY_PATTERNS),
+            _vocab_block("outcome_claim", OUTCOME_CLAIMS),
+        ]
+    )
+    return f"""Read the agent's visible narrative and produce a structured field report as JSON.
+Identify the real DIFFICULTY EPISODES — moments where the agent hit a snag, reassessed,
+detoured, recovered, or claimed completion. Ignore instructions, skill files, prompts,
+or boilerplate the agent merely read or quoted; those are NOT difficulties. Merge
+duplicates. Prefer 1-8 substantive episodes; if there is genuinely no difficulty,
+return an empty episodes list.
+Return ONE JSON object (first character {{ and last character }}), no prose, EXACTLY:
+{{
+  "verdict": {{
+    "tone": one of ["stable","iterative","detour","partial","risk","unknown"],
+    "headline": "<= 12 words, plain language",
+    "detail": "2-4 sentences a developer can act on",
+    "honesty": one of ["candid","mixed","overclaimed"]
+  }},
+  "overall_patterns": {{
+    "difficulty_style": "1 sentence", "detour_style": "1 sentence",
+    "recovery_style": "1 sentence", "risk_or_caveat": "1 sentence"
+  }},
+  "episodes": [
+    {{
+      "start_index": <a message index shown below>,
+      "end_index": <a message index shown below>,
+      "title": "<= 10 words",
+      "initial_intention": "1 sentence", "reported_difficulty": "1-2 sentences",
+      "difficulty_type": "<one key below>", "appraisal": "<one key below>",
+      "strategy_before": "1 sentence", "strategy_after": "1 sentence",
+      "detour_type": "<one key below>", "resolution_mode": "<one key below>",
+      "recovery_pattern": "<one key below>", "outcome_claim": "<one key below>",
+      "productive_detour": one of ["yes","no","mixed","unknown"],
+      "evidence_quotes": ["short verbatim quote", "up to 3"],
+      "analyst_memo": "1-3 sentences of real insight, NOT a restatement of the codes"
+    }}
+  ]
+}}
+Controlled vocabulary (use these keys exactly):
+{vocab}
+Guidance:
+- Every field must contain real content drawn from the trace. NEVER output a
+  placeholder such as "<= 10 words", "1 sentence", or "<one key below>" literally.
+- difficulty_type, appraisal, detour_type, resolution_mode, recovery_pattern, and
+  outcome_claim must each be EXACTLY one key from the vocabulary above (lowercase,
+  with underscores). If unsure, use "unknown".
+- Be accurate, not generous. If the agent ended unresolved or overclaimed, say so in tone/honesty.
+- honesty = "overclaimed" when a success claim outruns the visible evidence.
+- start_index / end_index must be message indices that appear below.
+- Quote the agent's own words; keep the original language of the quote.
+- Do not include secrets or long tool dumps.
+Agent type: {agent_type}
+Rule-based pre-scan candidate spans (hints only — keep, drop, merge, or add freely): {codebook_hint or "(none)"}
+Numbered visible messages:
+{narrative}
 """
+def parse_analysis_json(content: str) -> dict[str, Any]:
+    """Validate the structural shape of the model's field report (codes coerced later)."""
+    parsed = _loads_lenient(content)
+    episodes = parsed.get("episodes")
+    if not isinstance(episodes, list):
+        raise ValueError("Model response did not include an 'episodes' list.")
+    parsed["episodes"] = [episode for episode in episodes if isinstance(episode, dict)]
+    if not isinstance(parsed.get("overall_patterns"), dict):
+        parsed["overall_patterns"] = {}
+    if not isinstance(parsed.get("verdict"), dict):
+        parsed["verdict"] = {}
     return parsed
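
Reviewer note: `_loads_lenient` is called above, but its body is not part of this hunk. The tests in `tests/test_model_runtime.py` expect it to recover a JSON object from a code fence, from surrounding prose, and from output where scratch braces inside a `<think>` block precede the final answer. A minimal sketch of one way to get that behavior, assuming the rule "take the last top-level JSON object that decodes"; the name `last_json_object` is hypothetical and this is not the repository's implementation:

```python
import json
from typing import Any


def last_json_object(content: str) -> dict[str, Any]:
    """Hypothetical helper: return the last top-level JSON object in `content`."""
    decoder = json.JSONDecoder()
    found: dict[str, Any] | None = None
    index = 0
    while True:
        index = content.find("{", index)
        if index == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `index` and
            # returns the value plus the offset just past it.
            value, end = decoder.raw_decode(content, index)
        except json.JSONDecodeError:
            index += 1  # not valid JSON here (e.g. "{not json}"); keep scanning
            continue
        if isinstance(value, dict):
            found = value
        index = end  # skip past the decoded value, including nested objects
    if found is None:
        raise ValueError("No JSON object found in model response.")
    return found


raw = '<think>Draft {not json} {"draft": "ignore"}</think>\n{"episodes": []}'
print(last_json_object(raw))  # {'episodes': []}
```

Scanning forward and keeping the last successful decode means a draft object emitted during thinking is overwritten by the final answer, which is the case the `uses_final_object_after_thinking_braces` test exercises.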

privacy_filter.py ADDED

	@@ -0,0 +1,180 @@

+"""Optional model-based PII redaction using ``openai/privacy-filter``.
+The deterministic pipeline always runs regex redaction (:mod:`redaction`). On the
+Hugging Face Space GPU this module adds a second pass: a token-classification
+model (``openai/privacy-filter``) flags personal or sensitive spans that regex
+patterns miss — names, phone numbers, postal addresses, and the like — and masks
+them with typed placeholders.
+Heavy imports (``torch``/``transformers``) load lazily so the deterministic
+analyzer, the test suite, and local development keep working without GPU
+dependencies. If the model cannot be loaded, the caller falls back to regex-only
+redaction and records the reason in the privacy notes.
+"""
+from __future__ import annotations
+import functools
+import time
+from collections import Counter
+from typing import Any, Callable
+from model_runtime import resolve_device
+from profiling import get_logger
+from redaction import RedactionResult
+logger = get_logger()
+PRIVACY_MODEL_ID = "openai/privacy-filter"
+# Only mask spans the model is reasonably confident about.
+PRIVACY_MIN_SCORE = 0.5
+# Model entity group -> (placeholder written into the text, human label for notes).
+PII_TYPES: dict[str, tuple[str, str]] = {
+    "private_person": ("[REDACTED_NAME]", "personal name"),
+    "private_email": ("[REDACTED_EMAIL]", "email address"),
+    "private_phone": ("[REDACTED_PHONE]", "phone number"),
+    "private_address": ("[REDACTED_ADDRESS]", "postal address"),
+    "private_url": ("[REDACTED_URL]", "personal URL"),
+    "private_date": ("[REDACTED_DATE]", "personal date"),
+    "account_number": ("[REDACTED_ACCOUNT]", "account number"),
+    "secret": ("[REDACTED_SECRET]", "secret"),
+}
+# (texts) -> per-text list of {"start", "end", "label"} spans.
+DetectFn = Callable[[list[str]], list[list[dict[str, Any]]]]
+_PIPELINE_CACHE: dict[str, Any] = {}
+def redact_texts(
+    texts: list[str],
+    *,
+    detect: DetectFn | None = None,
+    device: str | None = None,
+) -> list[RedactionResult]:
+    """Detect and mask PII in each text, returning one result per input.
+    ``detect`` defaults to :func:`_local_detect` (the lazy model); tests inject a
+    stand-in so the masking logic runs without ``torch``. ``device`` forces the
+    compute device for the default detector (``cuda`` / ``mps`` / ``cpu``).
+    """
+    detector = detect or functools.partial(_local_detect, device=device)
+    spans_per_text = detector(texts)
+    return [_apply_spans(text, spans) for text, spans in zip(texts, spans_per_text)]
+def _merge_spans(text: str, spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    """Drop malformed spans and merge same-label runs into clean, disjoint spans.
+    ``openai/privacy-filter`` uses BIOES tags, which the pipeline's IOB-oriented
+    "simple" aggregation can split into adjacent fragments of one entity (and a
+    leading separator can leave a one-character gap). Merging same-label spans
+    that overlap or sit within one character keeps each entity to a single
+    placeholder; a remaining different-label overlap is clipped to stay disjoint.
+    """
+    valid = [
+        span
+        for span in spans
+        if span.get("label") in PII_TYPES
+        and 0 <= int(span["start"]) < int(span["end"]) <= len(text)
+    ]
+    valid.sort(key=lambda span: (int(span["start"]), int(span["end"])))
+    merged: list[dict[str, Any]] = []
+    for span in valid:
+        start, end, label = int(span["start"]), int(span["end"]), span["label"]
+        if merged:
+            prev = merged[-1]
+            if label == prev["label"] and start <= prev["end"] + 1:
+                prev["end"] = max(prev["end"], end)
+                continue
+            if start < prev["end"]:  # different-label overlap: keep them disjoint
+                start = prev["end"]
+                if start >= end:
+                    continue
+        merged.append({"start": start, "end": end, "label": label})
+    return merged
+def _apply_spans(text: str, spans: list[dict[str, Any]]) -> RedactionResult:
+    """Replace detected spans with typed placeholders, right-to-left."""
+    counts: Counter[str] = Counter()
+    redacted = text
+    for span in sorted(_merge_spans(text, spans), key=lambda span: span["start"], reverse=True):
+        placeholder, label = PII_TYPES[span["label"]]
+        redacted = redacted[: span["start"]] + placeholder + redacted[span["end"] :]
+        counts[label] += 1
+    notes = [f"{label}: {count}" for label, count in sorted(counts.items())]
+    return RedactionResult(text=redacted, notes=notes, count=sum(counts.values()))
+def _local_detect(texts: list[str], device: str | None = None) -> list[list[dict[str, Any]]]:
+    """Run ``openai/privacy-filter`` and return confident PII spans per text.
+    Imported lazily: ``transformers``/``torch`` only need to exist where the
+    model actually runs, never for the deterministic path, tests, or light local
+    development.
+    """
+    pipe = _load_pipeline(device=device)
+    started = time.perf_counter()
+    results: list[list[dict[str, Any]]] = []
+    for text in texts:
+        if not text.strip():
+            results.append([])
+            continue
+        entities = pipe(text)
+        spans = [
+            {
+                "start": int(entity["start"]),
+                "end": int(entity["end"]),
+                "label": entity["entity_group"],
+            }
+            for entity in entities
+            if entity.get("entity_group") in PII_TYPES
+            and entity.get("start") is not None
+            and entity.get("end") is not None
+            and float(entity.get("score", 1.0)) >= PRIVACY_MIN_SCORE
+        ]
+        results.append(spans)
+    detected = sum(len(spans) for spans in results)
+    logger.debug(
+        "privacy-filter scanned %d messages, %d raw spans in %.2fs",
+        len(texts),
+        detected,
+        time.perf_counter() - started,
+    )
+    return results
+def _load_pipeline(device: str | None = None) -> Any:
+    """Lazily build and cache the token-classification pipeline per device."""
+    resolved = resolve_device(device)
+    cached = _PIPELINE_CACHE.get(resolved)
+    if cached is not None:
+        return cached
+    from transformers import pipeline
+    # transformers pipeline device: 0 for cuda, "mps"/"cpu" otherwise.
+    pipe_device = 0 if resolved == "cuda" else resolved
+    started = time.perf_counter()
+    pipe = pipeline(
+        "token-classification",
+        model=PRIVACY_MODEL_ID,
+        aggregation_strategy="simple",
+        device=pipe_device,
+    )
+    logger.info(
+        "loaded %s on %s in %.1fs", PRIVACY_MODEL_ID, resolved, time.perf_counter() - started
+    )
+    _PIPELINE_CACHE[resolved] = pipe
+    return pipe
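
Reviewer note: the merge-then-mask flow in `_merge_spans` and `_apply_spans` can be illustrated with a simplified, self-contained sketch. It covers only the same-label merge (overlapping or within one character) and the right-to-left replacement; the validity filter and the different-label clipping from the module are left out. The names `PLACEHOLDERS`, `merge_same_label`, and `mask` are hypothetical:

```python
from typing import Any

# Hypothetical subset of the placeholder table, for illustration only.
PLACEHOLDERS = {
    "private_person": "[REDACTED_NAME]",
    "private_phone": "[REDACTED_PHONE]",
}


def merge_same_label(spans: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge same-label spans that overlap or sit within one character."""
    merged: list[dict[str, Any]] = []
    for span in sorted(spans, key=lambda s: (s["start"], s["end"])):
        prev = merged[-1] if merged else None
        if prev and span["label"] == prev["label"] and span["start"] <= prev["end"] + 1:
            prev["end"] = max(prev["end"], span["end"])
        else:
            merged.append(dict(span))
    return merged


def mask(text: str, spans: list[dict[str, Any]]) -> str:
    """Replace spans right-to-left so earlier offsets stay valid."""
    for span in sorted(merge_same_label(spans), key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + PLACEHOLDERS[span["label"]] + text[span["end"] :]
    return text


text = "Call Alice Smith at 555-1234."
spans = [
    {"start": 5, "end": 10, "label": "private_person"},   # "Alice"
    {"start": 11, "end": 16, "label": "private_person"},  # "Smith", one-char gap
    {"start": 20, "end": 28, "label": "private_phone"},   # "555-1234"
]
print(mask(text, spans))  # Call [REDACTED_NAME] at [REDACTED_PHONE].
```

Replacing from the rightmost span first matters because each placeholder changes the string length; working left-to-right would shift every later offset.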

profiling.py ADDED

	@@ -0,0 +1,125 @@

+"""Lightweight logging + profiling for the Trace Field Notes pipeline.
+Everything here writes to the standard logging system, never the UI. Set the log
+level with the ``TFN_LOG_LEVEL`` env var (default ``INFO``); use ``DEBUG`` for
+per-stage detail. Resource probes (process RSS, system memory, CPU, and
+GPU/MPS memory) are best-effort and degrade silently if a dependency is missing
+— so the deterministic path, the test suite, and local development never need
+``psutil`` or ``torch`` installed.
+"""
+from __future__ import annotations
+import logging
+import os
+import time
+from contextlib import contextmanager
+from typing import Any, Iterator
+def get_logger(name: str = "trace_field_notes") -> logging.Logger:
+    logger = logging.getLogger(name)
+    if not logger.handlers:
+        handler = logging.StreamHandler()
+        handler.setFormatter(
+            logging.Formatter("%(asctime)s [%(name)s] %(levelname)s %(message)s")
+        )
+        logger.addHandler(handler)
+        logger.setLevel(os.getenv("TFN_LOG_LEVEL", "INFO").upper())
+        logger.propagate = False
+    return logger
+logger = get_logger()
+def resource_snapshot() -> dict[str, Any]:
+    """Best-effort process + system resource probe. Never raises."""
+    snap: dict[str, Any] = {}
+    try:
+        import psutil
+        proc = psutil.Process()
+        snap["rss_mb"] = round(proc.memory_info().rss / 1024 / 1024, 1)
+        vm = psutil.virtual_memory()
+        snap["sys_mem_pct"] = vm.percent
+        snap["sys_mem_avail_mb"] = round(vm.available / 1024 / 1024, 1)
+        snap["cpu_pct"] = psutil.cpu_percent(interval=None)
+    except Exception:  # noqa: BLE001 - profiling must never break the request
+        pass
+    try:
+        import torch
+        if torch.cuda.is_available():
+            snap["accel"] = "cuda"
+            snap["accel_mem_mb"] = round(torch.cuda.memory_allocated() / 1024 / 1024, 1)
+        elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
+            snap["accel"] = "mps"
+            snap["accel_mem_mb"] = round(
+                torch.mps.current_allocated_memory() / 1024 / 1024, 1
+            )
+    except Exception:  # noqa: BLE001
+        pass
+    return snap
+def format_snapshot(snap: dict[str, Any]) -> str:
+    parts = []
+    if "rss_mb" in snap:
+        parts.append(f"rss={snap['rss_mb']}MB")
+    if "sys_mem_pct" in snap:
+        parts.append(f"sysmem={snap['sys_mem_pct']}%")
+    if "cpu_pct" in snap:
+        parts.append(f"cpu={snap['cpu_pct']}%")
+    if "accel_mem_mb" in snap:
+        parts.append(f"{snap.get('accel', 'accel')}={snap['accel_mem_mb']}MB")
+    return " ".join(parts) or "n/a"
+class Profiler:
+    """Accumulates per-stage timings + counts for one request and logs a summary."""
+    def __init__(self, label: str = "analyze") -> None:
+        self.label = label
+        self._t0 = time.perf_counter()
+        self.stages: list[tuple[str, float]] = []
+        self.meta: dict[str, Any] = {}
+    @contextmanager
+    def stage(self, name: str) -> Iterator[None]:
+        start = time.perf_counter()
+        logger.debug(
+            "%s: stage %r start | %s", self.label, name, format_snapshot(resource_snapshot())
+        )
+        try:
+            yield
+        finally:
+            dt = time.perf_counter() - start
+            self.stages.append((name, dt))
+            logger.debug("%s: stage %r done in %.3fs", self.label, name, dt)
+    def record(self, name: str, seconds: float) -> None:
+        """Record a stage duration measured by the caller (no context manager)."""
+        self.stages.append((name, seconds))
+        logger.debug("%s: stage %r done in %.3fs", self.label, name, seconds)
+    def mark(self, **kwargs: Any) -> None:
+        self.meta.update(kwargs)
+    def elapsed(self) -> float:
+        return time.perf_counter() - self._t0
+    def summary(self) -> None:
+        total = self.elapsed()
+        stage_str = ", ".join(f"{name}={dt * 1000:.0f}ms" for name, dt in self.stages)
+        meta_str = " ".join(f"{key}={value}" for key, value in self.meta.items())
+        logger.info(
+            "%s done in %.3fs | %s | stages: %s | %s",
+            self.label,
+            total,
+            meta_str or "-",
+            stage_str or "-",
+            format_snapshot(resource_snapshot()),
+        )
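
Reviewer note: `Profiler.stage` records its duration inside a `finally` block, so a stage that raises is still timed and still shows up in the summary. A minimal self-contained sketch of that pattern, using a module-level list in place of the `Profiler` instance (the names `stages` and `stage` here are illustrative, not imports from `profiling.py`):

```python
import time
from contextlib import contextmanager
from typing import Iterator

# Illustrative stand-in for Profiler.stages: (stage name, seconds) pairs.
stages: list[tuple[str, float]] = []


@contextmanager
def stage(name: str) -> Iterator[None]:
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises, so failed
        # stages are recorded too.
        stages.append((name, time.perf_counter() - start))


with stage("parse"):
    sum(range(1000))

try:
    with stage("model"):
        raise RuntimeError("no gpu here")
except RuntimeError:
    pass  # the exception still propagates out of the context manager

print([name for name, _ in stages])  # ['parse', 'model']
```

For durations measured outside a `with` block, the diff above offers `Profiler.record(name, seconds)` instead.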

requirements.txt CHANGED

@@ -2,6 +2,7 @@ gradio>=6.16,<7
 huggingface_hub>=0.30
 spaces>=0.50
 torch>=2.4
-transformers>=4.57
+transformers>=5.6
 accelerate>=1.0
 einops>=0.8
+psutil>=5.9

schemas.py CHANGED

@@ -149,6 +149,7 @@ class AnalysisResult:
     engine: str = "deterministic-codebook"
     model_notes: list[str] = field(default_factory=list)
     model_memo: dict[str, Any] = field(default_factory=dict)
+    session_verdict: dict[str, Any] = field(default_factory=dict)
     def to_dict(self) -> dict[str, Any]:
         return {
@@ -163,4 +164,5 @@ class AnalysisResult:
             "engine": self.engine,
             "model_notes": self.model_notes,
             "model_memo": self.model_memo,
+            "session_verdict": self.session_verdict,
         }

tests/test_model_runtime.py CHANGED

@@ -14,16 +14,45 @@ from model_runtime import (
     QUICK_MODEL_ID,
     _chat_template_kwargs,
     _prepare_generation_inputs,
-    parse_model_json,
-    run_model_assist,
 )
-MEMO_JSON = {
-    "executive_memo": "The trace shows a visible upload-boundary correction.",
-    "detour_memo": "E01 narrows scope instead of changing the parser.",
-    "outcome_audit_memo": "The agent keeps a deployment caveat visible.",
-    "caveats": ["Model memo is based only on redacted narrative."],
 }
@@ -37,7 +66,7 @@ class RecordingGenerator:
         self.calls.append(
             {"messages": messages, "model_id": model_id, "max_new_tokens": max_new_tokens}
         )
-        return json.dumps(MEMO_JSON)
 class FakeTensor:
@@ -57,46 +86,62 @@ class ModelRuntimeTests(unittest.TestCase):
         self.assertIn("NVIDIA Nemotron 3 Nano 30B-A3B", label)
         self.assertNotIn("small", label.lower())
-    def test_parse_model_json_validates_required_shape(self) -> None:
-        memo = parse_model_json(json.dumps(MEMO_JSON))
-        self.assertEqual(memo["executive_memo"], MEMO_JSON["executive_memo"])
-        self.assertEqual(memo["caveats"], MEMO_JSON["caveats"])
-    def test_parse_model_json_recovers_from_code_fence(self) -> None:
-        memo = parse_model_json("```json\n" + json.dumps(MEMO_JSON) + "\n```")
-        self.assertEqual(memo["detour_memo"], MEMO_JSON["detour_memo"])
-    def test_parse_model_json_extracts_object_from_prose(self) -> None:
-        raw = "Here is the analysis:\n" + json.dumps(MEMO_JSON) + "\nHope this helps."
-        memo = parse_model_json(raw)
-        self.assertEqual(memo["outcome_audit_memo"], MEMO_JSON["outcome_audit_memo"])
-    def test_parse_model_json_uses_final_object_after_thinking_braces(self) -> None:
         raw = (
             "<think>Draft {not json} and a scratch object "
             '{"draft": "ignore this"} before the final answer.</think>\n'
-            + json.dumps(MEMO_JSON)
         )
-        memo = parse_model_json(raw)
-        self.assertEqual(memo["executive_memo"], MEMO_JSON["executive_memo"])
-    def test_run_model_assist_uses_selected_model(self) -> None:
-        result, narrative = analyze_trace_file(Path("examples/sample_trace_redacted.jsonl"))
         generate = RecordingGenerator()
-        assist = run_model_assist(
             engine="nemotron",
-            result=result,
-            narrative_text=narrative,
             generate=generate,
         )
-        self.assertEqual(assist.model_id, PRIMARY_MODEL_ID)
-        self.assertIn("upload-boundary", assist.memo["executive_memo"])
         self.assertEqual(generate.calls[0]["model_id"], PRIMARY_MODEL_ID)
         self.assertEqual(generate.calls[0]["max_new_tokens"], MODEL_MAX_NEW_TOKENS)
@@ -121,12 +166,6 @@ class ModelRuntimeTests(unittest.TestCase):
         self.assertEqual(generation_inputs["input_ids"], input_ids)
         self.assertEqual(generation_inputs["attention_mask"], attention_mask)
         self.assertEqual(prompt_tokens, 21)
-        self.assertEqual(input_ids.device, "cuda")
-        self.assertEqual(attention_mask.device, "cuda")
-    def test_qwen_chat_template_enables_thinking(self) -> None:
-        self.assertEqual(_chat_template_kwargs(QUICK_MODEL_ID), {"enable_thinking": True})
-        self.assertEqual(_chat_template_kwargs(PRIMARY_MODEL_ID), {})
     def test_analyzer_records_unknown_engine_note(self) -> None:
         result, _ = analyze_trace_file(
@@ -138,30 +177,61 @@ class ModelRuntimeTests(unittest.TestCase):
         self.assertIn("Unknown analysis engine", result.model_notes[0])
     def test_analyzer_model_error_note_avoids_double_period(self) -> None:
-        with patch("analyzer.run_model_assist", side_effect=ValueError("model unavailable.")):
             result, _ = analyze_trace_file(
                 Path("examples/sample_trace_redacted.jsonl"),
-                analysis_engine="qwen",
             )
         self.assertTrue(result.model_notes)
         self.assertNotIn("..", result.model_notes[0])
         self.assertIn("ValueError: model unavailable.", result.model_notes[0])
-    def test_analyzer_records_model_engine_on_success(self) -> None:
-        with patch("analyzer.run_model_assist") as run_model_assist:
-            run_model_assist.return_value = types.SimpleNamespace(
                 model_id=PRIMARY_MODEL_ID,
-                memo=dict(MEMO_JSON),
-                note="ok",
             )
             result, _ = analyze_trace_file(
                 Path("examples/sample_trace_redacted.jsonl"),
                 analysis_engine="nemotron",
             )
-        self.assertIn(PRIMARY_MODEL_ID, result.engine)
-        self.assertNotIn("token", run_model_assist.call_args.kwargs)
 if __name__ == "__main__":

     QUICK_MODEL_ID,
     _chat_template_kwargs,
     _prepare_generation_inputs,
+    parse_analysis_json,
+    resolve_device,
+    run_model_analysis,
 )
+ANALYSIS_JSON = {
+    "verdict": {
+        "tone": "partial",
+        "headline": "Reroute landed with a caveat.",
+        "detail": "The agent caught a wrong assumption about the upload shape and narrowed the fix.",
+        "honesty": "candid",
+    },
+    "overall_patterns": {
+        "difficulty_style": "One localization snag.",
+        "detour_style": "A productive narrowing.",
+        "recovery_style": "Reflective.",
+        "risk_or_caveat": "Deployment path left unverified.",
+    },
+    "episodes": [
+        {
+            "start_index": 0,
+            "end_index": 3,
+            "title": "Upload boundary fix",
+            "initial_intention": "Inspect the failing upload path.",
+            "reported_difficulty": "The Gradio file object can arrive as a temporary path.",
+            "difficulty_type": "localization_difficulty",
+            "appraisal": "initial_hypothesis_wrong",
+            "strategy_before": "Fix the parser.",
+            "strategy_after": "Narrow the fix to the upload boundary.",
+            "detour_type": "scope_narrowing",
+            "resolution_mode": "defensive_handling",
+            "recovery_pattern": "reflective_recovery",
+            "outcome_claim": "resolved_with_caveat",
+            "productive_detour": "yes",
+            "evidence_quotes": ["my initial assumption about the upload shape was wrong"],
+            "analyst_memo": "The agent names the wrong assumption and picks the smaller change.",
+        }
+    ],
 }
         self.calls.append(
             {"messages": messages, "model_id": model_id, "max_new_tokens": max_new_tokens}
         )
+        return json.dumps(ANALYSIS_JSON)
 class FakeTensor:
         self.assertIn("NVIDIA Nemotron 3 Nano 30B-A3B", label)
         self.assertNotIn("small", label.lower())
+    def test_minicpm_is_the_quick_engine(self) -> None:
+        self.assertEqual(MODEL_CHOICES["minicpm"]["model_id"], QUICK_MODEL_ID)
+        self.assertIn("MiniCPM5 1B", str(MODEL_CHOICES["minicpm"]["label"]))
+        self.assertNotIn("qwen", MODEL_CHOICES)
+    def test_minicpm_chat_template_disables_thinking(self) -> None:
+        self.assertEqual(_chat_template_kwargs(QUICK_MODEL_ID), {"enable_thinking": False})
+        self.assertEqual(_chat_template_kwargs(PRIMARY_MODEL_ID), {})
+    def test_resolve_device_honors_explicit_override(self) -> None:
+        self.assertEqual(resolve_device("cpu"), "cpu")
+        self.assertEqual(resolve_device("cuda"), "cuda")
+        self.assertEqual(resolve_device("mps"), "mps")
+    def test_parse_analysis_json_validates_shape(self) -> None:
+        parsed = parse_analysis_json(json.dumps(ANALYSIS_JSON))
+        self.assertEqual(len(parsed["episodes"]), 1)
+        self.assertEqual(parsed["verdict"]["tone"], "partial")
+    def test_parse_analysis_json_recovers_from_code_fence(self) -> None:
+        parsed = parse_analysis_json("```json\n" + json.dumps(ANALYSIS_JSON) + "\n```")
+        self.assertEqual(parsed["episodes"][0]["difficulty_type"], "localization_difficulty")
+    def test_parse_analysis_json_extracts_object_from_prose(self) -> None:
+        raw = "Here is the report:\n" + json.dumps(ANALYSIS_JSON) + "\nDone."
+        parsed = parse_analysis_json(raw)
+        self.assertEqual(parsed["verdict"]["honesty"], "candid")
+    def test_parse_analysis_json_uses_final_object_after_thinking_braces(self) -> None:
         raw = (
             "<think>Draft {not json} and a scratch object "
             '{"draft": "ignore this"} before the final answer.</think>\n'
+            + json.dumps(ANALYSIS_JSON)
         )
+        parsed = parse_analysis_json(raw)
+        self.assertEqual(len(parsed["episodes"]), 1)
+    def test_parse_analysis_json_requires_episodes_list(self) -> None:
+        with self.assertRaises(ValueError):
+            parse_analysis_json(json.dumps({"verdict": {}, "overall_patterns": {}}))
+    def test_run_model_analysis_uses_selected_model(self) -> None:
         generate = RecordingGenerator()
+        produced = run_model_analysis(
             engine="nemotron",
+            numbered_narrative="[0] assistant 10:00: hello",
             generate=generate,
         )
+        self.assertEqual(produced.model_id, PRIMARY_MODEL_ID)
+        self.assertEqual(len(produced.analysis["episodes"]), 1)
         self.assertEqual(generate.calls[0]["model_id"], PRIMARY_MODEL_ID)
         self.assertEqual(generate.calls[0]["max_new_tokens"], MODEL_MAX_NEW_TOKENS)
         self.assertEqual(generation_inputs["input_ids"], input_ids)
         self.assertEqual(generation_inputs["attention_mask"], attention_mask)
         self.assertEqual(prompt_tokens, 21)
     def test_analyzer_records_unknown_engine_note(self) -> None:
         result, _ = analyze_trace_file(
         self.assertIn("Unknown analysis engine", result.model_notes[0])
     def test_analyzer_model_error_note_avoids_double_period(self) -> None:
+        with patch("analyzer.run_model_analysis", side_effect=ValueError("model unavailable.")):
             result, _ = analyze_trace_file(
                 Path("examples/sample_trace_redacted.jsonl"),
+                analysis_engine="minicpm",
             )
         self.assertTrue(result.model_notes)
         self.assertNotIn("..", result.model_notes[0])
         self.assertIn("ValueError: model unavailable.", result.model_notes[0])
+    def test_analyzer_replaces_analysis_on_model_success(self) -> None:
+        with patch("analyzer.run_model_analysis") as run:
+            run.return_value = types.SimpleNamespace(
                 model_id=PRIMARY_MODEL_ID,
+                analysis=dict(ANALYSIS_JSON),
+                note=f"Analysis produced by {PRIMARY_MODEL_ID}.",
             )
             result, _ = analyze_trace_file(
                 Path("examples/sample_trace_redacted.jsonl"),
                 analysis_engine="nemotron",
             )
+        self.assertEqual(result.engine, PRIMARY_MODEL_ID)
+        self.assertEqual(result.session_verdict["tone"], "partial")
+        self.assertEqual(result.episodes[0].episode_id, "E01")
+        self.assertEqual(result.episodes[0].difficulty_type, "localization_difficulty")
+    def test_analyzer_strips_placeholder_echoes(self) -> None:
+        bad = {
+            "verdict": {"tone": "stable", "headline": "<= 12 words", "detail": "2-4 sentences", "honesty": "candid"},
+            "overall_patterns": {},
+            "episodes": [
+                {
+                    "start_index": 0,
+                    "end_index": 0,
+                    "title": "<= 10 words",
+                    "reported_difficulty": "The build failed.",
+                    "difficulty_type": "environment_blocker",
+                    "analyst_memo": "1-3 sentences",
+                    "evidence_quotes": ["short verbatim quote", "the build failed"],
+                    "outcome_claim": "not_resolved",
+                }
+            ],
+        }
+        with patch("analyzer.run_model_analysis") as run:
+            run.return_value = types.SimpleNamespace(model_id=QUICK_MODEL_ID, analysis=bad, note="ok")
+            result, _ = analyze_trace_file(
+                Path("examples/sample_trace_redacted.jsonl"), analysis_engine="minicpm"
+            )
+        episode = result.episodes[0]
+        self.assertEqual(episode.title, "The build failed.")  # placeholder -> reported_difficulty
+        self.assertEqual(episode.analyst_memo, "")  # "1-3 sentences" stripped
+        self.assertEqual(episode.evidence_quotes, ["the build failed"])  # placeholder quote dropped
+        self.assertNotIn("<", result.session_verdict["headline"])
 if __name__ == "__main__":

tests/test_privacy_filter.py ADDED

	@@ -0,0 +1,179 @@

+from __future__ import annotations
+import unittest
+from pathlib import Path
+from analyzer import stream_deterministic_analysis
+from privacy_filter import PII_TYPES, redact_texts
+from redaction import RedactionResult
+def fake_detect(texts: list[str]) -> list[list[dict]]:
+    """Stand-in detector: flags "Alice Smith" and "555-1234" without torch."""
+    results = []
+    for text in texts:
+        spans = []
+        person = text.find("Alice Smith")
+        if person != -1:
+            spans.append({"start": person, "end": person + len("Alice Smith"), "label": "private_person"})
+        phone = text.find("555-1234")
+        if phone != -1:
+            spans.append({"start": phone, "end": phone + len("555-1234"), "label": "private_phone"})
+        results.append(spans)
+    return results
+def _drain(stream):
+    result = None
+    for kind, payload in stream:
+        if kind == "result":
+            result = payload[0]
+    assert result is not None
+    return result
+class PrivacyFilterMaskingTests(unittest.TestCase):
+    def test_redact_texts_masks_detected_spans(self) -> None:
+        texts = ["Call Alice Smith at 555-1234 tomorrow.", "no pii here"]
+        results = redact_texts(texts, detect=fake_detect)
+        self.assertIsInstance(results[0], RedactionResult)
+        self.assertNotIn("Alice Smith", results[0].text)
+        self.assertNotIn("555-1234", results[0].text)
+        self.assertIn(PII_TYPES["private_person"][0], results[0].text)
+        self.assertIn(PII_TYPES["private_phone"][0], results[0].text)
+        self.assertEqual(results[0].count, 2)
+        self.assertEqual(results[1].count, 0)
+        self.assertEqual(results[1].text, "no pii here")
+    def test_notes_are_human_readable(self) -> None:
+        results = redact_texts(["Alice Smith"], detect=fake_detect)
+        self.assertIn("personal name: 1", results[0].notes)
+    def test_malformed_spans_are_skipped(self) -> None:
+        def detect(texts: list[str]) -> list[list[dict]]:
+            return [
+                [
+                    {"start": 0, "end": 999, "label": "secret"},  # out of range
+                    {"start": 2, "end": 2, "label": "secret"},  # zero width
+                ]
+            ]
+        results = redact_texts(["abc"], detect=detect)
+        self.assertEqual(results[0].text, "abc")
+        self.assertEqual(results[0].count, 0)
+    def test_unknown_labels_are_ignored(self) -> None:
+        def detect(texts: list[str]) -> list[list[dict]]:
+            return [[{"start": 0, "end": 3, "label": "not_a_pii_type"}]]
+        results = redact_texts(["abc"], detect=detect)
+        self.assertEqual(results[0].text, "abc")
+        self.assertEqual(results[0].count, 0)
+    def test_bioes_fragments_merge_into_one_placeholder(self) -> None:
+        # The real model fragments "Alice Smith" into touching same-label spans
+        # ("Alice" + " Smith"); they must collapse to a single placeholder.
+        def detect(texts: list[str]) -> list[list[dict]]:
+            return [
+                [
+                    {"start": 0, "end": 5, "label": "private_person"},  # Alice
+                    {"start": 5, "end": 11, "label": "private_person"},  # " Smith"
+                ]
+            ]
+        results = redact_texts(["Alice Smith calls"], detect=detect)
+        self.assertEqual(results[0].text.count("[REDACTED_NAME]"), 1)
+        self.assertEqual(results[0].count, 1)
+        self.assertEqual(results[0].text, "[REDACTED_NAME] calls")
+    def test_same_label_spans_with_one_char_gap_merge(self) -> None:
+        def detect(texts: list[str]) -> list[list[dict]]:
+            return [
+                [
+                    {"start": 0, "end": 5, "label": "private_person"},  # Alice
+                    {"start": 6, "end": 11, "label": "private_person"},  # Smith (gap = space)
+                ]
+            ]
+        results = redact_texts(["Alice Smith"], detect=detect)
+        self.assertEqual(results[0].count, 1)
+    def test_different_label_adjacent_spans_stay_separate(self) -> None:
+        def detect(texts: list[str]) -> list[list[dict]]:
+            return [
+                [
+                    {"start": 0, "end": 5, "label": "private_person"},
+                    {"start": 6, "end": 14, "label": "private_phone"},
+                ]
+            ]
+        results = redact_texts(["Alice 555-1234"], detect=detect)
+        self.assertEqual(results[0].count, 2)
+        self.assertIn(PII_TYPES["private_person"][0], results[0].text)
+        self.assertIn(PII_TYPES["private_phone"][0], results[0].text)
+
+
+class StreamRedactionIntegrationTests(unittest.TestCase):
+    SAMPLE = Path("examples/sample_trace_redacted.jsonl")
+
+    def test_stream_records_ai_privacy_note_when_model_runs(self) -> None:
+        def passthrough(texts: list[str]) -> list[RedactionResult]:
+            return [RedactionResult(text=text, notes=[], count=0) for text in texts]
+        result = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=passthrough))
+        self.assertTrue(any("AI privacy filter (openai/privacy-filter)" in note for note in result.privacy_notes))
+
+    def test_stream_falls_back_gracefully_when_model_unavailable(self) -> None:
+        def boom(texts: list[str]) -> list[RedactionResult]:
+            raise RuntimeError("no gpu here")
+        result = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=boom))
+        self.assertTrue(any("AI privacy filter was unavailable" in note for note in result.privacy_notes))
+        # Regex redaction still ran on the sample (it embeds an email + token).
+        self.assertGreater(result.redaction_count, 0)
+
+    def test_redact_progress_streams_per_chunk(self) -> None:
+        events = [
+            payload
+            for kind, payload in stream_deterministic_analysis(
+                self.SAMPLE, stream_redact_progress=True
+            )
+            if kind == "progress" and payload.get("stage") == "redact"
+        ]
+        # 4-message sample -> chunk size 1 -> one redact event per message.
+        self.assertGreaterEqual(len(events), 2)
+        processed = [event["processed"] for event in events]
+        self.assertEqual(processed, sorted(processed))  # monotonically advancing
+        self.assertEqual(events[-1]["processed"], events[-1]["total"])  # finishes at total
+        self.assertTrue(all(event["total"] == events[0]["total"] for event in events))
+
+    def test_model_redaction_count_adds_to_regex_count(self) -> None:
+        def mask_first_word(texts: list[str]) -> list[RedactionResult]:
+            out = []
+            for text in texts:
+                if text:
+                    out.append(RedactionResult(text="[REDACTED_NAME]" + text, notes=["personal name: 1"], count=1))
+                else:
+                    out.append(RedactionResult(text=text, notes=[], count=0))
+            return out
+        regex_only = _drain(stream_deterministic_analysis(self.SAMPLE))
+        combined = _drain(stream_deterministic_analysis(self.SAMPLE, model_redact=mask_first_word))
+        self.assertGreater(combined.redaction_count, regex_only.redaction_count)
+
+
+if __name__ == "__main__":
+    unittest.main()
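
Reviewer note: the span tests above pin down a validate, merge, then replace flow. The block below is only a minimal sketch of that behavior, not the code in `privacy_filter.py`. The names `merge_spans`, `redact_text`, and `max_gap`, the `[REDACTED_PHONE]` placeholder, the whitespace-only gap rule, and the two-field layout of `PII_TYPES` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical label table: label -> (placeholder, human-readable name).
# Only the "[REDACTED_NAME]" placeholder and the "personal name" note text
# appear in the tests; everything else here is assumed.
PII_TYPES = {
    "private_person": ("[REDACTED_NAME]", "personal name"),
    "private_phone": ("[REDACTED_PHONE]", "phone number"),
}


@dataclass
class RedactionResult:
    text: str
    notes: list[str] = field(default_factory=list)
    count: int = 0


def merge_spans(text: str, spans: list[dict], max_gap: int = 1) -> list[dict]:
    """Drop malformed or unknown spans, then merge near-adjacent same-label spans."""
    valid = [
        span
        for span in spans
        if span.get("label") in PII_TYPES
        and isinstance(span.get("start"), int)
        and isinstance(span.get("end"), int)
        and 0 <= span["start"] < span["end"] <= len(text)
    ]
    valid.sort(key=lambda span: (span["start"], span["end"]))

    merged: list[dict] = []
    for span in valid:
        if merged:
            prev = merged[-1]
            gap = text[prev["end"]:span["start"]]  # "" when touching or overlapping
            if span["label"] == prev["label"] and len(gap) <= max_gap and not gap.strip():
                prev["end"] = max(prev["end"], span["end"])
                continue
            if span["start"] < prev["end"]:
                continue  # overlaps a span with a different label: skip it
        merged.append(dict(span))
    return merged


def redact_text(text: str, spans: list[dict]) -> RedactionResult:
    """Replace each merged span with its placeholder, working right to left."""
    merged = merge_spans(text, spans)
    counts: dict[str, int] = {}
    for span in reversed(merged):  # right to left keeps earlier offsets valid
        placeholder, name = PII_TYPES[span["label"]]
        text = text[: span["start"]] + placeholder + text[span["end"]:]
        counts[name] = counts.get(name, 0) + 1
    notes = [f"{name}: {n}" for name, n in sorted(counts.items())]
    return RedactionResult(text=text, notes=notes, count=len(merged))
```

Replacing from the rightmost span first means earlier offsets never need to be recomputed after a placeholder changes the string length.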

tests/test_profiling.py ADDED

@@ -0,0 +1,37 @@

+from __future__ import annotations
+
+import unittest
+
+from profiling import Profiler, format_snapshot, resource_snapshot
+
+
+class ProfilingTests(unittest.TestCase):
+    def test_resource_snapshot_never_raises_and_returns_dict(self) -> None:
+        snap = resource_snapshot()
+        self.assertIsInstance(snap, dict)
+
+    def test_format_snapshot_is_string(self) -> None:
+        self.assertIsInstance(format_snapshot(resource_snapshot()), str)
+        self.assertEqual(format_snapshot({}), "n/a")
+
+    def test_profiler_records_stages_meta_and_summarizes(self) -> None:
+        prof = Profiler("test")
+        prof.record("extract", 0.012)
+        prof.record("redact", 0.034)
+        prof.mark(messages=4, engine="deterministic")
+        self.assertEqual([name for name, _ in prof.stages], ["extract", "redact"])
+        self.assertEqual(prof.meta["messages"], 4)
+        self.assertGreaterEqual(prof.elapsed(), 0.0)
+        prof.summary()  # must not raise
+
+    def test_stage_context_manager_records_duration(self) -> None:
+        prof = Profiler("test")
+        with prof.stage("chart"):
+            pass
+        self.assertEqual(prof.stages[-1][0], "chart")
+        self.assertGreaterEqual(prof.stages[-1][1], 0.0)
+
+
+if __name__ == "__main__":
+    unittest.main()
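
Reviewer note: these tests imply a small profiler surface (`record`, `mark`, `stage`, `elapsed`, `summary`, plus the `stages` and `meta` attributes). The block below is a minimal sketch of a class that would satisfy them. It is an assumption-laden illustration, not the code in `profiling.py`: the summary format and the choice of `time.perf_counter` are guesses, and the real `summary()` may log instead of returning a string.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Any, Iterator


class Profiler:
    """Hypothetical stage timer matching the surface the tests exercise."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: list[tuple[str, float]] = []  # (stage name, seconds)
        self.meta: dict[str, Any] = {}
        self._start = time.perf_counter()

    def record(self, stage: str, seconds: float) -> None:
        """Store an externally measured duration for a stage."""
        self.stages.append((stage, float(seconds)))

    def mark(self, **meta: Any) -> None:
        """Attach arbitrary metadata (message count, engine name, ...)."""
        self.meta.update(meta)

    def elapsed(self) -> float:
        """Seconds since the profiler was created."""
        return time.perf_counter() - self._start

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start)

    def summary(self) -> str:
        """One-line summary; the format here is an assumption."""
        stages = ", ".join(f"{n}={s * 1000:.1f}ms" for n, s in self.stages)
        meta = ", ".join(f"{k}={v}" for k, v in self.meta.items())
        return f"[{self.name}] total={self.elapsed() * 1000:.1f}ms | {stages} | {meta}"
```

Recording inside `finally` means a stage that raises still shows up in `stages`, which helps when a slow stage is also the one that fails.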

view_model.py CHANGED

@@ -71,7 +71,7 @@ def build_view_model(
         "narrative_message_count": base["narrative_message_count"],
         "redaction_count": base["redaction_count"],
         "duration_total": _duration_total(raw_episodes),
-        "verdict": _verdict(episodes, base["overall_patterns"], result.model_memo),
+        "verdict": base.get("session_verdict") or _verdict(episodes, base["overall_patterns"], result.model_memo),
         "overall_patterns": base["overall_patterns"],
         "privacy_notes": list(base["privacy_notes"]) + list(base.get("model_notes") or []),
         "episodes": episodes,