# Trace Field Notes: a field notebook for coding-agent sessions

Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes

Demo video: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes/resolve/main/assets/trace-field-notes-demo.mp4

GitHub: https://github.com/JacobLinCool/trace-field-notes

## The problem

Coding-agent sessions are getting longer. A serious Codex or Claude Code run can include planning, shell commands, failed tests, patches, retries, summaries, caveats, and a confident final message. After the run, the code diff tells you what changed, but it does not explain the route the agent took.

That route matters. Did the agent understand the task? Did it get blocked? Did it notice when its first hypothesis was wrong? Did it take a productive detour, or just wander? Did its final success claim match what it had actually verified?

Trace Field Notes is built around that narrow but real problem: make coding-agent sessions readable after the fact.

## The idea

Instead of treating a trace as raw telemetry, Trace Field Notes treats it like qualitative field data. It reads the visible narrative messages the agent wrote: what it planned, where it got stuck, how it rerouted, what it tried, and how it closed.

The result is not a leaderboard or correctness oracle. It is a field report:

- a session verdict;
- a trail map of difficulty episodes;
- per-episode intention, difficulty, reroute, evidence, and analyst memo;
- terrain groups showing recurring difficulty types;
- a detour read separating exploration from wandering;
- a closeout audit comparing the final claim to the agent's own evidence;
- a redacted narrative export.

## The experience

The first screen is the actual tool, not a landing page.
You upload a Codex, Claude Code, or Pi Agent log, choose whether to include user context, keep redaction on, and select an engine:

- Quick analysis with `openbmb/MiniCPM5-1B`
- Deeper analysis with `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`
- Rule-based analysis with no model

The app streams progress through the real pipeline stages, then opens the field report. The custom React UI is intentionally notebook-like: quiet, dense, scan-friendly, and centered on the trail map rather than a chat transcript.

## How it works

Trace Field Notes is a Gradio Space, but the UI is not built from stock Gradio blocks. `app.py` uses `gradio.Server` to serve a custom React frontend and expose an `analyze_trace` endpoint compatible with `@gradio/client`.

The backend pipeline is small and explicit:

1. `parser.py` normalizes Codex, Claude Code, Pi Agent, JSONL, JSON, log, and text files into visible narrative messages.
2. `redaction.py` masks likely secrets and private data with deterministic patterns.
3. `privacy_filter.py` can add a second model pass with `openai/privacy-filter`.
4. `analyzer.py` charts difficulty episodes and classifies them against a codebook.
5. `model_runtime.py` can ask MiniCPM5 1B or Nemotron 3 Nano 30B-A3B to write a richer structured analysis.
6. `view_model.py` packages the verdict, trail map, sections, and export text for the frontend.

The small-model paths run under Hugging Face ZeroGPU when GPU mode is selected. CPU mode remains available for no-quota runs, and the deterministic analyzer is tested independently.

## Why it fits Build Small

This is a Backyard AI project: it solves a specific problem for a specific group of people, using small enough models and a focused interface.

It is also a good fit for several Build Small quests:

- Best Use of Codex: Codex helped build, debug, document, package, and demo the project, with Codex-attributed commits in the connected GitHub repo.
- Best MiniCPM Build: the quick analysis path uses MiniCPM5 1B.
- Nemotron Hardware Prize: the deeper analysis path uses Nemotron 3 Nano 30B-A3B.
- Off Brand: the app uses a custom React trail-map interface through `gradio.Server`.
- Best Demo: the submission includes a polished narrated demo and social post draft.

## Challenges

The hardest part was defining the right unit of analysis. A tool call is too low-level. A full trace is too broad. The useful unit became a "difficulty episode": the span where the agent intended to do something, encountered a problem, appraised it, rerouted, attempted a resolution, and made an outcome claim.

Another challenge was privacy. Agent traces can contain secrets, paths, user prompts, screenshots, and private code. The app therefore ignores raw tool contents by default, redacts before analysis, and frames its output as a report on visible narrative rather than hidden reasoning.

## Codex's role

Codex was used throughout the project: inspecting the repository, implementing backend and frontend changes, debugging model/runtime behavior, writing tests, checking privacy handling, preparing hackathon documentation, generating the demo storyboard, recording app footage, composing the video, and validating the final output with frames and ASR.

That is part of the story: Trace Field Notes is an app about understanding coding agents, built with help from a coding agent, and submitted with an audit trail in GitHub.
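
To make the "deterministic patterns" redaction step from the pipeline more concrete, here is a minimal sketch of what pattern-based masking can look like. It is not the project's `redaction.py`: the `PATTERNS` table, the `redact` function, the placeholder labels, and the three example rules are all hypothetical, chosen only to show the technique.

```python
import re

# Hypothetical rules for illustration only; the real redaction.py may
# use different patterns, labels, and ordering.
PATTERNS = [
    # Email addresses such as bob@example.com
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # Strings shaped like an access token with an "hf_" prefix (assumed shape)
    ("HF_TOKEN", re.compile(r"\bhf_[A-Za-z0-9]{20,}\b")),
    # The user-specific part of a home directory path
    ("HOME_PATH", re.compile(r"/(?:home|Users)/[^/\s]+")),
]


def redact(text: str) -> str:
    """Replace every match of every rule with a fixed placeholder like [EMAIL]."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "mail bob@example.com from /home/bob/repo"
    print(redact(sample))
```

Because each rule is a fixed regular expression with a fixed placeholder, the same input always yields the same output, which is what makes this kind of pass easy to test on its own before any model sees the text.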