# Trace Field Notes: a field notebook for coding-agent sessions

Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes  
Demo video: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes/resolve/main/assets/trace-field-notes-demo.mp4  
GitHub: https://github.com/JacobLinCool/trace-field-notes

## The problem

Coding-agent sessions are getting longer. A serious Codex or Claude Code run can
include planning, shell commands, failed tests, patches, retries, summaries,
caveats, and a confident final message. After the run, the code diff tells you
what changed, but it does not explain the route the agent took.

That route matters. Did the agent understand the task? Did it get blocked? Did it
notice when its first hypothesis was wrong? Did it take a productive detour, or
just wander? Did its final success claim match what it had actually verified?

Trace Field Notes is built around that narrow but real problem: make coding-agent
sessions readable after the fact.

## The idea

Instead of treating a trace as raw telemetry, Trace Field Notes treats it like
qualitative field data. It reads the visible narrative messages the agent wrote:
what it planned, where it got stuck, how it rerouted, what it tried, and how it
closed.

The result is not a leaderboard or a correctness oracle. It is a field report:

- a session verdict;
- a trail map of difficulty episodes;
- per-episode intention, difficulty, reroute, evidence, and analyst memo;
- terrain groups showing recurring difficulty types;
- a detour read separating exploration from wandering;
- a closeout audit comparing the final claim to the agent's own evidence;
- a redacted narrative export.

## The experience

The first screen is the actual tool, not a landing page. You upload a Codex,
Claude Code, or Pi Agent log, choose whether to include user context, keep
redaction on, and select an engine:

- Quick analysis with `openbmb/MiniCPM5-1B`
- Deeper analysis with `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`
- Rule-based analysis with no model

The app streams progress through the real pipeline stages, then opens the field
report. The custom React UI is intentionally notebook-like: quiet, dense,
scan-friendly, and centered on the trail map rather than a chat transcript.

## How it works

Trace Field Notes is a Gradio Space, but the UI is not built from stock Gradio
blocks. `app.py` uses `gradio.Server` to serve a custom React frontend and expose
an `analyze_trace` endpoint compatible with `@gradio/client`.

The backend pipeline is small and explicit:

1. `parser.py` normalizes Codex, Claude Code, and Pi Agent sessions, along with
   JSONL, JSON, log, and text files, into visible narrative messages.
2. `redaction.py` masks likely secrets and private data with deterministic
   patterns.
3. `privacy_filter.py` can add a second model pass with `openai/privacy-filter`.
4. `analyzer.py` charts difficulty episodes and classifies them against a
   codebook.
5. `model_runtime.py` can ask MiniCPM5 1B or Nemotron 3 Nano 30B-A3B to write a
   richer structured analysis.
6. `view_model.py` packages the verdict, trail map, sections, and export text for
   the frontend.
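
To make the "deterministic patterns" step concrete, here is a minimal sketch of pattern-based redaction. It is not the code in `redaction.py`: the `PATTERNS` table, the placeholder labels, and the `redact` function are assumptions chosen for illustration, and the project's real rules may be broader or stricter.

```python
import re

# Hypothetical patterns for illustration only; the project's actual rules
# in redaction.py are not shown in this writeup and may differ.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("API_KEY", re.compile(r"\b(?:sk|hf)[-_][A-Za-z0-9]{16,}\b")),
    ("HOME_PATH", re.compile(r"/(?:home|Users)/[^/\s]+")),
]


def redact(text: str) -> str:
    """Replace each match with a fixed placeholder such as [EMAIL].

    The patterns are applied in a fixed order and involve no randomness,
    so the same input always produces the same output.
    """
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Pushed from /home/alice/project using hf_abcdefghijklmnop1234"))
```

Fixed placeholders keep the redacted narrative readable: a reader can still see that a token or path was mentioned without seeing its value.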

The small-model paths run under Hugging Face ZeroGPU when GPU mode is selected.
CPU mode remains available for runs without GPU quota, and the deterministic
analyzer is tested independently.

## Why it fits Build Small

This is a Backyard AI project: it solves a specific problem for a specific group
of people, using suitably small models and a focused interface. It is also a good
fit for several Build Small quests:

- Best Use of Codex: Codex helped build, debug, document, package, and demo the
  project, with Codex-attributed commits in the connected GitHub repo.
- Best MiniCPM Build: the quick analysis path uses MiniCPM5 1B.
- Nemotron Hardware Prize: the deeper analysis path uses Nemotron 3 Nano
  30B-A3B.
- Off Brand: the app uses a custom React trail-map interface through
  `gradio.Server`.
- Best Demo: the submission includes a polished narrated demo and social post
  draft.

## Challenges

The hardest part was defining the right unit of analysis. A tool call is too
low-level. A full trace is too broad. The useful unit became a "difficulty
episode": the span where the agent intended to do something, encountered a
problem, appraised it, rerouted, attempted a resolution, and made an outcome
claim.
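
One way to picture that unit is as a small record with one field per stage. The sketch below is an assumption for illustration, not the schema used in `analyzer.py`: the class name `DifficultyEpisode`, its field names, and the `missing_stages` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DifficultyEpisode:
    """Hypothetical record for one difficulty episode (illustrative only)."""

    intention: str                # what the agent set out to do
    difficulty: str               # the problem it encountered
    appraisal: str = ""           # how it interpreted the problem
    reroute: str = ""             # the change of plan, if any
    resolution_attempt: str = ""  # what it tried next
    outcome_claim: str = ""       # what it said about the result
    evidence: list[str] = field(default_factory=list)  # quoted narrative lines

    def missing_stages(self) -> list[str]:
        """Return the names of stages the visible narrative never filled in."""
        stages = [
            "intention",
            "difficulty",
            "appraisal",
            "reroute",
            "resolution_attempt",
            "outcome_claim",
        ]
        return [name for name in stages if not getattr(self, name).strip()]


if __name__ == "__main__":
    episode = DifficultyEpisode(
        intention="Run the test suite",
        difficulty="Import error in the test module",
    )
    print(episode.missing_stages())
```

Keeping each stage as a separate field makes gaps visible: an episode with a difficulty but no outcome claim reads differently from one the agent explicitly closed.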

Another challenge was privacy. Agent traces can contain secrets, paths, user
prompts, screenshots, and private code. The app therefore ignores raw tool
contents by default, redacts before analysis, and frames its output as a report
on visible narrative rather than hidden reasoning.
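
The "visible narrative only" default can be sketched as a filter over normalized records. The record shape here (a dict with `kind` and `text` keys) and the `visible_narrative` function are assumptions for illustration; the real log formats and the output of `parser.py` are not described in this writeup.

```python
def visible_narrative(records: list[dict], include_user: bool = False) -> list[str]:
    """Keep only the agent's narrative text, dropping tool contents.

    Assumes each record is a dict with a hypothetical "kind" key
    ("assistant", "user", "tool_output", ...) and a "text" key.
    User messages are kept only when include_user is True.
    """
    kept = []
    for record in records:
        kind = record.get("kind")
        if kind == "assistant" or (include_user and kind == "user"):
            kept.append(record.get("text", ""))
    return kept


if __name__ == "__main__":
    sample = [
        {"kind": "user", "text": "Fix the failing test."},
        {"kind": "assistant", "text": "I will run the tests first."},
        {"kind": "tool_output", "text": "SECRET=abc123"},
        {"kind": "assistant", "text": "The tests pass now."},
    ]
    for line in visible_narrative(sample):
        print(line)
```

Filtering by record kind before any analysis means tool output never reaches the analyzer unless a caller opts in, which matches the default described above.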

## Codex's role

Codex was used throughout the project: inspecting the repository, implementing
backend and frontend changes, debugging model/runtime behavior, writing tests,
checking privacy handling, preparing hackathon documentation, generating the demo
storyboard, recording app footage, composing the video, and validating the final
output with frames and ASR.

That is part of the story: Trace Field Notes is an app about understanding coding
agents, built with help from a coding agent, and submitted with an audit trail in
GitHub.