--- title: Trace Field Notes emoji: 🧭 colorFrom: green colorTo: gray sdk: gradio sdk_version: 6.16.0 app_file: app.py pinned: false license: mit short_description: Qualitative field reports for coding-agent session traces. tags: - build-small - backyard-ai - best-demo - off-brand - best-use-of-codex - best-minicpm-build - nemotron-hardware-prize - gradio-server - zerogpu - coding-agents models: - openbmb/MiniCPM5-1B - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 - openai/privacy-filter --- # Trace Field Notes Trace Field Notes turns long coding-agent session logs into qualitative field reports: where the agent got stuck, how it detoured, what it tried, how it recovered, and whether its final claim matched its own evidence. Most agent traces are too long to read after the fact. Tool telemetry is noisy, private, and often the wrong level of detail. This app focuses on a narrower question: what did the agent *say* about its own work while it was solving a task? The answer becomes a field notebook, not a benchmark. ## Links - Live Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes - App runtime: https://build-small-hackathon-trace-field-notes.hf.space/ - GitHub: https://github.com/JacobLinCool/trace-field-notes - Demo video: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes/resolve/main/assets/trace-field-notes-demo.mp4 - Article draft: [`docs/article.md`](docs/article.md) - Social post draft: [`docs/social-post.md`](docs/social-post.md) - Public social post: **pending manual publish**. After publishing, replace this line with the post URL before final submission. ## Who it is for Trace Field Notes is for developers, researchers, and hackathon builders who use Codex, Claude Code, Pi Agent, or similar coding agents and want to understand the session narrative after the code is written: - Was the agent blocked, or just exploring? - Did it change strategy for a good reason? - Did a detour produce a better route? - Did the closeout claim overstate what was verified? - What can the next run learn from this one? The app does **not** claim to inspect hidden reasoning or prove that the final code is correct. It reports the visible narrative the agent wrote. ## How to use it 1. Find a local coding-agent session log. 2. Review and redact anything sensitive before upload. 3. Upload `.jsonl`, `.json`, `.txt`, or `.log`. 4. Choose the analysis engine: - **Quick analysis**: `openbmb/MiniCPM5-1B` - **Deeper analysis**: `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` - **Rule-based**: deterministic codebook, no model 5. Choose **GPU** for the Hugging Face ZeroGPU path or **CPU** for a no-quota run. 6. Read the report: verdict, trail map, episode detail, terrain groups, detour analysis, closeout audit, and redacted narrative export. Common local trace locations: ```bash # Codex ls ~/.codex/sessions # Claude Code ls ~/.claude/projects # Pi Agent ls ~/.pi/agent/sessions ``` ## Technology The frontend is a custom React field-notebook UI served through `gradio.Server`. It deliberately avoids the default Gradio component look so the report feels like a qualitative trail map rather than a form. The backend pipeline is: 1. `parser.py` loads Codex, Claude Code, Pi Agent, JSONL, JSON, text, and log files into visible narrative messages. 2. `redaction.py` applies deterministic secret and PII patterns. 3. `privacy_filter.py` optionally adds `openai/privacy-filter` on the Space GPU. 4. `analyzer.py` identifies difficulty episodes and classifies them with a deterministic codebook. 5. `model_runtime.py` optionally asks MiniCPM5 1B or Nemotron 3 Nano 30B-A3B to rewrite the analysis into a richer structured field report. 6. `view_model.py` adapts the result into the JSON shape rendered by the UI. 7. `profiling.py` logs per-stage timing and resource snapshots to server logs. The app streams real progress events so long runs do not look frozen: upload, extract, redact, chart, classify, synthesize, and model analysis. ## Build Small fit Trace Field Notes targets the **Backyard AI** track: it solves a specific, practical problem for people already using coding agents. It also targets these Build Small prizes / badges: - **Best Use of Codex**: Codex helped develop, debug, package, document, and produce the demo video. The connected GitHub history includes Codex-attributed commits. - **Best MiniCPM Build**: Quick analysis uses `openbmb/MiniCPM5-1B`. - **Nemotron Hardware Prize**: Deeper analysis uses `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`. - **Off Brand**: the app uses `gradio.Server` with a custom React trail-map UI, not stock Gradio blocks. - **Best Demo**: the repo includes a polished demo video and ready-to-post article/social drafts. It does **not** target Tiny Titan because the optional Nemotron path is 30B, and it does **not** target Best Use of Modal because the runtime is Hugging Face ZeroGPU / CPU, not Modal. ## Privacy posture Agent traces can include prompts, tool inputs, command output, local paths, screenshots, secrets, private source code, and personal data. Review and redact before uploading or sharing. By default, Trace Field Notes: - ignores raw tool-call contents; - analyzes only visible assistant narrative messages plus optional user context; - runs deterministic secret redaction; - can run `openai/privacy-filter` for a second PII pass; - exports only redacted narrative text. ## Local development ```bash python3.11 -m venv .venv source .venv/bin/activate pip install -r requirements.txt python app.py ``` Run tests: ```bash python3.11 -m unittest discover -s tests ``` Optional environment settings are listed in [`.env.example`](.env.example). ## Codex contribution Codex assisted with repository inspection, implementation debugging, test verification, privacy/README hardening, Hugging Face deployment preparation, demo-video scripting, voiceover generation, video composition, frame/ASR verification, and hackathon submission packaging.