feat: deploy evidence investigation agent and fine-tuned router
Built with OpenAI Codex: deploy the bounded agent, visible investigation trace, public fine-tuned router integration, and Field Notes.
- .gitignore +1 -1
- FIELD_NOTES.md +83 -0
- README.md +15 -6
- app.py +5 -0
- data/router_training.jsonl +54 -0
- frontend/app.js +12 -0
- frontend/index.html +11 -0
- frontend/styles.css +2 -1
- requirements.txt +2 -0
- scripts/export_traces.py +42 -0
- scripts/train_router.py +135 -0
- src/packetcourt/audit.py +7 -0
- src/packetcourt/evidence_router.py +39 -0
- src/packetcourt/investigator.py +74 -0
- src/packetcourt/models.py +16 -0
- tests/test_audit.py +7 -0
- traces/README.md +28 -0
- traces/packetcourt_traces.jsonl +10 -0
.gitignore
CHANGED
@@ -3,4 +3,4 @@ __pycache__/
 .venv/
 *.pyc
 .env
-
+router_model/
FIELD_NOTES.md
ADDED
@@ -0,0 +1,83 @@
+# Field Notes: Building PacketCourt
+
+## The packet takes the stand
+
+PacketCourt began with a narrow household problem: a food packet's front is
+designed to persuade, while the evidence needed to interpret that persuasion
+is scattered across the back. A shopper should not need to understand serving
+bases, ingredient ordering, date arithmetic, or regulatory language while
+standing in a grocery aisle.
+
+The first idea was a nutrition scanner. That was too broad and too easy to turn
+into an unexplained health score. PacketCourt instead asks one auditable
+question:
+
+> Does the evidence printed on this packet support the impression created by
+> its front?
+
+## Small models as witnesses, not judges
+
+The system deliberately separates three responsibilities:
+
+1. OpenBMB MiniCPM-V-4.6 transcribes visible front and back label evidence.
+2. A fine-tuned 4.4M-parameter PacketCourt router selects the evidence tools
+required by each detected claim.
+3. Deterministic code performs calculations and produces final verdicts.
+
+The models can read and route an investigation. They cannot silently invent a
+nutrition value or override the evidence standard.
+
+## What the investigation agent does
+
+Each packet creates a claim-dependent investigation plan. A `NO ADDED SUGAR`
+claim sends the investigation toward ingredients. `HIGH PROTEIN` requires a
+nutrition panel and its measurement basis. `FSSAI APPROVED` requires licensing
+evidence and a warning that registration is not a health endorsement.
+
+The agent stops in one of two explicit states:
+
+- all evidence tools required by the detected claims completed; or
+- required evidence is missing, so the audit returns a concrete request rather
+than guessing.
+
+Every plan, tool decision, evidence extraction, calculation, verdict, and
+limitation is exported as a trace.
+
+## A failed first fine-tune
+
+The first evidence-router training run reached only `0.40` held-out accuracy.
+The dataset was too small and its random split did not preserve every routing
+class. That model was published privately but was not enabled in the product.
+
+The corrected run balanced claim variants across five routing classes and used
+a stratified held-out split. PacketCourt only enables the router after its
+measured result is recorded in the model card and its suggestions remain
+bounded by deterministic policy fallbacks.
+
+## Persuasion Gap
+
+Claim verification alone was not enough. A `HIGH PROTEIN` claim can be
+technically supportable while a full packet also contains substantial sugar or
+sodium. PacketCourt therefore calculates a **Persuasion Gap**: material
+back-label context that competes with the impression emphasized on the front.
+
+This is not a health score. The output cites the exact calculation and leaves
+the decision with the user.
+
+## Current evidence
+
+- `9` unit tests pass.
+- `35/35` golden-case checks pass across `10` packet cases.
+- `10` transparent investigation traces are exported.
+- The vision model has `1.30B` parameters.
+- The fine-tuned evidence router has approximately `4.4M` parameters.
+- The complete product interface is responsive and built on Gradio.
+
+## What PacketCourt refuses to claim
+
+PacketCourt does not declare food healthy, safe, illegal, or fraudulent. It
+does not treat OCR as ground truth. It does not use an LLM to perform arithmetic
+that deterministic code can perform exactly. When supplied evidence is
+insufficient, the correct result is `CANNOT VERIFY`.
+
+That refusal is not a missing feature. It is the product's standard of proof.
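
The two stop states described in the Field Notes lend themselves to a small illustration. The following is a minimal Python sketch, not the commit's `src/packetcourt/investigator.py`: the `REQUIRED_TOOLS` table, the `Plan` dataclass, the `sketch_plan` function, and the stop-reason strings are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical claim-to-tool table, loosely following the Field Notes examples.
REQUIRED_TOOLS = {
    "NO ADDED SUGAR": ["ingredients"],
    "HIGH PROTEIN": ["nutrition"],
    "FSSAI APPROVED": ["license"],
}


@dataclass
class Plan:
    completed: list[str] = field(default_factory=list)
    missing_evidence: list[str] = field(default_factory=list)
    stop_reason: str = ""


def sketch_plan(claims: list[str], available: set[str]) -> Plan:
    """Mark each required tool as completed or missing, then pick a stop state."""
    plan = Plan()
    for claim in claims:
        for tool in REQUIRED_TOOLS.get(claim, []):
            if tool in available:
                if tool not in plan.completed:
                    plan.completed.append(tool)
            elif tool not in plan.missing_evidence:
                plan.missing_evidence.append(tool)
    # Exactly two outcomes: everything required ran, or something is missing.
    plan.stop_reason = (
        "required evidence missing"
        if plan.missing_evidence
        else "all required evidence tools completed"
    )
    return plan


full = sketch_plan(["HIGH PROTEIN", "NO ADDED SUGAR"], {"nutrition", "ingredients"})
partial = sketch_plan(["FSSAI APPROVED"], {"nutrition"})
print(full.stop_reason)
print(partial.stop_reason, partial.missing_evidence)
```

In this sketch the missing-evidence list doubles as the "concrete request" the notes mention: the caller can show it to the user instead of guessing a verdict.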
README.md
CHANGED
@@ -49,7 +49,10 @@ flowchart LR
 Z --> V
 V -->|"Label transcription"| M
 
-M -->
+M --> A["Investigation agent"]
+A --> FR["Fine-tuned evidence router<br/>4.4M parameters"]
+FR --> A
+A --> P["Deterministic evidence parser<br/>CPU"]
 P --> C["Claim-to-evidence audit"]
 P --> N["Whole-packet nutrition math"]
 P --> D["Expiry and date arithmetic"]
@@ -68,9 +71,11 @@ flowchart LR
 ```
 
 Photo transcription uses the 1.30B-parameter OpenBMB `MiniCPM-V-4.6` through
-a private ZeroGPU companion.
-
-
+a private ZeroGPU companion. A fine-tuned 4.4M-parameter evidence router
+selects the investigation tools required by each claim. The main CPU Space
+performs deterministic evidence auditing, whole-packet calculations,
+persuasion-gap analysis, and refusals. ZeroGPU is requested only while reading
+photos.
 
 ## What It Audits
 
@@ -133,16 +138,20 @@ python scripts/export_traces.py
 
 Current deterministic evaluation result:
 
-- `
+- `9` unit tests passing
 - `35/35` golden-case checks passing across `10` cases
 - `10` transparent traces exported
+- `1.000` held-out accuracy on the stratified evidence-router evaluation
 
 ## Live Assets
 
 - Main private product: https://huggingface.co/spaces/build-small-hackathon/packetcourt
 - Private OpenBMB ZeroGPU vision companion: https://huggingface.co/spaces/build-small-hackathon/packetcourt-vision
 - Private golden evaluation dataset: https://huggingface.co/datasets/build-small-hackathon/packetcourt-golden-cases
--
+- Public transparent agent traces: https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces
+- Fine-tuned evidence router: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router
+- Public router training set: https://huggingface.co/datasets/build-small-hackathon/packetcourt-router-training
+- [Field Notes](FIELD_NOTES.md)
 
 ## Safety Boundary
 
app.py
CHANGED
@@ -63,6 +63,11 @@ def samples() -> dict:
 @app.get("/api/model")
 def model() -> dict:
     status = model_status()
+    status["router"] = (
+        os.getenv("PACKETCOURT_ROUTER_MODEL", "build-small-hackathon/packetcourt-evidence-router")
+        if os.getenv("PACKETCOURT_ROUTER", "0") == "1"
+        else "deterministic fallback"
+    )
     if is_configured():
         status.update(
             enabled=True,
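
The added lines in `app.py` report a router model only when `PACKETCOURT_ROUTER` is exactly the string `"1"`; any other value, or an unset variable, yields `"deterministic fallback"`. The sketch below restates that rule as a standalone function so the cases are easy to check. The helper name `router_status` and the choice to pass a mapping instead of reading `os.environ` directly are assumptions made for this example.

```python
from __future__ import annotations

from collections.abc import Mapping

# Default repository id, copied from the hunk above.
DEFAULT_ROUTER = "build-small-hackathon/packetcourt-evidence-router"


def router_status(env: Mapping[str, str]) -> str:
    """Return the router label using the same rule as the added app.py lines."""
    if env.get("PACKETCOURT_ROUTER", "0") == "1":
        return env.get("PACKETCOURT_ROUTER_MODEL", DEFAULT_ROUTER)
    return "deterministic fallback"


print(router_status({}))
print(router_status({"PACKETCOURT_ROUTER": "1"}))
print(router_status({"PACKETCOURT_ROUTER": "true"}))
```

Note that a value such as `"true"` does not enable the router under this rule, which keeps the deterministic path as the default unless the flag is set deliberately.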
data/router_training.jsonl
ADDED
@@ -0,0 +1,54 @@
+{"text":"HIGH PROTEIN","label":"nutrition"}
+{"text":"Protein rich snack","label":"nutrition"}
+{"text":"Power packed with protein","label":"nutrition"}
+{"text":"Source of protein","label":"nutrition"}
+{"text":"BAKED NOT FRIED","label":"nutrition"}
+{"text":"Oven baked, never fried","label":"nutrition"}
+{"text":"ZERO TRANS FAT","label":"nutrition"}
+{"text":"0g trans fat","label":"nutrition"}
+{"text":"NO ADDED SUGAR","label":"ingredients"}
+{"text":"Without added sugar","label":"ingredients"}
+{"text":"No sugar added","label":"ingredients"}
+{"text":"MULTIGRAIN","label":"ingredients"}
+{"text":"Made with multiple grains","label":"ingredients"}
+{"text":"Seven grain goodness","label":"ingredients"}
+{"text":"WHOLE GRAIN","label":"ingredients"}
+{"text":"Made with whole grains","label":"ingredients"}
+{"text":"NO PRESERVATIVES","label":"ingredients"}
+{"text":"Preservative free","label":"ingredients"}
+{"text":"Contains no preservatives","label":"ingredients"}
+{"text":"FSSAI APPROVED","label":"license"}
+{"text":"Approved by FSSAI","label":"license"}
+{"text":"FSSAI certified","label":"license"}
+{"text":"BEST BEFORE 6 MONTHS","label":"dates"}
+{"text":"Use by 08 JUL 2026","label":"dates"}
+{"text":"Consume within 3 days after opening","label":"dates"}
+{"text":"100% NATURAL","label":"refuse_absolute"}
+{"text":"Completely natural","label":"refuse_absolute"}
+{"text":"All natural ingredients","label":"refuse_absolute"}
+{"text":"Absolutely healthy","label":"refuse_absolute"}
+{"text":"Guaranteed safe food","label":"refuse_absolute"}
+{"text":"Loaded with protein","label":"nutrition"}
+{"text":"Protein packed breakfast","label":"nutrition"}
+{"text":"High protein formula","label":"nutrition"}
+{"text":"Not fried, only baked","label":"nutrition"}
+{"text":"Trans fat free","label":"nutrition"}
+{"text":"No added sweetener","label":"ingredients"}
+{"text":"Contains five grains","label":"ingredients"}
+{"text":"Made from whole wheat","label":"ingredients"}
+{"text":"No artificial preservatives","label":"ingredients"}
+{"text":"FSSAI licensed product","label":"license"}
+{"text":"FSSAI registration number","label":"license"}
+{"text":"Food safety license","label":"license"}
+{"text":"License number printed below","label":"license"}
+{"text":"Regulatory registration details","label":"license"}
+{"text":"Expiry date","label":"dates"}
+{"text":"Packed on 13 JUN 2026","label":"dates"}
+{"text":"Manufactured on 01 MAY 2026","label":"dates"}
+{"text":"Use within seven days of opening","label":"dates"}
+{"text":"Best before date","label":"dates"}
+{"text":"Purely natural","label":"refuse_absolute"}
+{"text":"One hundred percent natural","label":"refuse_absolute"}
+{"text":"The healthiest snack","label":"refuse_absolute"}
+{"text":"Completely safe for everyone","label":"refuse_absolute"}
+{"text":"Chemical free","label":"refuse_absolute"}
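
The training file above holds 54 rows across five labels (13 `nutrition`, 15 `ingredients`, 8 `license`, 8 `dates`, 10 `refuse_absolute`). The Field Notes attribute the failed first run partly to a random split that did not preserve every routing class. A stratified split avoids that by sampling inside each label. The sketch below shows one possible way to do it, assuming a fixed hold-out count per label; `stratified_split` and its parameters are hypothetical and may not match what `scripts/train_router.py` actually does.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_split(records, holdout_per_label=2, seed=0):
    """Hold out the same number of examples from every label."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, heldout = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        heldout.extend(group[:holdout_per_label])
        train.extend(group[holdout_per_label:])
    return train, heldout


# A few rows copied from the training file, two labels only, to keep the demo short.
sample = [
    {"text": "HIGH PROTEIN", "label": "nutrition"},
    {"text": "0g trans fat", "label": "nutrition"},
    {"text": "Trans fat free", "label": "nutrition"},
    {"text": "NO ADDED SUGAR", "label": "ingredients"},
    {"text": "MULTIGRAIN", "label": "ingredients"},
    {"text": "Preservative free", "label": "ingredients"},
]
train, heldout = stratified_split(sample, holdout_per_label=1)
print(sorted(r["label"] for r in heldout))
```

Because every label contributes to the held-out set, an accuracy figure computed on it cannot silently skip a routing class, which is the failure the notes describe.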
frontend/app.js
CHANGED
@@ -47,6 +47,18 @@ function escapeHtml(value = "") {
 
 function render(data) {
   $("#claim-count").textContent = data.claims.length;
+  $("#router-model").textContent = data.investigation.router_model;
+  $("#agent-steps").innerHTML = data.investigation.steps.map((step, index) => `
+    <article>
+      <span>${String(index + 1).padStart(2, "0")}</span>
+      <div><b>${escapeHtml(step.tool.replaceAll("_", " "))}</b><p>${escapeHtml(step.reason)}</p></div>
+      <small>${escapeHtml(step.source)} · ${escapeHtml(step.status)}</small>
+    </article>
+  `).join("");
+  $("#stop-reason").textContent = data.investigation.stop_reason;
+  $("#missing-evidence").textContent = data.investigation.missing_evidence.length
+    ? data.investigation.missing_evidence.join(" · ")
+    : "None. The required evidence path completed.";
   $("#claim-grid").innerHTML = data.claims.length
     ? data.claims.map((claim) => `
       <article class="claim-card ${verdictClass[claim.verdict]}">
frontend/index.html
CHANGED
@@ -94,6 +94,17 @@
   <div><p class="kicker">CASE FINDINGS</p><h2>What the front says.<br>What the back proves.</h2></div>
   <div class="case-score"><span id="claim-count">0</span><small>CLAIMS<br>EXAMINED</small></div>
 </div>
+<section class="agent-section">
+  <div class="agent-heading">
+    <div><p class="kicker">INVESTIGATION AGENT</p><h3>How this packet was examined.</h3></div>
+    <span id="router-model"></span>
+  </div>
+  <div class="agent-steps" id="agent-steps"></div>
+  <div class="agent-stop">
+    <p><b>STOP REASON</b><span id="stop-reason"></span></p>
+    <p><b>MISSING EVIDENCE</b><span id="missing-evidence"></span></p>
+  </div>
+</section>
 <section class="gap-section">
   <div class="gap-heading"><p class="kicker">PERSUASION GAP</p><h3>Material context the front leaves quiet.</h3></div>
   <div class="gap-grid" id="gap-grid"></div>
frontend/styles.css
CHANGED
@@ -14,10 +14,11 @@ main{max-width:1320px;margin:auto;padding:0 4vw}.hero{min-height:670px;display:g
 .status-line{text-align:center;font:400 11px "DM Mono";color:var(--muted)}.text-grid label>span{display:block;font:500 11px "DM Mono";letter-spacing:.1em;margin-bottom:8px}.text-grid textarea{width:100%;min-height:260px;padding:18px;border:1px solid var(--line);border-radius:14px;background:var(--cream);resize:vertical;line-height:1.55}.text-grid textarea:focus{outline:2px solid var(--red);outline-offset:2px}
 .sample-card{padding:24px;background:var(--cream);border:1px solid var(--line);border-radius:16px;cursor:pointer;text-align:left;transition:.2s}.sample-card:hover{border-color:var(--red);transform:translateY(-3px)}.sample-card b{display:block;font-size:19px;margin-bottom:8px}.sample-card span{font-size:13px;color:var(--muted)}
 .results{border-top:1px solid var(--line)}.hidden{display:none}.case-score{width:120px;height:120px;border:1px solid var(--ink);border-radius:50%;display:flex;align-items:center;justify-content:center;gap:8px}.case-score span{font:800 45px "Playfair Display"}.case-score small{font:500 8px/1.4 "DM Mono"}
+.agent-section{margin-bottom:28px;padding:26px;border:1px solid var(--line);background:var(--cream);border-radius:18px}.agent-heading{display:flex;justify-content:space-between;align-items:end;gap:20px;margin-bottom:18px}.agent-heading h3{font:700 clamp(27px,4vw,46px)/1 Georgia,serif;margin:0}.agent-heading>span{font:500 9px ui-monospace,SFMono-Regular,Menlo,monospace;padding:8px 11px;border:1px solid var(--line);border-radius:99px;color:var(--green)}.agent-steps{display:grid;grid-template-columns:repeat(2,1fr);gap:9px}.agent-steps article{display:grid;grid-template-columns:auto 1fr;gap:10px;padding:15px;border:1px solid var(--line);border-radius:11px;background:#f8f3e9}.agent-steps article>span{font:500 10px ui-monospace,SFMono-Regular,Menlo,monospace;color:var(--red)}.agent-steps b{font-size:13px;text-transform:capitalize}.agent-steps p{font-size:11px;line-height:1.4;color:var(--muted);margin:5px 0}.agent-steps small{grid-column:2;font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;text-transform:uppercase;color:var(--green)}.agent-stop{display:grid;grid-template-columns:1fr 1fr;gap:9px;margin-top:9px}.agent-stop p{margin:0;padding:14px;border-top:1px solid var(--line);font-size:11px;line-height:1.5}.agent-stop b,.agent-stop span{display:block}.agent-stop b{font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;color:var(--muted);margin-bottom:5px}
 .gap-section{margin-bottom:28px;padding:26px;border:1px solid var(--ink);background:#1b1b17;color:var(--cream);border-radius:18px}.gap-heading{display:flex;justify-content:space-between;gap:20px;align-items:end;margin-bottom:18px}.gap-heading .kicker{color:#bdb4a6}.gap-heading h3{font:700 clamp(27px,4vw,46px)/1 Georgia,serif;max-width:700px;margin:0}.gap-grid{display:grid;grid-template-columns:repeat(2,1fr);gap:12px}.gap-card,.gap-empty{padding:19px;border:1px solid #4a483f;border-radius:13px;background:#26251f}.gap-card.high{border-color:var(--red)}.gap-card.medium{border-color:var(--amber)}.gap-severity{font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;letter-spacing:.13em;text-transform:uppercase;color:#d9a65e}.gap-card h4{font-size:20px;margin:10px 0 16px}.gap-compare{display:grid;grid-template-columns:1fr 1fr;gap:10px}.gap-compare p{margin:0;padding:11px;background:#313029;border-radius:8px;font-size:12px;line-height:1.45}.gap-compare b{display:block;font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;color:#bdb4a6;margin-bottom:5px}.gap-card .evidence{border-top-color:#4a483f}.gap-empty{display:grid;gap:5px;color:#bdb4a6}
 .claim-grid{grid-template-columns:repeat(2,1fr)}.claim-card{background:var(--cream);border:1px solid var(--line);border-top:6px solid var(--muted);border-radius:16px;padding:23px}.claim-card.supported{border-top-color:var(--green)}.claim-card.contradicted{border-top-color:var(--red)}.claim-card.context{border-top-color:var(--amber)}.claim-top{display:flex;justify-content:space-between;gap:10px;align-items:start}.claim-name{font-size:21px;font-weight:800}.verdict{font:500 8px/1.3 ui-monospace,SFMono-Regular,Menlo,monospace;letter-spacing:.08em;border:1px solid var(--line);border-radius:99px;padding:7px 9px;text-align:right}.confidence{display:block;margin-top:8px;font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;letter-spacing:.1em;text-transform:uppercase;color:var(--muted)}.summary{min-height:45px;color:#534d43;line-height:1.5}.evidence{padding:11px 0;border-top:1px solid var(--line)}.evidence b,.evidence span{display:block}.evidence b{font:500 8px ui-monospace,SFMono-Regular,Menlo,monospace;letter-spacing:.12em;color:var(--muted);text-transform:uppercase}.evidence span{font-size:13px;margin-top:4px}.caveat{font-size:11px;color:var(--muted);margin-top:15px}
 .evidence-summary{margin-top:16px}.evidence-summary article{padding:25px;border:1px solid var(--line);border-radius:16px;background:var(--cream)}#nutrition-grid div{display:flex;justify-content:space-between;padding:10px 0;border-bottom:1px solid var(--line);font-size:13px}.date-card{background:var(--ink)!important;color:var(--cream)}.date-card .kicker{color:#cfc5b6}.date-card h3{font:700 28px/1.15 "Playfair Display"}details{margin-top:16px;border:1px solid var(--line);border-radius:14px;padding:17px;background:var(--cream)}summary{cursor:pointer;font-weight:700}pre{white-space:pre-wrap;font:11px/1.5 "DM Mono";overflow:auto}
 .method{border-top:1px solid var(--line)}.method-grid{grid-template-columns:repeat(4,1fr);margin-top:40px}.method-grid div{padding:20px;border-top:2px solid var(--ink)}.method-grid span{font:500 10px "DM Mono";color:var(--red)}.method-grid p{font-size:13px;line-height:1.5;color:var(--muted)}
 footer{display:flex;justify-content:space-between;gap:20px;padding:25px 5vw;border-top:1px solid var(--line);font:500 10px "DM Mono";color:var(--muted)}
-@media(max-width:900px){.hero{grid-template-columns:1fr;min-height:auto;padding:80px 0}.hero-visual{height:430px}.trust-strip{grid-template-columns:1fr}.trust-strip div{border-right:0}.section-heading,.case-header{align-items:start;flex-direction:column}.upload-grid,.text-grid,.claim-grid,.evidence-summary,.gap-grid{grid-template-columns:1fr}.method-grid{grid-template-columns:repeat(2,1fr)}}
+@media(max-width:900px){.hero{grid-template-columns:1fr;min-height:auto;padding:80px 0}.hero-visual{height:430px}.trust-strip{grid-template-columns:1fr}.trust-strip div{border-right:0}.section-heading,.case-header,.agent-heading{align-items:start;flex-direction:column}.upload-grid,.text-grid,.claim-grid,.evidence-summary,.gap-grid,.agent-steps,.agent-stop{grid-template-columns:1fr}.method-grid{grid-template-columns:repeat(2,1fr)}}
 @media(max-width:560px){.top-status,.engine-link{display:none}.hero h1{font-size:58px}.hero-visual{transform:scale(.8);transform-origin:left top;height:350px;width:125%}.workspace,.results,.method{padding:65px 0}.mode-switch{overflow:auto}.mode-switch button{white-space:nowrap;padding:13px 10px}.sample-grid,.method-grid{grid-template-columns:1fr}.case-score{width:90px;height:90px}.claim-top{display:block}.verdict{display:inline-block;margin-top:8px}footer{display:block}footer span{display:block;margin:5px 0}}
requirements.txt
CHANGED
@@ -5,3 +5,5 @@ pydantic>=2.10.0
 pytesseract>=0.3.13
 python-multipart>=0.0.20
 uvicorn>=0.34.0
+transformers>=4.53.0
+torch>=2.2.0
scripts/export_traces.py
ADDED
@@ -0,0 +1,42 @@
+from __future__ import annotations
+
+import json
+import sys
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parents[1]
+sys.path.insert(0, str(ROOT / "src"))
+
+from packetcourt import audit_packet
+
+
+def main() -> None:
+    cases = [json.loads(line) for line in (ROOT / "data" / "golden_cases.jsonl").read_text().splitlines() if line.strip()]
+    target = ROOT / "traces" / "packetcourt_traces.jsonl"
+    target.parent.mkdir(exist_ok=True)
+    records = []
+    for case in cases:
+        audit = audit_packet(case["front_text"], case["back_text"])
+        records.append(
+            {
+                "trace_id": f"trace-{case['id']}",
+                "case_id": case["id"],
+                "input": {"front_text": case["front_text"], "back_text": case["back_text"]},
+                "steps": [
+                    {"name": "plan_investigation", "output": audit.investigation.model_dump()},
+                    {"name": "detect_front_claims", "output": [claim.claim for claim in audit.claims]},
+                    {"name": "extract_back_evidence", "output": {"ingredients": audit.ingredients, "nutrition": audit.nutrition.model_dump()}},
+                    {"name": "calculate_whole_packet", "output": audit.whole_packet.model_dump()},
+                    {"name": "audit_claims", "output": [claim.model_dump(mode="json") for claim in audit.claims]},
+                    {"name": "surface_persuasion_gap", "output": [finding.model_dump() for finding in audit.persuasion_gap]},
+                    {"name": "resolve_dates", "output": audit.expiry.model_dump()},
+                ],
+                "limitations": audit.limitations,
+            }
+        )
+    target.write_text("\n".join(json.dumps(record) for record in records) + "\n")
+    print(f"Wrote {len(records)} transparent traces to {target}")
+
+
+if __name__ == "__main__":
+    main()
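
Each record written by `export_traces.py` is one JSON object per line with the keys `trace_id`, `case_id`, `input`, `steps`, and `limitations`. A consumer can therefore read the file back line by line. The sketch below is a hypothetical reader, not part of the commit: the function name `step_names` is an assumption, and the example record uses made-up values that only mirror the key layout shown in the script.

```python
from __future__ import annotations

import json


def step_names(jsonl_text: str) -> dict[str, list[str]]:
    """Map each trace_id to the ordered names of its recorded steps."""
    names: dict[str, list[str]] = {}
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines, as the export script does on input
            record = json.loads(line)
            names[record["trace_id"]] = [step["name"] for step in record["steps"]]
    return names


# Stand-in record with the same top-level keys the script writes; values are invented.
example = json.dumps(
    {
        "trace_id": "trace-demo",
        "case_id": "demo",
        "input": {"front_text": "HIGH PROTEIN", "back_text": "Protein 12 g per 100 g"},
        "steps": [
            {"name": "plan_investigation", "output": {}},
            {"name": "detect_front_claims", "output": ["HIGH PROTEIN"]},
        ],
        "limitations": [],
    }
)
print(step_names(example + "\n"))
```

Against the real `traces/packetcourt_traces.jsonl`, the same function could be fed `Path(...).read_text()`; each trace would then list the seven step names the script emits, in order.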
scripts/train_router.py
ADDED
|
@@ -0,0 +1,135 @@
|
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import argparse
|
| 4 |
+
import json
|
| 5 |
+
import random
|
| 6 |
+
from pathlib import Path
|
| 7 |
+
|
| 8 |
+
import torch
|
| 9 |
+
from huggingface_hub import HfApi
|
| 10 |
+
from torch.utils.data import DataLoader, Dataset
|
| 11 |
+
from transformers import AutoModelForSequenceClassification, AutoTokenizer
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
ROOT = Path(__file__).resolve().parents[1]
|
| 15 |
+
LABELS = ["ingredients", "nutrition", "license", "dates", "refuse_absolute"]
|
| 16 |
+
LABEL_TO_ID = {label: index for index, label in enumerate(LABELS)}
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
class RouterDataset(Dataset):
|
| 20 |
+
def __init__(self, records, tokenizer):
|
| 21 |
+
self.records = records
|
| 22 |
+
self.tokenizer = tokenizer
|
| 23 |
+
|
| 24 |
+
def __len__(self):
|
| 25 |
+
return len(self.records)
|
| 26 |
+
|
| 27 |
+
def __getitem__(self, index):
|
| 28 |
+
record = self.records[index]
|
| 29 |
+
encoded = self.tokenizer(
|
| 30 |
+
record["text"],
|
| 31 |
+
padding="max_length",
|
| 32 |
+
truncation=True,
|
| 33 |
+
max_length=32,
|
| 34 |
+
return_tensors="pt",
|
| 35 |
+
)
|
| 36 |
+
return {
|
| 37 |
+
"input_ids": encoded["input_ids"].squeeze(0),
|
| 38 |
+
"attention_mask": encoded["attention_mask"].squeeze(0),
|
| 39 |
+
"labels": torch.tensor(LABEL_TO_ID[record["label"]]),
|
| 40 |
+
}
|
| 41 |
+
|
| 42 |
+
|
| 43 |
+
def evaluate(model, loader, device):
|
| 44 |
+
model.eval()
|
| 45 |
+
correct = total = 0
|
| 46 |
+
with torch.no_grad():
|
| 47 |
+
for batch in loader:
|
| 48 |
+
labels = batch.pop("labels").to(device)
|
| 49 |
+
logits = model(**{key: value.to(device) for key, value in batch.items()}).logits
|
| 50 |
+
correct += (logits.argmax(dim=-1) == labels).sum().item()
|
| 51 |
+
total += labels.numel()
|
| 52 |
+
return correct / total
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
def main():
|
| 56 |
+
parser = argparse.ArgumentParser()
|
| 57 |
+
parser.add_argument("--repo-id", default="build-small-hackathon/packetcourt-evidence-router")
|
| 58 |
+
parser.add_argument("--base-model", default="google/bert_uncased_L-2_H-128_A-2")
|
| 59 |
+
parser.add_argument("--epochs", type=int, default=30)
|
| 60 |
+
args = parser.parse_args()
|
| 61 |
+
|
| 62 |
+
random.seed(42)
|
| 63 |
+
torch.manual_seed(42)
|
| 64 |
+
records = [json.loads(line) for line in (ROOT / "data/router_training.jsonl").read_text().splitlines()]
|
| 65 |
+
grouped = {label: [] for label in LABELS}
|
| 66 |
+
for record in records:
|
| 67 |
+
grouped[record["label"]].append(record)
|
| 68 |
+
for group in grouped.values():
|
| 69 |
+
random.shuffle(group)
|
| 70 |
+
validation = [group.pop() for group in grouped.values()]
|
| 71 |
+
training = [record for group in grouped.values() for record in group]
|
| 72 |
+
random.shuffle(training)
|
| 73 |
+
|
| 74 |
+
tokenizer = AutoTokenizer.from_pretrained(args.base_model)
|
| 75 |
+
model = AutoModelForSequenceClassification.from_pretrained(
|
| 76 |
+
args.base_model,
|
| 77 |
+
num_labels=len(LABELS),
|
| 78 |
+
id2label={index: label for index, label in enumerate(LABELS)},
|
| 79 |
+
label2id=LABEL_TO_ID,
|
| 80 |
+
)
|
| 81 |
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 82 |
+
model.to(device)
|
| 83 |
+
train_loader = DataLoader(RouterDataset(training, tokenizer), batch_size=8, shuffle=True)
|
| 84 |
+
validation_loader = DataLoader(RouterDataset(validation, tokenizer), batch_size=5)
|
| 85 |
+
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
|
| 86 |
+
|
| 87 |
+
for epoch in range(args.epochs):
|
| 88 |
+
model.train()
|
| 89 |
+
for batch in train_loader:
|
| 90 |
+
optimizer.zero_grad()
|
| 91 |
+
labels = batch.pop("labels").to(device)
|
| 92 |
+
loss = model(**{key: value.to(device) for key, value in batch.items()}, labels=labels).loss
|
| 93 |
+
loss.backward()
|
| 94 |
+
optimizer.step()
|
| 95 |
+
print(f"epoch={epoch + 1} validation_accuracy={evaluate(model, validation_loader, device):.3f}")
|
| 96 |
+
|
| 97 |
+
output = ROOT / "router_model"
|
| 98 |
+
model.save_pretrained(output)
|
| 99 |
+
tokenizer.save_pretrained(output)
|
| 100 |
+
score = evaluate(model, validation_loader, device)
|
| 101 |
+
card = f"""---
|
| 102 |
+
license: apache-2.0
|
| 103 |
+
base_model: {args.base_model}
|
| 104 |
+
tags:
|
| 105 |
+
- text-classification
|
| 106 |
+
- build-small-hackathon
|
| 107 |
+
- packetcourt
|
| 108 |
+
- fine-tuned
|
| 109 |
+
---
|
| 110 |
+
|
| 111 |
+
# PacketCourt Evidence Router
|
| 112 |
+
|
| 113 |
+
A {sum(parameter.numel() for parameter in model.parameters()):,}-parameter fine-tuned classifier used by
|
| 114 |
+
PacketCourt's investigation agent to choose the next evidence tool for a packet claim.
|
| 115 |
+
|
| 116 |
+
Labels: `{", ".join(LABELS)}`.
|
| 117 |
+
|
| 118 |
+
Held-out validation accuracy: `{score:.3f}` on a small PacketCourt-specific routing set.
|
| 119 |
+
The router proposes an investigation tool; deterministic code remains responsible for final verdicts.
|
| 120 |
+
"""
|
| 121 |
+
(output / "README.md").write_text(card)
|
| 122 |
+
|
| 123 |
+
api = HfApi()
|
| 124 |
+
api.create_repo(args.repo_id, repo_type="model", private=True, exist_ok=True)
|
| 125 |
+
api.upload_folder(
|
| 126 |
+
repo_id=args.repo_id,
|
| 127 |
+
repo_type="model",
|
| 128 |
+
folder_path=output,
|
| 129 |
+
commit_message="feat: publish PacketCourt fine-tuned evidence router",
|
| 130 |
+
)
|
| 131 |
+
print(f"published={args.repo_id} validation_accuracy={score:.3f}")
|
| 132 |
+
|
| 133 |
+
|
| 134 |
+
if __name__ == "__main__":
|
| 135 |
+
main()
|
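The training script above holds out one shuffled example per label, so the validation set has five records and the reported accuracy can only move in steps of 0.2. The split can be sketched as a standalone helper. This is a minimal illustration, not part of the commit: the function name `holdout_one_per_label` and the two-example minimum check are assumptions added for clarity.

```python
import random
from collections import defaultdict


def holdout_one_per_label(records, labels, seed=42):
    """Return (training, validation) with exactly one validation record per label.

    Hypothetical helper: mirrors the split in scripts/train_router.py but uses
    a local random.Random so it does not touch global random state.
    """
    rng = random.Random(seed)
    grouped = defaultdict(list)
    for record in records:
        grouped[record["label"]].append(record)

    training, validation = [], []
    for label in labels:
        group = list(grouped[label])
        if len(group) < 2:
            # Assumption: a label with fewer than two examples cannot be split.
            raise ValueError(f"label {label!r} needs at least two examples")
        rng.shuffle(group)
        validation.append(group.pop())  # one held-out example per label
        training.extend(group)
    rng.shuffle(training)
    return training, validation
```

Calling `holdout_one_per_label(records, LABELS)` on the parsed `router_training.jsonl` records would give the same shape of split as the script, though not necessarily the same examples, because the random stream differs.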
src/packetcourt/audit.py
CHANGED
|
@@ -3,6 +3,7 @@ from __future__ import annotations
 import re

 from .models import ClaimAudit, Evidence, PacketAudit, PersuasionFinding, Verdict
+from .investigator import build_investigation
 from .parser import calculate_whole_packet, extract_claims, extract_ingredients, parse_expiry, parse_nutrition


@@ -266,4 +267,10 @@ def audit_packet(front_text: str, back_text: str) -> PacketAudit:
         front_text=front_text,
         back_text=back_text,
         limitations=limitations,
+        investigation=build_investigation(
+            [claim.claim for claim in claim_audits],
+            ingredients,
+            nutrition,
+            expiry,
+        ),
     )
|
src/packetcourt/evidence_router.py
ADDED
|
@@ -0,0 +1,39 @@
from __future__ import annotations

import os
from functools import lru_cache


MODEL_ID = os.getenv(
    "PACKETCOURT_ROUTER_MODEL",
    "build-small-hackathon/packetcourt-evidence-router",
)

LABEL_TO_TOOL = {
    "ingredients": "inspect_ingredients",
    "nutrition": "inspect_nutrition",
    "license": "inspect_license",
    "dates": "resolve_dates",
    "refuse_absolute": "apply_safety_boundary",
}


@lru_cache(maxsize=1)
def _pipeline():
    # The router is opt-in: without PACKETCOURT_ROUTER=1 the caller uses its policy table.
    if os.getenv("PACKETCOURT_ROUTER", "0") != "1":
        return None
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID, tokenizer=MODEL_ID)


def route_claim(claim: str) -> tuple[str | None, str]:
    try:
        classifier = _pipeline()
        if classifier is None:
            return None, "deterministic fallback"
        # Inference sits inside the try block so a runtime failure also falls back.
        result = classifier(claim, truncation=True, max_length=32)[0]
    except Exception:
        return None, "deterministic fallback"
    label = str(result["label"]).lower()
    return LABEL_TO_TOOL.get(label), MODEL_ID
|
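The router module above treats the fine-tuned model as optional: any failure should leave the caller with `None` so the deterministic policy table decides. A minimal sketch of that pattern, decoupled from `transformers`, is shown below. The names `route_with_fallback`, `FALLBACK`, and the `model_id` default are hypothetical, and treating an unmapped label as a fallback is an assumption of this sketch rather than the commit's behavior.

```python
FALLBACK = "deterministic fallback"


def route_with_fallback(claim, classifier, label_to_tool, model_id="example/router"):
    """Return (tool, source); tool is None whenever the router cannot be trusted.

    `classifier` is any callable returning [{"label": ...}], or None when the
    router is disabled. All names here are illustrative assumptions.
    """
    if classifier is None:
        return None, FALLBACK
    try:
        result = classifier(claim)[0]
    except Exception:
        # Loading or inference problems should never break the audit itself.
        return None, FALLBACK
    tool = label_to_tool.get(str(result["label"]).lower())
    if tool is None:
        # Assumption: an unmapped label is reported as a fallback, not as a router decision.
        return None, FALLBACK
    return tool, model_id
```

With this shape, the caller can write `tool = routed_tool or POLICY_TOOLS[claim]` and the reported source stays accurate even when the model emits a label outside the mapping.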
src/packetcourt/investigator.py
ADDED
|
@@ -0,0 +1,74 @@
from __future__ import annotations

from .evidence_router import route_claim
from .models import InvestigationPlan, InvestigationStep


POLICY_TOOLS = {
    "No Added Sugar": "inspect_ingredients",
    "Multigrain": "inspect_ingredients",
    "100% Natural": "apply_safety_boundary",
    "FSSAI Approved": "inspect_license",
    "No Preservatives": "inspect_ingredients",
    "Baked Not Fried": "inspect_nutrition",
    "Zero Trans Fat": "inspect_nutrition",
    "Whole Grain": "inspect_ingredients",
    "High Protein": "inspect_nutrition",
}


def build_investigation(
    claim_names: list[str],
    ingredients: list[str],
    nutrition,
    expiry,
) -> InvestigationPlan:
    steps: list[InvestigationStep] = []
    missing: list[str] = []
    seen: set[str] = set()
    router_model = "deterministic fallback"

    for claim in claim_names:
        routed_tool, source = route_claim(claim)
        router_model = source if source != "deterministic fallback" else router_model
        tool = routed_tool or POLICY_TOOLS[claim]
        if tool in seen:
            continue
        seen.add(tool)
        steps.append(
            InvestigationStep(
                tool=tool,
                reason=f"Required to audit the front claim: {claim}.",
                status="completed",
                source="fine-tuned router" if routed_tool else "policy fallback",
            )
        )

    if claim_names and not ingredients and any(POLICY_TOOLS[name] == "inspect_ingredients" for name in claim_names):
        missing.append("A readable ingredient list")
    if claim_names and nutrition.basis == "unknown" and any(POLICY_TOOLS[name] == "inspect_nutrition" for name in claim_names):
        missing.append("A readable nutrition panel with its measurement basis")
    if expiry.instruction and not expiry.packed_on:
        missing.append("The packing or manufacturing date needed to resolve relative shelf life")

    if expiry.best_before or expiry.instruction or expiry.after_opening_instruction:
        steps.append(
            InvestigationStep(
                tool="resolve_dates",
                reason="Date or after-opening evidence is visible on the supplied label.",
                status="completed" if expiry.best_before or expiry.after_opening_instruction else "needs evidence",
            )
        )

    stop_reason = (
        "Stopped with explicit missing-evidence requests."
        if missing
        else "Stopped after all evidence tools required by the detected claims completed."
    )
    return InvestigationPlan(
        objective="Audit front-of-pack claims against evidence printed on the same packet.",
        steps=steps,
        missing_evidence=missing,
        stop_reason=stop_reason,
        router_model=router_model,
    )
|
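The planner above emits at most one step per tool, so a later claim that maps to an already-selected tool produces no step of its own. The following sketch isolates that de-duplication with plain dictionaries. The function `plan_tools` and its `route` parameter are hypothetical stand-ins for `build_investigation` and `route_claim`.

```python
def plan_tools(claims, policy_tools, route=None):
    """Return one planned step per distinct tool, in first-seen claim order.

    `route` is an optional callable claim -> tool-or-None standing in for the
    fine-tuned router; when it returns None the policy table decides.
    """
    steps = []
    seen = set()
    for claim in claims:
        routed = route(claim) if route else None
        tool = routed or policy_tools[claim]
        if tool in seen:
            continue  # the tool is already planned for an earlier claim
        seen.add(tool)
        steps.append(
            {
                "tool": tool,
                "claim": claim,
                "source": "fine-tuned router" if routed else "policy fallback",
            }
        )
    return steps
```

For the claims `["Baked Not Fried", "Zero Trans Fat", "Whole Grain"]`, this yields an `inspect_nutrition` step and an `inspect_ingredients` step only, which is consistent with trace `pc-005`, where "Zero Trans Fat" has no separate planning step.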
src/packetcourt/models.py
CHANGED
|
@@ -65,6 +65,21 @@ class ExpiryInfo(BaseModel):
     status: str = "Not enough label evidence"


+class InvestigationStep(BaseModel):
+    tool: str
+    reason: str
+    status: str
+    source: str = "policy"
+
+
+class InvestigationPlan(BaseModel):
+    objective: str
+    steps: list[InvestigationStep] = Field(default_factory=list)
+    missing_evidence: list[str] = Field(default_factory=list)
+    stop_reason: str
+    router_model: str = "deterministic fallback"
+
+
 class PacketAudit(BaseModel):
     claims: list[ClaimAudit]
     nutrition: NutritionFacts

@@ -75,3 +90,4 @@ class PacketAudit(BaseModel):
     front_text: str
     back_text: str
     limitations: list[str]
+    investigation: InvestigationPlan
|
tests/test_audit.py
CHANGED
|
@@ -80,3 +80,10 @@ def test_after_opening_instruction_is_extracted():
         "Ingredients: tomato, salt. Use by: 08 JUL 2026. Consume within 3 days after opening.",
     )
     assert result.expiry.after_opening_instruction == "Consume within 3 days after opening"
+
+
+def test_investigation_requests_missing_evidence_and_stops_explicitly():
+    result = audit_packet("HIGH PROTEIN", "Protein 9g.")
+    assert any(step.tool == "inspect_nutrition" for step in result.investigation.steps)
+    assert any("nutrition panel" in item.lower() for item in result.investigation.missing_evidence)
+    assert "missing-evidence" in result.investigation.stop_reason
|
traces/README.md
ADDED
|
@@ -0,0 +1,28 @@
---
license: cc-by-4.0
task_categories:
- text-classification
language:
- en
tags:
- build-small-hackathon
- agent-traces
- claim-verification
- openbmb
size_categories:
- n<1K
---

# PacketCourt Transparent Traces

Transparent PacketCourt investigation-agent runs showing the evidence pipeline
from claim-dependent tool planning through deterministic verdicts,
whole-packet arithmetic, persuasion-gap findings, and date resolution.

These traces contain no hidden chain-of-thought. They expose auditable tool and
decision outputs suitable for debugging and evaluation. Each trace records:

- the investigation objective and selected evidence tools;
- whether a tool came from the fine-tuned router or policy fallback;
- explicit missing-evidence requests and stop reason;
- extracted evidence, calculations, verdicts, and safety limitations.
|
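The fields listed above can be read back with a few lines of standard-library Python. The sketch below counts how often each planning step came from the router versus the policy table and collects any missing-evidence requests. The function name `summarize_traces` is hypothetical, and the key names are taken from the trace records in `traces/packetcourt_traces.jsonl`.

```python
import json
from collections import Counter


def summarize_traces(lines):
    """Summarize JSONL trace lines: step sources and missing-evidence requests.

    `lines` is any iterable of strings, for example an open file handle.
    """
    sources = Counter()
    missing = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between records
        trace = json.loads(line)
        # The planning step carries the InvestigationPlan dump.
        plan = next(
            step["output"]
            for step in trace["steps"]
            if step["name"] == "plan_investigation"
        )
        for step in plan["steps"]:
            sources[step["source"]] += 1
        if plan["missing_evidence"]:
            missing[trace["case_id"]] = plan["missing_evidence"]
    return sources, missing
```

A typical call would open the file and pass the handle directly, for example `with open("traces/packetcourt_traces.jsonl") as handle: sources, missing = summarize_traces(handle)`.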
traces/packetcourt_traces.jsonl
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
{"trace_id": "trace-pc-001", "case_id": "pc-001", "input": {"front_text": "HIGH PROTEIN MULTIGRAIN 100% NATURAL", "back_text": "Ingredients: Refined wheat flour, rolled oats, ragi flour, sugar, cocoa, salt. Nutrition per 100g: Protein 12.4g, Total Sugars 22g, Added Sugars 18g, Sodium 410mg. Net weight 300g. PKD: 13 JUN 26. Best before 6 months from packaging."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_nutrition", "reason": "Required to audit the front claim: High Protein.", "status": "completed", "source": "fine-tuned router"}, {"tool": "inspect_ingredients", "reason": "Required to audit the front claim: Multigrain.", "status": "completed", "source": "fine-tuned router"}, {"tool": "apply_safety_boundary", "reason": "Required to audit the front claim: 100% Natural.", "status": "completed", "source": "fine-tuned router"}, {"tool": "resolve_dates", "reason": "Date or after-opening evidence is visible on the supplied label.", "status": "completed", "source": "policy"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["High Protein", "Multigrain", "100% Natural"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Refined wheat flour", "rolled oats", "ragi flour", "sugar", "cocoa", "salt"], "nutrition": {"basis": "per 100g", "serving_size_g": null, "package_size_g": 300.0, "protein_g": 12.4, "total_sugar_g": 22.0, "added_sugar_g": 18.0, "sodium_mg": 410.0, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": true, "multiplier": 3.0, "protein_g": 37.2, "total_sugar_g": 66.0, "added_sugar_g": 54.0, "sugar_teaspoons": 16.5, "sodium_mg": 1230.0, "saturated_fat_g": null, "explanation": "Calculated from per 100g values 
across a 300g packet."}}, {"name": "audit_claims", "output": [{"claim": "High Protein", "verdict": "TECHNICALLY TRUE, CONTEXT MISSING", "summary": "The protein quantity is visible, but claim compliance depends on product category and applicable rules.", "evidence": [{"source": "nutrition panel", "text": "Protein 12.4g (per 100g)"}], "caveat": "PacketCourt does not make a regulatory-compliance determination in this prototype.", "confidence": "medium"}, {"claim": "Multigrain", "verdict": "TECHNICALLY TRUE, CONTEXT MISSING", "summary": "Multiple grains are listed, but refined grain appears first.", "evidence": [{"source": "ingredient list", "text": "Refined wheat flour"}, {"source": "ingredient list", "text": "rolled oats"}, {"source": "ingredient list", "text": "ragi flour"}], "caveat": "Ingredient order indicates relative quantity, but exact grain percentages may be unavailable.", "confidence": "high"}, {"claim": "100% Natural", "verdict": "CANNOT VERIFY", "summary": "An absolute naturalness claim cannot be established from package text alone.", "evidence": [{"source": "front claim", "text": "100% Natural"}], "caveat": "PacketCourt refuses to infer product composition beyond the supplied label.", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": [{"headline": "Protein leads. 
Whole-packet sugar stays quiet.", "front_impression": "The front positions protein as the packet's defining fact.", "quiet_context": "The complete packet contains about 16.5 teaspoons of total sugar.", "severity": "high", "evidence": [{"source": "whole-packet calculation", "text": "Total sugar 66g"}, {"source": "conversion", "text": "66g \u00f7 4 = 16.5 teaspoons"}]}, {"headline": "A positive front claim competes with substantial sodium.", "front_impression": "The front emphasizes a favorable product attribute.", "quiet_context": "The complete packet calculates to approximately 1230mg sodium.", "severity": "high", "evidence": [{"source": "whole-packet calculation", "text": "Sodium 1230mg"}]}, {"headline": "Grain variety is prominent. The first ingredient is refined.", "front_impression": "The front suggests a grain-forward product.", "quiet_context": "The ingredient list begins with \u201cRefined wheat flour\u201d.", "severity": "medium", "evidence": [{"source": "first ingredient", "text": "Refined wheat flour"}]}]}, {"name": "resolve_dates", "output": {"packed_on": "2026-06-13", "best_before": "2026-12-13", "instruction": "Best before 6 months from packaging", "after_opening_instruction": null, "status": "Best-before evidence resolves to 2026-12-13"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 2 |
+
{"trace_id": "trace-pc-002", "case_id": "pc-002", "input": {"front_text": "NO ADDED SUGAR", "back_text": "Ingredients: Rolled oats, glucose syrup, peanuts. Nutrition per 100g: Total Sugars 19g, Added Sugars 12g."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_ingredients", "reason": "Required to audit the front claim: No Added Sugar.", "status": "completed", "source": "fine-tuned router"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["No Added Sugar"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Rolled oats", "glucose syrup", "peanuts"], "nutrition": {"basis": "per 100g", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": 19.0, "added_sugar_g": 12.0, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "No Added Sugar", "verdict": "CONTRADICTED BY PROVIDED LABEL", "summary": "The provided ingredient list names one or more added-sugar ingredients.", "evidence": [{"source": "ingredient list", "text": "glucose syrup"}], "caveat": "This verdict only checks the supplied label text; it is not a laboratory analysis.", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": null, "after_opening_instruction": null, "status": "No resolvable best-before date found"}}], "limitations": 
["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 3 |
+
{"trace_id": "trace-pc-003", "case_id": "pc-003", "input": {"front_text": "NO ADDED SUGAR", "back_text": "Ingredients: Rolled oats, peanuts, cocoa, salt. Nutrition per 100g: Total Sugars 2g."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_ingredients", "reason": "Required to audit the front claim: No Added Sugar.", "status": "completed", "source": "fine-tuned router"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["No Added Sugar"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Rolled oats", "peanuts", "cocoa", "salt"], "nutrition": {"basis": "per 100g", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": 2.0, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "No Added Sugar", "verdict": "SUPPORTED BY PROVIDED LABEL", "summary": "No common added-sugar term was found in the provided ingredient list.", "evidence": [{"source": "ingredient list", "text": "Rolled oats, peanuts, cocoa, salt"}], "caveat": "Unrecognized sweeteners or incomplete OCR may change this result.", "confidence": "medium"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": null, "after_opening_instruction": null, "status": "No resolvable best-before date found"}}], "limitations": ["PacketCourt audits only 
the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 4 |
+
{"trace_id": "trace-pc-004", "case_id": "pc-004", "input": {"front_text": "FSSAI APPROVED", "back_text": "FSSAI Lic. No. 12345678901234. Ingredients: oats, salt."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_license", "reason": "Required to audit the front claim: FSSAI Approved.", "status": "completed", "source": "fine-tuned router"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["FSSAI Approved"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["oats", "salt"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "FSSAI Approved", "verdict": "TECHNICALLY TRUE, CONTEXT MISSING", "summary": "An FSSAI license indicates regulatory registration; it is not a health endorsement.", "evidence": [{"source": "back label", "text": "FSSAI license number 12345678901234"}], "caveat": "", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": [{"headline": "Registration language can look like a health endorsement.", "front_impression": "\u201cFSSAI Approved\u201d may imply the product has been endorsed as healthy.", "quiet_context": "An FSSAI license identifies regulatory registration; it is not a nutrition recommendation.", "severity": "medium", "evidence": [{"source": "claim 
interpretation", "text": "FSSAI registration is not a health score."}]}]}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": null, "after_opening_instruction": null, "status": "No resolvable best-before date found"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 5 |
+
{"trace_id": "trace-pc-005", "case_id": "pc-005", "input": {"front_text": "BAKED NOT FRIED WHOLE GRAIN ZERO TRANS FAT", "back_text": "Ingredients: Refined wheat flour, whole wheat flour, vegetable oil, seasoning, salt. Nutrition per 100g: Protein 7g, Total Sugars 3g, Sodium 780mg, Saturated Fat 5g, Trans Fat 0g. Net weight 180g. PKD: 01 JUN 26. Best before 4 months from packaging."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_nutrition", "reason": "Required to audit the front claim: Baked Not Fried.", "status": "completed", "source": "fine-tuned router"}, {"tool": "inspect_ingredients", "reason": "Required to audit the front claim: Whole Grain.", "status": "completed", "source": "fine-tuned router"}, {"tool": "resolve_dates", "reason": "Date or after-opening evidence is visible on the supplied label.", "status": "completed", "source": "policy"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["Baked Not Fried", "Zero Trans Fat", "Whole Grain"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Refined wheat flour", "whole wheat flour", "vegetable oil", "seasoning", "salt"], "nutrition": {"basis": "per 100g", "serving_size_g": null, "package_size_g": 180.0, "protein_g": 7.0, "total_sugar_g": 3.0, "added_sugar_g": null, "sodium_mg": 780.0, "saturated_fat_g": 5.0}}}, {"name": "calculate_whole_packet", "output": {"calculable": true, "multiplier": 1.8, "protein_g": 12.6, "total_sugar_g": 5.4, "added_sugar_g": null, "sugar_teaspoons": 1.4, "sodium_mg": 1404.0, "saturated_fat_g": 9.0, "explanation": "Calculated from per 100g values across a 180g packet."}}, {"name": "audit_claims", "output": [{"claim": "Baked Not Fried", "verdict": "TECHNICALLY TRUE, 
CONTEXT MISSING", "summary": "The preparation claim does not establish that the complete packet is low in fat, sodium, or calories.", "evidence": [{"source": "front claim", "text": "Baked Not Fried"}], "caveat": "Review the nutrition panel and ingredient list for the complete product context.", "confidence": "high"}, {"claim": "Zero Trans Fat", "verdict": "SUPPORTED BY PROVIDED LABEL", "summary": "The supplied nutrition panel reports 0g trans fat.", "evidence": [{"source": "nutrition panel", "text": "Trans Fat 0g"}], "caveat": "A zero declaration may still be subject to applicable rounding rules.", "confidence": "high"}, {"claim": "Whole Grain", "verdict": "TECHNICALLY TRUE, CONTEXT MISSING", "summary": "Whole grain is present, but refined grain appears first.", "evidence": [{"source": "ingredient list", "text": "whole wheat flour"}, {"source": "ingredient list", "text": "Refined wheat flour"}], "caveat": "", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": [{"headline": "A positive front claim competes with substantial sodium.", "front_impression": "The front emphasizes a favorable product attribute.", "quiet_context": "The complete packet calculates to approximately 1404mg sodium.", "severity": "high", "evidence": [{"source": "whole-packet calculation", "text": "Sodium 1404mg"}]}, {"headline": "Grain variety is prominent. 
The first ingredient is refined.", "front_impression": "The front suggests a grain-forward product.", "quiet_context": "The ingredient list begins with \u201cRefined wheat flour\u201d.", "severity": "medium", "evidence": [{"source": "first ingredient", "text": "Refined wheat flour"}]}]}, {"name": "resolve_dates", "output": {"packed_on": "2026-06-01", "best_before": "2026-10-01", "instruction": "Best before 4 months from packaging", "after_opening_instruction": null, "status": "Best-before evidence resolves to 2026-10-01"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 6 |
+
{"trace_id": "trace-pc-006", "case_id": "pc-006", "input": {"front_text": "100% NATURAL", "back_text": "Ingredients: Chickpea flour, spices, salt."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "apply_safety_boundary", "reason": "Required to audit the front claim: 100% Natural.", "status": "completed", "source": "fine-tuned router"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["100% Natural"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Chickpea flour", "spices", "salt"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "100% Natural", "verdict": "CANNOT VERIFY", "summary": "An absolute naturalness claim cannot be established from package text alone.", "evidence": [{"source": "front claim", "text": "100% Natural"}], "caveat": "PacketCourt refuses to infer product composition beyond the supplied label.", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": null, "after_opening_instruction": null, "status": "No resolvable best-before date found"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 7 |
+
{"trace_id": "trace-pc-007", "case_id": "pc-007", "input": {"front_text": "NO PRESERVATIVES", "back_text": "Ingredients: Tomato pulp, sugar, salt, sodium benzoate. Use by: 08 JUL 2026."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_ingredients", "reason": "Required to audit the front claim: No Preservatives.", "status": "completed", "source": "fine-tuned router"}, {"tool": "resolve_dates", "reason": "Date or after-opening evidence is visible on the supplied label.", "status": "completed", "source": "policy"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["No Preservatives"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Tomato pulp", "sugar", "salt", "sodium benzoate. Use by: 08 JUL 2026"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "No Preservatives", "verdict": "CONTRADICTED BY PROVIDED LABEL", "summary": "The ingredient list contains a recognizable preservative term or code.", "evidence": [{"source": "ingredient list", "text": "sodium benzoate. Use by: 08 JUL 2026"}], "caveat": "", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": "2026-07-08", "instruction": null, "after_opening_instruction": null, "status": "Best-before evidence resolves to 2026-07-08"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 8 |
+
{"trace_id": "trace-pc-008", "case_id": "pc-008", "input": {"front_text": "NO PRESERVATIVES", "back_text": "Ingredients: Tomato, salt. Use by: 08 JUL 2026. Consume within 3 days after opening."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_ingredients", "reason": "Required to audit the front claim: No Preservatives.", "status": "completed", "source": "fine-tuned router"}, {"tool": "resolve_dates", "reason": "Date or after-opening evidence is visible on the supplied label.", "status": "completed", "source": "policy"}], "missing_evidence": [], "stop_reason": "Stopped after all evidence tools required by the detected claims completed.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["No Preservatives"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Tomato", "salt. Use by: 08 JUL 2026. Consume within 3 days after opening"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "No Preservatives", "verdict": "SUPPORTED BY PROVIDED LABEL", "summary": "No recognizable preservative term was found in the supplied ingredient list.", "evidence": [{"source": "ingredient list", "text": "Tomato, salt. Use by: 08 JUL 2026. Consume within 3 days after opening"}], "caveat": "Incomplete OCR or unfamiliar additive codes may change this result.", "confidence": "medium"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": "2026-07-08", "instruction": null, "after_opening_instruction": "Consume within 3 days after opening", "status": "Best-before evidence resolves to 2026-07-08"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 9 |
+
{"trace_id": "trace-pc-009", "case_id": "pc-009", "input": {"front_text": "MULTIGRAIN", "back_text": "Ingredients: Whole wheat flour, oats, ragi flour. Best before 6 months from packaging."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_ingredients", "reason": "Required to audit the front claim: Multigrain.", "status": "completed", "source": "fine-tuned router"}, {"tool": "resolve_dates", "reason": "Date or after-opening evidence is visible on the supplied label.", "status": "needs evidence", "source": "policy"}], "missing_evidence": ["The packing or manufacturing date needed to resolve relative shelf life"], "stop_reason": "Stopped with explicit missing-evidence requests.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["Multigrain"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Whole wheat flour", "oats", "ragi flour"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "Multigrain", "verdict": "SUPPORTED BY PROVIDED LABEL", "summary": "Multiple grain ingredients are present in the supplied ingredient list.", "evidence": [{"source": "ingredient list", "text": "Whole wheat flour"}, {"source": "ingredient list", "text": "oats"}, {"source": "ingredient list", "text": "ragi flour"}], "caveat": "Ingredient order indicates relative quantity, but exact grain percentages may be unavailable.", "confidence": "high"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": "Best before 6 months from packaging", "after_opening_instruction": null, "status": "Relative shelf-life found, but the starting date is missing"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|
| 10 |
+
{"trace_id": "trace-pc-010", "case_id": "pc-010", "input": {"front_text": "HIGH PROTEIN", "back_text": "Ingredients: Chickpea flour, salt. Protein 9g."}, "steps": [{"name": "plan_investigation", "output": {"objective": "Audit front-of-pack claims against evidence printed on the same packet.", "steps": [{"tool": "inspect_nutrition", "reason": "Required to audit the front claim: High Protein.", "status": "completed", "source": "fine-tuned router"}], "missing_evidence": ["A readable nutrition panel with its measurement basis"], "stop_reason": "Stopped with explicit missing-evidence requests.", "router_model": "build-small-hackathon/packetcourt-evidence-router"}}, {"name": "detect_front_claims", "output": ["High Protein"]}, {"name": "extract_back_evidence", "output": {"ingredients": ["Chickpea flour", "salt. Protein 9g"], "nutrition": {"basis": "unknown", "serving_size_g": null, "package_size_g": null, "protein_g": 9.0, "total_sugar_g": null, "added_sugar_g": null, "sodium_mg": null, "saturated_fat_g": null}}}, {"name": "calculate_whole_packet", "output": {"calculable": false, "multiplier": null, "protein_g": null, "total_sugar_g": null, "added_sugar_g": null, "sugar_teaspoons": null, "sodium_mg": null, "saturated_fat_g": null, "explanation": "Package size and nutrition basis are required."}}, {"name": "audit_claims", "output": [{"claim": "High Protein", "verdict": "CANNOT VERIFY", "summary": "Protein is listed, but its measurement basis could not be determined.", "evidence": [{"source": "nutrition panel", "text": "Protein 9g"}], "caveat": "", "confidence": "low"}]}, {"name": "surface_persuasion_gap", "output": []}, {"name": "resolve_dates", "output": {"packed_on": null, "best_before": null, "instruction": null, "after_opening_instruction": null, "status": "No resolvable best-before date found"}}], "limitations": ["PacketCourt audits only the text and images supplied by the user.", "Verdicts are evidence summaries, not legal, medical, or food-safety determinations.", "Users should verify low-confidence OCR against the physical packet."]}
|