# PacketCourt: The packet takes the stand

Food packets are unusually good at telling two different stories at once.

The front has seconds to persuade: **HIGH PROTEIN**, **MULTIGRAIN**, **100%
NATURAL**, **BAKED NOT FRIED**. The back carries the evidence needed to
interpret those claims: ingredient order, nutrition basis, package size,
licensing text, dates, and instructions that only matter after opening.

PacketCourt is my attempt to make those two surfaces answer to each other.

It is a phone-first Gradio app for Indian packaged-food labels. A user can add
multiple front, back, and side-panel photographs when a packet spreads its
evidence across several panels. PacketCourt reads each photo independently,
labels and merges the visible evidence, plans an investigation, performs
deterministic calculations, and returns conservative verdicts with citations.

It does not produce a health score. It asks a narrower question:

> Does the evidence printed on this packet support the impression created by
> its front?

Try the Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt

Read the Codex-attributed source:
https://github.com/N-45div/PacketCourt

## The product decision that shaped everything

An early version of the idea was a general nutrition scanner. That direction
was broad, crowded, and difficult to trust. A single red, yellow, or green score
would hide too many judgments:

- Is sugar always worse than protein is good?
- How should serving size affect a score?
- Does an FSSAI license imply a health endorsement?
- Can OCR uncertainty silently change the answer?

PacketCourt therefore avoids ranking products. It audits claims against
evidence from the same supplied packet.

The output language is intentionally constrained:

- `SUPPORTED BY PROVIDED LABEL`
- `CONTRADICTED BY PROVIDED LABEL`
- `TECHNICALLY TRUE, CONTEXT MISSING`
- `CANNOT VERIFY`

The phrase **provided label** matters. PacketCourt does not treat a photograph
as a laboratory analysis, and it does not assume that text missing from the
photographs is absent from the packet.
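
One way to keep the verdict vocabulary closed is an enumeration. This is a minimal sketch rather than the project's actual code: the `Verdict` class and `parse_verdict` helper are hypothetical names, and only the four label strings come from the list above.

```python
from enum import Enum


class Verdict(str, Enum):
    """The four verdict labels listed above (class name is hypothetical)."""

    SUPPORTED = "SUPPORTED BY PROVIDED LABEL"
    CONTRADICTED = "CONTRADICTED BY PROVIDED LABEL"
    CONTEXT_MISSING = "TECHNICALLY TRUE, CONTEXT MISSING"
    CANNOT_VERIFY = "CANNOT VERIFY"


def parse_verdict(text: str) -> Verdict:
    """Map free text onto a known label, falling back to CANNOT VERIFY."""
    try:
        return Verdict(text.strip().upper())
    except ValueError:
        # Anything outside the closed vocabulary is treated as unverifiable.
        return Verdict.CANNOT_VERIFY
```

With this shape, a string such as `"HEALTHY"` can never become a verdict; it collapses to `CANNOT VERIFY`.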

## A three-model investigation with a deterministic judge

PacketCourt uses small models where interpretation is useful and deterministic
code where exactness is required.

```mermaid
flowchart LR
    Phone["Phone or desktop<br/>multi-angle packet photos"] --> App["PacketCourt<br/>custom Gradio app"]
    App --> Vision["OpenBMB MiniCPM-V-4.6<br/>1.30B visual witness"]
    Vision --> Agent["Evidence investigation agent"]
    Agent --> Router["Fine-tuned PacketCourt router<br/>4.38M parameters"]
    Router --> Agent
    Agent --> Nemotron["NVIDIA Nemotron Mini 4B<br/>independent evidence-gap review"]
    Nemotron --> Agent
    Agent --> Judge["Deterministic evidence judge"]
    Judge --> Report["Verdicts, citations,<br/>calculations, and trace"]
    Report --> Feedback["Community Review Agent"]
    Feedback --> Queue["Public approval-gated<br/>learning queue"]
```

### OpenBMB MiniCPM-V-4.6: the visual witness

The vision companion runs on ZeroGPU. It receives up to six front/side and six
back/side images and transcribes only visibly printed evidence. PacketCourt
labels every transcription by photo number, merges unique evidence, and skips
exact duplicates. The front prompt focuses on claims. The back prompt preserves
ingredients, every visible nutrition-table row and basis, net weight, FSSAI
license text, dates, and after-opening instructions.

The model is asked not to explain or infer. Its responsibility is to surface
what is visible for the next stage.
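
The label-and-merge step can be sketched as follows. This is an illustration under assumptions, not the app's implementation: `EvidenceLine` and `merge_transcriptions` are hypothetical names, and the sketch treats each transcription as plain text with one piece of evidence per line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceLine:
    photo: int  # 1-based number of the photo the line was first read from
    text: str   # transcribed line with surrounding whitespace removed


def merge_transcriptions(transcriptions: list[str]) -> list[EvidenceLine]:
    """Label each line by photo number and skip exact duplicates."""
    seen: set[str] = set()
    merged: list[EvidenceLine] = []
    for photo_number, transcription in enumerate(transcriptions, start=1):
        for raw_line in transcription.splitlines():
            line = raw_line.strip()
            if not line or line in seen:
                continue  # blank line, or already captured from an earlier photo
            seen.add(line)
            merged.append(EvidenceLine(photo=photo_number, text=line))
    return merged
```

Only exact matches are dropped here, which mirrors the conservative wording above: two lines that differ by a single character are both kept for a human to compare.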

### A fine-tuned 4.38M-parameter evidence router

Different claims require different evidence.

- `NO ADDED SUGAR` requires ingredient inspection.
- `HIGH PROTEIN` requires nutrition values and their measurement basis.
- `FSSAI APPROVED` requires license evidence and a distinction between
  registration and endorsement.
- `100% NATURAL` is routed to the refusal boundary, because an absolute claim
  cannot be established from packet text alone.

I fine-tuned a tiny BERT classifier to route claims to five bounded tools:
`ingredients`, `nutrition`, `license`, `dates`, and `refuse_absolute`.
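
The claim-to-tool mapping can also be written as a keyword policy, which is roughly what a deterministic fallback could look like when the classifier is unavailable. The rules below are hypothetical and cover only the four example claims above; the project's real policy is not shown in this post.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order. Only the tool names come from
# the text; the keywords are illustrative assumptions.
FALLBACK_RULES: list[tuple[str, tuple[str, ...]]] = [
    ("refuse_absolute", ("100%",)),
    ("license", ("fssai",)),
    ("dates", ("best before", "use by")),
    ("ingredients", ("no added",)),
    ("nutrition", ("protein",)),
]


def route_claim(claim: str) -> Optional[str]:
    """Return the first tool whose keywords appear in the claim, else None."""
    text = claim.lower()
    for tool, keywords in FALLBACK_RULES:
        if any(keyword in text for keyword in keywords):
            return tool
    return None  # unknown claim: the caller decides how to handle it
```

Rule order matters in a policy like this: checking the absolute-claim rule first means `100% NATURAL PROTEIN` would be refused rather than sent to the nutrition tool.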

The first training run reached only `0.40` held-out accuracy. The random split
did not preserve every routing class, and the dataset was too thin. I did not
enable that checkpoint.

After balancing the claim variants and using a stratified five-class holdout,
the corrected checkpoint reached `1.000` accuracy on the small held-out set.
That result is useful evidence that the routing task is learnable, not proof of
broad generalization. Deterministic policy fallback remains available when the
model cannot load. Real packet testing later exposed claims the router had not
seen, such as `SUGAR FREE`, `REAL BADAM`, and `EXTRA CALCIUM WITH DHA`; those
reviewed cases were added to the public training set and the router was
fine-tuned again.
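
A stratified holdout of the kind described above can be sketched in plain Python. The training script is not reproduced in this post, so `stratified_holdout`, its parameters, and the `(text, label)` data shape are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_holdout(examples, holdout_fraction=0.2, seed=0):
    """Split (text, label) pairs so each class with two or more examples
    contributes at least one example to both the training and holdout sets."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, holdout = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        if len(items) < 2:
            train.extend(items)  # a single example cannot be split
            continue
        n_holdout = max(1, round(len(items) * holdout_fraction))
        n_holdout = min(n_holdout, len(items) - 1)  # always leave one to train on
        holdout.extend(items[:n_holdout])
        train.extend(items[n_holdout:])
    return train, holdout
```

Unlike a plain random split, this cannot produce a holdout that is missing a routing class, which is the failure described for the first run.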

Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router

Training data:
https://huggingface.co/datasets/build-small-hackathon/packetcourt-router-training

### NVIDIA Nemotron: an independent reviewer, not the judge

After the investigation plan completes, NVIDIA
`Nemotron-Mini-4B-Instruct` reviews the structured case for missing evidence.
It can identify the highest-priority next action or confirm that the bounded
investigation is complete.

It cannot change a verdict, and it cannot introduce a required-evidence state
that PacketCourt's deterministic investigation did not produce. Companion
responses cross a typed `AgentReview` boundary before they can appear in the
product.

This separation matters. A language model is useful for reviewing whether the
investigation overlooked an evidence gap. It should not silently override
arithmetic or invent a regulatory conclusion.
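
A typed boundary of this kind might look like the sketch below. Only the name `AgentReview` comes from the text; the fields, the validation rules, and the `parse_agent_review` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_TOOLS = {"ingredients", "nutrition", "license", "dates", "refuse_absolute"}
ALLOWED_KEYS = {"complete", "next_action", "note"}  # hypothetical field names


@dataclass(frozen=True)
class AgentReview:
    complete: bool              # reviewer believes the investigation is complete
    next_action: Optional[str]  # one bounded tool to run next, or None
    note: str                   # short free-text explanation


def parse_agent_review(payload: dict) -> Optional[AgentReview]:
    """Return a validated review, or None when the payload breaks the contract."""
    if set(payload) - ALLOWED_KEYS:
        return None  # unexpected keys, such as a verdict override, are rejected
    complete = payload.get("complete")
    next_action = payload.get("next_action")
    note = payload.get("note", "")
    if not isinstance(complete, bool) or not isinstance(note, str):
        return None
    if complete:
        if next_action is not None:
            return None  # a complete review must not request more work
    elif not isinstance(next_action, str) or next_action not in ALLOWED_TOOLS:
        return None  # an incomplete review must name one known tool
    return AgentReview(complete, next_action, note[:500])
```

Because the dataclass has no verdict field, a model response that tries to carry one is dropped before it reaches the report.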

The first Nemotron deployment also failed. I initially used
`NVIDIA-Nemotron-3-Nano-4B-BF16`, but a real ZeroGPU probe exposed a dependency
on a specialized Mamba CUDA runtime unavailable in the standard Gradio image.
I switched to Nemotron Mini 4B only after the replacement completed a real
ZeroGPU review.

## The deterministic evidence judge

The final verdict path is ordinary Python.

That code:

- detects known front claims and meaningful claims it has not seen before;
- extracts ingredients;
- parses nutrition values and their declared basis, including table-style OCR
  such as `Protein (g) 12` and `Sodium | mg | 410`;
- calculates whole-packet protein, sugar, sodium, and saturated fat;
- converts total sugar into a teaspoon equivalent;
- resolves direct and relative best-before dates;
- extracts after-opening deadlines;
- applies conservative claim-specific verdict rules.
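
The two table-style OCR shapes mentioned in the list can be handled with a pair of regular expressions. This is a sketch under assumptions: `parse_nutrition_row` is a hypothetical helper, and real OCR output would need more patterns than these two.

```python
import re
from typing import Optional

# "Protein (g) 12": name, unit in parentheses, value
PAREN_ROW = re.compile(
    r"^\s*([A-Za-z][A-Za-z ]*?)\s*\(\s*([A-Za-z]+)\s*\)\s*(\d+(?:\.\d+)?)\s*$"
)
# "Sodium | mg | 410": name, unit between pipes, value
PIPE_ROW = re.compile(
    r"^\s*([A-Za-z][A-Za-z ]*?)\s*\|\s*([A-Za-z]+)\s*\|\s*(\d+(?:\.\d+)?)\s*$"
)


def parse_nutrition_row(line: str) -> Optional[tuple[str, float, str]]:
    """Return (nutrient, value, unit) for a recognised row, else None."""
    for pattern in (PAREN_ROW, PIPE_ROW):
        match = pattern.match(line)
        if match:
            name, unit, value = match.groups()
            return name.lower(), float(value), unit.lower()
    return None  # unrecognised rows are left for other parsers or for review
```

Returning `None` instead of guessing keeps an unreadable row from silently becoming a number in a later calculation.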

For example, when a nutrition panel declares values per `100g` and the packet
contains `300g`, PacketCourt scales the values by exactly `3`. It does not ask a
language model to perform that arithmetic.
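
The scaling in that example can be written with exact decimal arithmetic. The function names are hypothetical, the sugar value is an illustrative figure rather than data from a real packet, and the teaspoon factor of 4 g is a common approximation assumed here, not a constant confirmed by the project.

```python
from decimal import Decimal

# Assumption: roughly 4 g of sugar per teaspoon.
GRAMS_PER_TEASPOON = Decimal("4")


def whole_packet(
    per_basis: dict[str, Decimal], basis_g: Decimal, net_weight_g: Decimal
) -> dict[str, Decimal]:
    """Scale declared per-basis values up to the whole packet."""
    factor = net_weight_g / basis_g  # 300 g packet on a 100 g basis gives 3
    return {name: value * factor for name, value in per_basis.items()}


def sugar_teaspoons(total_sugar_g: Decimal) -> Decimal:
    """Convert whole-packet sugar in grams to a teaspoon equivalent."""
    return total_sugar_g / GRAMS_PER_TEASPOON


declared = {"protein_g": Decimal("12"), "sugar_g": Decimal("8.5")}  # per 100 g
totals = whole_packet(declared, Decimal("100"), Decimal("300"))
print(totals["protein_g"], totals["sugar_g"], sugar_teaspoons(totals["sugar_g"]))
```

`Decimal` avoids binary floating-point rounding, so the scaled values can be cited in a report exactly as computed.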

## Persuasion Gap

Claim verification alone did not capture the most interesting part of the
problem.

A `HIGH PROTEIN` claim can be supported by visible protein evidence while the
complete packet also contains substantial sugar or sodium. A multigrain claim
can be technically true while refined flour remains the first ingredient.

PacketCourt therefore calculates a **Persuasion Gap**: material context on the
back that competes with the impression emphasized on the front.

Examples include:

- “Protein leads. Whole-packet sugar stays quiet.”
- “A positive front claim competes with substantial sodium.”
- “Grain variety is prominent. The first ingredient is refined.”
- “Registration language can look like a health endorsement.”

Each finding cites the exact evidence or calculation. PacketCourt still leaves
the final decision with the user.
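
The multigrain example needs no numeric threshold, so it makes a convenient sketch of a single Persuasion Gap rule. The function name, the marker list, and the returned dictionary shape are all hypothetical; only the finding text is taken from the examples above.

```python
from typing import Optional

# Hypothetical markers for a refined first ingredient on Indian labels.
REFINED_MARKERS = ("refined wheat flour", "maida")


def multigrain_gap(front_claims: list[str], ingredients: list[str]) -> Optional[dict]:
    """Flag a multigrain front claim when the first listed ingredient is refined.

    `ingredients` is assumed to be in printed order, so index 0 is the
    ingredient present in the largest proportion.
    """
    claims = [claim for claim in front_claims if "multigrain" in claim.lower()]
    if not claims or not ingredients:
        return None
    first = ingredients[0].lower()
    if any(marker in first for marker in REFINED_MARKERS):
        return {
            "claim": claims[0],
            "evidence": ingredients[0],  # cited exactly as printed
            "finding": "Grain variety is prominent. The first ingredient is refined.",
        }
    return None
```

The rule reports context and cites its evidence, but it does not change the verdict on the claim itself.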

## A correction-driven learning loop

PacketCourt now includes a Community Review Agent. After an audit, a user can
confirm the result or submit an evidence-backed correction. The review is
bundled with the original label text, verdicts, investigation path, and
Nemotron review in a public queue.

Feedback does not immediately retrain production models. That would allow an
accidental or malicious correction to poison later audits. New records begin
as `pending_human_review` and `training_eligible: false`. Approved corrections
can enter a versioned router-training release, followed by fine-tuning and the
golden-case regression suite before deployment.
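
The approval gate can be sketched as an immutable record whose defaults match the states named above. `FeedbackRecord`, its fields other than `training_eligible`, the `approve` helper, and the `"approved"` status string are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeedbackRecord:
    correction: str                       # what the user says should change
    evidence: str                         # label text the user cites in support
    status: str = "pending_human_review"  # every new record starts here
    training_eligible: bool = False       # never true without human approval


def approve(record: FeedbackRecord, reviewer: str) -> FeedbackRecord:
    """Return an approved copy; the original record is left untouched."""
    if not reviewer.strip():
        raise ValueError("approval requires a named human reviewer")
    return replace(record, status="approved", training_eligible=True)
```

Making the record frozen means a correction cannot drift into the training set by mutation; eligibility only changes through an explicit approval step.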

Community feedback queue:
https://huggingface.co/datasets/build-small-hackathon/packetcourt-community-feedback

## What makes the agent bounded

For every packet, PacketCourt emits an explicit investigation record:

- objective;
- selected evidence tools;
- reason each tool was selected;
- whether the fine-tuned router or policy fallback selected it;
- missing-evidence requests;
- stop reason;
- independent Nemotron review;
- deterministic verdicts and limitations.

There are only two valid stopping conditions:

1. every evidence tool required by the detected claims completed; or
2. required evidence is missing, so PacketCourt stops and asks for it.
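
The two stopping conditions reduce to a set difference. In this sketch the function name and the reason strings are hypothetical; the tool names are the ones used by the router.

```python
def stop_reason(required: set[str], completed: set[str]) -> tuple[str, list[str]]:
    """Return the stop reason and any evidence tools still outstanding."""
    missing = sorted(required - completed)
    if missing:
        # Condition 2: stop and ask the user for the missing evidence.
        return "missing_required_evidence", missing
    # Condition 1: every tool required by the detected claims completed.
    return "all_required_tools_completed", []
```

For example, a packet with a `HIGH PROTEIN` claim and no readable nutrition panel would stop with `nutrition` listed as missing instead of producing a verdict.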

The public trace dataset contains no hidden chain-of-thought. It exposes tool
decisions, evidence outputs, calculations, and boundaries suitable for
inspection.

Traces:
https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces

## Evaluation

The current release has:

- `20` passing unit and end-to-end integration tests;
- `35/35` passing checks across `10` golden packet cases;
- `10` transparent investigation traces;
- one published real end-to-end Nemotron review trace;
- successful live audits using the fine-tuned router and Nemotron reviewer;
- a real public Community Review Agent record;
- multi-angle packet-photo ingestion with duplicate removal.

The golden cases cover contradictions, supported claims, missing context,
whole-packet calculations, refined-grain context, FSSAI registration language,
relative shelf-life arithmetic, and after-opening instructions.
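
Relative shelf-life arithmetic, one of the behaviours these cases cover, can be sketched with the standard library. The helper names and the label phrasing matched by the pattern are assumptions, and clamping to the end of a shorter month is one possible convention rather than a confirmed project rule.

```python
import calendar
import re
from datetime import date
from typing import Optional

# Matches phrasing such as "Best before 6 months from manufacture".
RELATIVE = re.compile(r"best before\s+(\d+)\s+months?", re.IGNORECASE)


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def resolve_best_before(label_text: str, manufactured: date) -> Optional[date]:
    """Resolve a relative best-before statement against a manufacture date."""
    match = RELATIVE.search(label_text)
    if match is None:
        return None  # no relative statement found on the supplied label
    return add_months(manufactured, int(match.group(1)))
```

A golden case for this behaviour would pin both an ordinary date and an end-of-month date, since the second is where naive day arithmetic tends to go wrong.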

Golden cases:
https://huggingface.co/datasets/build-small-hackathon/packetcourt-golden-cases

## The interface is part of the evidence standard

PacketCourt uses a custom responsive frontend mounted over a Gradio engine.
The phone workflow matters because the packet is physically in the user's
hand. Some packets place claims, ingredients, dates, directions, and nutrition
on different wrapped panels, so the interface supports additive multi-angle
capture rather than assuming two perfect photos. The results view shows the
investigation path before the verdict cards, then separates persuasion gaps,
claim findings, nutrition calculations, date evidence, machine-readable JSON,
and the community review path.

Uncertainty is not hidden in a tooltip. It is part of the primary result.

## What PacketCourt refuses to claim

PacketCourt does not declare a food:

- healthy;
- safe;
- illegal;
- fraudulent;
- suitable for a medical condition.

It audits only supplied packet evidence. OCR should be checked against the
physical label. `CANNOT VERIFY` is a successful outcome when the evidence is
insufficient.

That refusal is not a missing feature. It is PacketCourt's standard of proof.

## Built small

The complete model budget is approximately `5.3B` parameters:

- OpenBMB MiniCPM-V-4.6: `1.30B`;
- NVIDIA Nemotron Mini: approximately `4B`;
- fine-tuned PacketCourt router: `4.38M`.
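
The total follows from adding the three figures in the list. The dictionary keys below are informal labels, and the Nemotron entry uses the rounded `4B` figure, so the sum is approximate to the same degree.

```python
# Parameter counts as stated in the list above.
PARAMETERS = {
    "minicpm_v_4_6": 1_300_000_000,
    "nemotron_mini": 4_000_000_000,  # approximate figure
    "packetcourt_router": 4_380_000,
}

total = sum(PARAMETERS.values())
print(f"{total / 1e9:.1f}B parameters")
```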

The main evidence judge remains deterministic and CPU-based. ZeroGPU is
requested only for visual transcription and the independent Nemotron review.

PacketCourt was built with OpenAI Codex as the primary coding agent. The public
GitHub repository preserves Codex-attributed commits covering the architecture,
tests, fine-tuning workflow, model companions, trace publication, UI, and
deployment.

Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt

GitHub: https://github.com/N-45div/PacketCourt

Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router

Traces: https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces