# PacketCourt: The packet takes the stand Food packets are unusually good at telling two different stories at once. The front has seconds to persuade: **HIGH PROTEIN**, **MULTIGRAIN**, **100% NATURAL**, **BAKED NOT FRIED**. The back carries the evidence needed to interpret those claims: ingredient order, nutrition basis, package size, licensing text, dates, and instructions that only matter after opening. PacketCourt is my attempt to make those two surfaces answer to each other. It is a phone-first Gradio app for Indian packaged-food labels. A user can add multiple front, back, and side-panel photographs when a packet wraps evidence around its dimensions. PacketCourt reads each photo independently, labels and merges the visible evidence, plans an investigation, performs deterministic calculations, and returns conservative verdicts with citations. It does not produce a health score. It asks a narrower question: > Does the evidence printed on this packet support the impression created by > its front? Try the Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt Read the Codex-attributed source: https://github.com/N-45div/PacketCourt ## The product decision that shaped everything An early version of the idea was a general nutrition scanner. That direction was broad, crowded, and difficult to trust. A single red, yellow, or green score would hide too many judgments: - Is sugar always worse than protein is good? - How should serving size affect a score? - Does an FSSAI license imply a health endorsement? - Can OCR uncertainty silently change the answer? PacketCourt therefore avoids ranking products. It audits claims against evidence from the same supplied packet. The output language is intentionally constrained: - `SUPPORTED BY PROVIDED LABEL` - `CONTRADICTED BY PROVIDED LABEL` - `TECHNICALLY TRUE, CONTEXT MISSING` - `CANNOT VERIFY` The phrase **provided label** matters. PacketCourt does not pretend that a photograph is a laboratory analysis or that a missing line of text does not exist. ## A three-model investigation with a deterministic judge PacketCourt uses small models where interpretation is useful and deterministic code where exactness is required. ```mermaid flowchart LR Phone["Phone or desktop
multi-angle packet photos"] --> App["PacketCourt
custom Gradio app"] App --> Vision["OpenBMB MiniCPM-V-4.6
1.30B visual witness"] Vision --> Agent["Evidence investigation agent"] Agent --> Router["Fine-tuned PacketCourt router
4.38M parameters"] Router --> Agent Agent --> Nemotron["NVIDIA Nemotron Mini 4B
independent evidence-gap review"] Nemotron --> Agent Agent --> Judge["Deterministic evidence judge"] Judge --> Report["Verdicts, citations,
calculations, and trace"] Report --> Feedback["Community Review Agent"] Feedback --> Queue["Public approval-gated
learning queue"] ``` ### OpenBMB MiniCPM-V-4.6: the visual witness The vision companion runs on ZeroGPU. It receives up to six front/side and six back/side images and transcribes only visibly printed evidence. PacketCourt labels every transcription by photo number, merges unique evidence, and skips exact duplicates. The front prompt focuses on claims. The back prompt preserves ingredients, every visible nutrition-table row and basis, net weight, FSSAI license text, dates, and after-opening instructions. The model is asked not to explain or infer. Its responsibility is to surface what is visible for the next stage. ### A fine-tuned 4.38M-parameter evidence router Different claims require different evidence. - `NO ADDED SUGAR` requires ingredient inspection. - `HIGH PROTEIN` requires nutrition values and their measurement basis. - `FSSAI APPROVED` requires license evidence and a registration-versus- endorsement distinction. - `100% NATURAL` requires the safety boundary because the absolute claim cannot be established from packet text alone. I fine-tuned a tiny BERT classifier to route claims to five bounded tools: `ingredients`, `nutrition`, `license`, `dates`, and `refuse_absolute`. The first training run reached only `0.40` held-out accuracy. The random split did not preserve every routing class, and the dataset was too thin. I did not enable that checkpoint. After balancing the claim variants and using a stratified five-class holdout, the corrected checkpoint reached `1.000` on the small held-out set. That result is useful evidence that the routing task is learnable, not proof of broad generalization. Deterministic policy fallback remains available when the model cannot load. Real packet testing later exposed new routes such as `SUGAR FREE`, `REAL BADAM`, and `EXTRA CALCIUM WITH DHA`; those reviewed cases were added to the public training set and the router was fine-tuned again. Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router Training data: https://huggingface.co/datasets/build-small-hackathon/packetcourt-router-training ### NVIDIA Nemotron: an independent reviewer, not the judge After the investigation plan completes, NVIDIA `Nemotron-Mini-4B-Instruct` reviews the structured case for missing evidence. It can identify the highest-priority next action or confirm that the bounded investigation is complete. It cannot change a verdict or manufacture a required-evidence state outside PacketCourt's deterministic investigation. Companion responses cross a typed `AgentReview` boundary before they can appear in the product. This separation matters. A language model is useful for reviewing whether the investigation overlooked an evidence gap. It should not silently override arithmetic or invent a regulatory conclusion. The first Nemotron deployment also failed. I initially used `NVIDIA-Nemotron-3-Nano-4B-BF16`, but a real ZeroGPU probe exposed a dependency on a specialized Mamba CUDA runtime unavailable in the standard Gradio image. I switched to Nemotron Mini 4B only after the replacement completed a real ZeroGPU review. ## The deterministic evidence judge The final verdict path is ordinary Python. That code: - detects known and meaningful previously unseen front claims; - extracts ingredients; - parses nutrition values and their declared basis, including table-style OCR such as `Protein (g) 12` and `Sodium | mg | 410`; - calculates whole-packet protein, sugar, sodium, and saturated fat; - converts total sugar into a teaspoon equivalent; - resolves direct and relative best-before dates; - extracts after-opening deadlines; - applies conservative claim-specific verdict rules. For example, when a nutrition panel declares values per `100g` and the packet contains `300g`, PacketCourt scales the values by exactly `3`. It does not ask a language model to perform that arithmetic. ## Persuasion Gap Claim verification alone did not capture the most interesting part of the problem. A `HIGH PROTEIN` claim can be supported by visible protein evidence while the complete packet also contains substantial sugar or sodium. A multigrain claim can be technically true while refined flour remains the first ingredient. PacketCourt therefore calculates a **Persuasion Gap**: material context on the back that competes with the impression emphasized on the front. Examples include: - “Protein leads. Whole-packet sugar stays quiet.” - “A positive front claim competes with substantial sodium.” - “Grain variety is prominent. The first ingredient is refined.” - “Registration language can look like a health endorsement.” Each finding cites the exact evidence or calculation. PacketCourt still leaves the final decision with the user. ## A correction-driven learning loop PacketCourt now includes a Community Review Agent. After an audit, a user can confirm the result or submit an evidence-backed correction. The review is bundled with the original label text, verdicts, investigation path, and Nemotron review in a public queue. Feedback does not immediately retrain production models. That would allow an accidental or malicious correction to poison later audits. New records begin as `pending_human_review` and `training_eligible: false`. Approved corrections can enter a versioned router-training release, followed by fine-tuning and the golden-case regression suite before deployment. Community feedback queue: https://huggingface.co/datasets/build-small-hackathon/packetcourt-community-feedback ## What makes the agent bounded For every packet, PacketCourt emits an explicit investigation record: - objective; - selected evidence tools; - reason each tool was selected; - whether the fine-tuned router or policy fallback selected it; - missing-evidence requests; - stop reason; - independent Nemotron review; - deterministic verdicts and limitations. There are only two valid stopping conditions: 1. every evidence tool required by the detected claims completed; or 2. required evidence is missing, so PacketCourt stops and asks for it. The public trace dataset contains no hidden chain-of-thought. It exposes tool decisions, evidence outputs, calculations, and boundaries suitable for inspection. Traces: https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces ## Evaluation The current release has: - `20` passing unit and end-to-end integration tests; - `35/35` passing checks across `10` golden packet cases; - `10` transparent investigation traces; - one published real end-to-end Nemotron review trace; - successful live audits using the fine-tuned router and Nemotron reviewer; - a real public Community Review Agent record; - multi-angle packet-photo ingestion with duplicate removal. The golden cases cover contradictions, supported claims, missing context, whole-packet calculations, refined-grain context, FSSAI registration language, relative shelf-life arithmetic, and after-opening instructions. Golden cases: https://huggingface.co/datasets/build-small-hackathon/packetcourt-golden-cases ## The interface is part of the evidence standard PacketCourt uses a custom responsive frontend mounted over a Gradio engine. The phone workflow matters because the packet is physically in the user's hand. Some packets place claims, ingredients, dates, directions, and nutrition on different wrapped panels, so the interface supports additive multi-angle capture rather than assuming two perfect photos. The results view shows the investigation path before the verdict cards, then separates persuasion gaps, claim findings, nutrition calculations, date evidence, machine-readable JSON, and the community review path. Uncertainty is not hidden in a tooltip. It is part of the primary result. ## What PacketCourt refuses to claim PacketCourt does not declare a food: - healthy; - safe; - illegal; - fraudulent; - suitable for a medical condition. It audits only supplied packet evidence. OCR should be checked against the physical label. `CANNOT VERIFY` is a successful outcome when the evidence is insufficient. That refusal is not a missing feature. It is PacketCourt's standard of proof. ## Built small The complete model budget is approximately `5.3B` parameters: - OpenBMB MiniCPM-V-4.6: `1.30B`; - NVIDIA Nemotron Mini: approximately `4B`; - fine-tuned PacketCourt router: `4.38M`. The main evidence judge remains deterministic and CPU-based. ZeroGPU is requested only for visual transcription and the independent Nemotron review. PacketCourt was built with OpenAI Codex as the primary coding agent. The public GitHub repository preserves Codex-attributed commits covering the architecture, tests, fine-tuning workflow, model companions, trace publication, UI, and deployment. Space: https://huggingface.co/spaces/build-small-hackathon/packetcourt GitHub: https://github.com/N-45div/PacketCourt Model: https://huggingface.co/build-small-hackathon/packetcourt-evidence-router Traces: https://huggingface.co/datasets/build-small-hackathon/packetcourt-traces