matchday / README.md
mzidan000's picture
Upload folder using huggingface_hub
b9f2ba1 verified
|
Raw
History Blame
10.7 kB
---
title: MatchDay
emoji:
colorFrom: indigo
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: mit
tags:
- build-small-hackathon
- backyard-ai
- agents
- react-agent
- agentic
- agent-loop
- tool-use
- tool-calling
- multi-step-planning
- nemotron
- nvidia
- nvidia-nemotron
- nemotron-3-nano
- modal
- modal-labs
- sglang
- gradio
- gradio-server
- off-brand
- fifa-world-cup-2026
- vancouver
- travel-planning
- trip-planner
- leaflet
models:
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
---
# MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent
> **A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan
> one match-day trip with small-model reasoning, Gradio polish, and safe manual
> booking links.**
Type one sentence — *"Flying from Montreal, want Canada vs Qatar, mid-range,
June 26-29, just me"* — and MatchDay's agent **grounds your request in the real
schedule, searches live flights + hotels + weather, ranks 3 packages, and
explains why each one won.** Every price is tagged `● live` or `example` so
nothing is hallucinated, and every booking link is a safe **search** (never a
fake "confirmed booking").
## The idea
The 2026 World Cup is in Vancouver. Fans have one chaotic question — *"how do I
actually get to one match?"* — and existing tools split the answer across five
tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it
into a single agent turn: **understand intent → ground it against the real
fixture list → call live data tools → rank → explain.** It's a focused, backyard
trip planner that treats a small model as a genuine decision-maker, not a chatbot.
The standout agent behavior: **MatchDay corrects you when you're wrong and
refuses to plan when a match doesn't exist.** Ask for *"Canada vs Qatar, June
26"* and it tells you the real match is **June 18 at BC Place, 3:00 PM PT** and
re-plans around it. Ask for *"Canada vs Morocco"* and it won't pretend — that
match doesn't exist, so it offers the real alternatives instead. That grounding
is the difference between an agent and a form.
## How it works — Brain + Hands
- **🧠 Brain (decides + explains):** **NVIDIA Nemotron-3-Nano-30B-A3B** — a
30B-total / **3B-active** Mixture-of-Experts model — served on **Modal A100**
via **SGLang**. It reads the request, picks tools, reasons about results, and
writes the final comparison. **It never calls an API, fetches a URL, or states
a price itself.**
- **✋ Hands (execute + score):** deterministic **Python** calls every API
(flights, hotels, weather, nearby spots), fans them out concurrently, and
scores each package with a fixed formula (cost / arrival-buffer /
stadium-proximity). Every value gets a provenance badge.
- **🔁 Loop:** a bounded agent loop (**≤5 tool rounds**) with a tool allowlist,
Pydantic argument validation, one malformed-call self-correction pass,
per-tool timeouts, a **cold-start retry** (a round-1 Modal timeout is retried
once so the agent actually runs instead of silently degrading to the parser),
and an honest, user-visible deterministic fallback. Nemotron emits structured
tool calls via SGLang's `qwen3_coder` + `nemotron_3` parsers.
## 🤖 Best Agent — multi-step tool use & planning (under the 32B cap)
This is the category we care about most, so here's exactly what makes MatchDay
an agent and not a pipeline:
- **3 tools, picked autonomously:** `build_trip_packages` (the data/scoring tool),
`web_search` (factual grounding — kickoff times, venue policy), and `clarify`
(ask one question when origin/date is genuinely missing).
- **Genuine multi-step turns:** Nemotron can `web_search` to ground a fact, read
the result, *then* call `build_trip_packages` with corrected understanding —
results threaded back into the conversation between rounds. Happy path is 2-3
rounds; the ceiling is 5 (`matchday/agent_loop.py`).
- **Schedule grounding before planning** (`matchday/wc2026.py`): a verified
fixture table is the ground truth. The agent re-centers the trip on the *real*
match date (preserving the user's nights) and refuses nonexistent matchups
with honest alternatives — proven by `tests/test_wc2026_grounding.py`
(6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights;
Brazil vs Germany and Canada vs Morocco refused).
- **Guardrails that keep it honest:** tool allowlist, Pydantic arg validation,
one malformed-call self-correction, per-tool timeouts, cold-start retry, and a
user-visible fallback to deterministic parsing when Modal is cold-starting.
The loop's agentic behavior — tool dispatch, self-correction, deterministic
fallback, cold-start retry, and trace recording — is proven by
`tests/test_agent_loop.py` (9 zero-network checks).
- **Brain + Hands separation:** the model decides and explains; Python executes
every external call and scores every price — so the model can't hallucinate a
flight number or invent a rate.
Nemotron-3-Nano-30B is **30B total parameters < the 32B cap.**
## 🎨 Off-Brand — a custom UI on `gradio.Server`, well past stock Gradio
MatchDay does **not** use stock Gradio components. It runs on **`gradio.Server`**
(`app.py`), which serves a fully bespoke `index.html` frontend at `/` while a
single `@app.api("plan_trip")` async generator streams typed JSON events through
Gradio's queue (SSE) — so the UI updates live as the agent decides → Python
scores → Nemotron explains. `gr.Server` gives us Gradio's backend (queuing,
concurrency, Spaces hosting) under a hand-built product UI:
- Layla-style **photo-header package cards** with overlaid price + "★ Best match".
- **Provenance pills** on every figure (`● live` vs `example`) — the
anti-hallucination differentiator, visible right in the card.
- An interactive **Leaflet map** (stadium + hotels + POIs, hotel→stadium lines,
full-screen toggle) built in `matchday/render.py`.
- A **day-by-day itinerary** with unique, date-aware roles (arrival / match day /
local explore / departure) and a live **agent progress panel**.
- Per-option **action buttons**: a real flight/hotel **search** and
trip-specific **transit directions** (always with explicit origins) — never an
over-claiming "Book" button.
## 🟢 NVIDIA Nemotron Quest — Nemotron is the Brain
- **Model:** `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` (30B MoE, ~3B active/token).
- **Served with SGLang** (`matchday/modal_spike.py`) using the NVIDIA-card
recommended tool-calling config: `--tool-call-parser qwen3_coder` +
`--reasoning-parser nemotron_3` + `--attention-backend flashinfer`. Verified
live that SGLang returns *parsed* `tool_calls` (not raw text) — the whole
Brain+Hands design depends on it. See `matchday/NEMOTRON_SGLANG_VERIFICATION.md`.
- **Reasoning mode:** Nemotron-3-Nano's thinking toggle (`enable_thinking`) is
wired end-to-end (`modal_spike.generate``matchday.agent.MatchDayAgent`
`app.py`) per the official Nemotron usage guide. Enable on the Space with
`MATCHDAY_THINKING=1` to run the decide/ground/explain turns with
chain-of-thought reasoning.
- **Sampling** follows the model card: `temperature=0.6 / top_p=0.95` for tool
routing, reasoning on for complex planning.
## 🟣 Modal — the runtime & inference layer
- Nemotron runs **remotely on Modal** (`modal.App("matchday-spike")`) on an
**A100-80GB** via a containerized SGLang server (`matchday/modal_spike.py`).
- The Gradio Space calls it with `modal.Cls.from_name(...).generate.remote.aio`
— the Space stays lightweight while the heavy 30B inference happens on
sanctioned Modal GPU compute.
- **Cold-start engineering:** a 60GB-model HF cache **Volume** (warm reload
~1-2 GB/s vs re-download), `startup_timeout=120 min` for first load, a
server-side `warmup()`, and a Space-boot `_warm_nemotron()` task so the first
user query isn't stuck behind a cold start.
## Tech stack
Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12
(`qwen3_coder` + `nemotron_3`) · **gradio.Server** bespoke frontend · SerpApi
(Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass
(nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.
## Try it
- **Live app:** https://build-small-hackathon-matchday.hf.space
- **Space:** https://huggingface.co/spaces/build-small-hackathon/matchday
- **Field Notes (architecture story):** `matchday/FIELD_NOTES.md`
- **Nemotron + SGLang verification:** `matchday/NEMOTRON_SGLANG_VERIFICATION.md`
**Example queries to try:**
1. *Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me* → watch it correct the date to **June 18**.
2. *Toronto to see Brazil vs Germany, premium, July 12, 2 adults* → watch it **refuse** a nonexistent match honestly.
3. *From Halifax, Canada vs Morocco, June 18, couple, luxury* → refused with real Group B alternatives.
## Prizes we're competing for
| Prize | Why MatchDay qualifies |
| --- | --- |
| 🤖 **Best Agent** | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. |
| 🎨 **Off-Brand** | Bespoke Layla-style UI on `gradio.Server` — custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. |
| 🟢 **NVIDIA Nemotron Quest** | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. |
| 🟣 **Modal** | A100 inference runtime, documented above (`matchday/modal_spike.py`). |
| 🎬 **Best Demo** | App + demo script (`matchday/DEMO_VIDEO_SCRIPT.md`) + social post (`matchday/SOCIAL_POST.md`). |
| 🏆 **Bonus Quest Champion** | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. |
| 🗳️ **Judges' Wildcard** | A genuinely useful, honest, small-model trip planner that corrects its user. |
> **Honest note on Tiny Titan:** we are **not** claiming Tiny Titan. That prize
> requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE
> (only ~3B *active* per token, but 30B total weights). We'd rather flag this
> than over-claim.
## Built for Build Small
**Track: Backyard AI** — a focused, real-world Vancouver World Cup use case.
Sponsor tools used: **NVIDIA Nemotron-3-Nano-30B** (the Brain) + **Modal A100**
(the runtime) + **Gradio `gradio.Server`** (the Off-Brand UI).
## Social
**Post:** _<paste your social post URL here, then redeploy>_ — a ready-to-post
draft is in `matchday/SOCIAL_POST.md`.