---
title: MatchDay
emoji: ⚽
colorFrom: indigo
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: mit
tags:
  - build-small-hackathon
  - backyard-ai
  - agents
  - react-agent
  - agentic
  - agent-loop
  - tool-use
  - tool-calling
  - multi-step-planning
  - nemotron
  - nvidia
  - nvidia-nemotron
  - nemotron-3-nano
  - modal
  - modal-labs
  - sglang
  - gradio
  - gradio-server
  - off-brand
  - fifa-world-cup-2026
  - vancouver
  - travel-planning
  - trip-planner
  - leaflet
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
---

# MatchDay ⚽: your 2026 FIFA World Cup trip, planned by a small-model agent

> **A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan
> one match-day trip with small-model reasoning, Gradio polish, and safe manual
> booking links.**

Type one sentence, such as *"Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me"*, and MatchDay's agent **grounds your request in the real schedule, searches live flights + hotels + weather, ranks 3 packages, and explains why each one won.** Every price is tagged `● live` or `example`, so an illustrative figure is never passed off as real, and every booking link is a safe **search** (never a fake "confirmed booking").

## The idea

Vancouver is a host city for the 2026 World Cup. Fans have one chaotic question, *"how do I actually get to one match?"*, and existing tools split the answer across five tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it into a single agent turn: **understand intent → ground it against the real fixture list → call live data tools → rank → explain.** It's a focused, backyard trip planner that treats a small model as a genuine decision-maker, not a chatbot.

The standout agent behavior: **MatchDay corrects you when you're wrong and refuses to plan when a match doesn't exist.** Ask for *"Canada vs Qatar, June 26"* and it tells you the real match is **June 18 at BC Place, 3:00 PM PT** and re-plans around it.
Ask for *"Canada vs Morocco"* and it won't pretend: that match doesn't exist, so it offers the real alternatives instead. That grounding is the difference between an agent and a form.

## How it works: Brain + Hands

- **🧠 Brain (decides + explains):** **NVIDIA Nemotron-3-Nano-30B-A3B**, a 30B-total / **3B-active** Mixture-of-Experts model, served on **Modal A100** via **SGLang**. It reads the request, picks tools, reasons about results, and writes the final comparison. **It never calls an API, fetches a URL, or states a price itself.**
- **✋ Hands (execute + score):** deterministic **Python** calls every API (flights, hotels, weather, nearby spots), fans them out concurrently, and scores each package with a fixed formula (cost / arrival-buffer / stadium-proximity). Every value gets a provenance badge.
- **🔁 Loop:** a bounded agent loop (**≤5 tool rounds**) with a tool allowlist, Pydantic argument validation, one malformed-call self-correction pass, per-tool timeouts, a **cold-start retry** (a round-1 Modal timeout is retried once so the agent actually runs instead of silently degrading to the parser), and an honest, user-visible deterministic fallback. Nemotron emits structured tool calls via SGLang's `qwen3_coder` + `nemotron_3` parsers.

## 🤖 Best Agent: multi-step tool use & planning (under the 32B cap)

This is the category we care about most, so here's exactly what makes MatchDay an agent and not a pipeline:

- **3 tools, picked autonomously:** `build_trip_packages` (the data/scoring tool), `web_search` (factual grounding: kickoff times, venue policy), and `clarify` (ask one question when origin/date is genuinely missing).
- **Genuine multi-step turns:** Nemotron can `web_search` to ground a fact, read the result, *then* call `build_trip_packages` with corrected understanding, with results threaded back into the conversation between rounds. Happy path is 2-3 rounds; the ceiling is 5 (`matchday/agent_loop.py`).
- **Schedule grounding before planning** (`matchday/wc2026.py`): a verified fixture table is the ground truth. The agent re-centers the trip on the *real* match date (preserving the user's nights) and refuses nonexistent matchups with honest alternatives. This is exercised by `tests/test_wc2026_grounding.py` (6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights; Brazil vs Germany and Canada vs Morocco refused).
- **Guardrails that keep it honest:** tool allowlist, Pydantic arg validation, one malformed-call self-correction, per-tool timeouts, cold-start retry, and a user-visible fallback to deterministic parsing when Modal is cold-starting. The loop's agentic behavior (tool dispatch, self-correction, deterministic fallback, cold-start retry, and trace recording) is exercised by `tests/test_agent_loop.py` (9 zero-network checks).
- **Brain + Hands separation:** the model decides and explains; Python executes every external call and scores every price, so the model can't hallucinate a flight number or invent a rate.

Nemotron-3-Nano-30B has **30B total parameters, under the 32B cap.**

## 🎨 Off-Brand: a custom UI on `gradio.Server`, well past stock Gradio

MatchDay does **not** use stock Gradio components. It runs on **`gradio.Server`** (`app.py`), which serves a fully bespoke `index.html` frontend at `/` while a single `@app.api("plan_trip")` async generator streams typed JSON events through Gradio's queue (SSE), so the UI updates live as the agent decides → Python scores → Nemotron explains. `gr.Server` gives us Gradio's backend (queuing, concurrency, Spaces hosting) under a hand-built product UI:

- Layla-style **photo-header package cards** with overlaid price + "★ Best match".
- **Provenance pills** on every figure (`● live` vs `example`): the anti-hallucination differentiator, visible right in the card.
- An interactive **Leaflet map** (stadium + hotels + POIs, hotel→stadium lines, full-screen toggle) built in `matchday/render.py`.
- A **day-by-day itinerary** with unique, date-aware roles (arrival / match day / local explore / departure) and a live **agent progress panel**.
- Per-option **action buttons**: a real flight/hotel **search** and trip-specific **transit directions** (always with explicit origins), never an over-claiming "Book" button.

## 🟢 NVIDIA Nemotron Quest: Nemotron is the Brain

- **Model:** `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` (30B MoE, ~3B active/token).
- **Served with SGLang** (`matchday/modal_spike.py`) using the tool-calling config recommended on the NVIDIA model card: `--tool-call-parser qwen3_coder` + `--reasoning-parser nemotron_3` + `--attention-backend flashinfer`. We verified live that SGLang returns *parsed* `tool_calls` (not raw text); the whole Brain+Hands design depends on it. See `matchday/NEMOTRON_SGLANG_VERIFICATION.md`.
- **Reasoning mode:** Nemotron-3-Nano's thinking toggle (`enable_thinking`) is wired end-to-end (`modal_spike.generate` → `matchday.agent.MatchDayAgent` → `app.py`) per the official Nemotron usage guide. Enable it on the Space with `MATCHDAY_THINKING=1` to run the decide/ground/explain turns with chain-of-thought reasoning.
- **Sampling** follows the model card: `temperature=0.6 / top_p=0.95` for tool routing, reasoning on for complex planning.

## 🟣 Modal: the runtime & inference layer

- Nemotron runs **remotely on Modal** (`modal.App("matchday-spike")`) on an **A100-80GB** via a containerized SGLang server (`matchday/modal_spike.py`).
- The Gradio Space calls it with `modal.Cls.from_name(...).generate.remote.aio`, so the Space stays lightweight while the heavy 30B inference happens on sanctioned Modal GPU compute.
- **Cold-start engineering:** an HF cache **Volume** for the 60GB model (warm reload ~1-2 GB/s vs re-download), `startup_timeout=120 min` for first load, a server-side `warmup()`, and a Space-boot `_warm_nemotron()` task so the first user query isn't stuck behind a cold start.
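
The cold-start retry described above (a round-1 timeout retried once, then a visible fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in `matchday/agent_loop.py`: `call_with_cold_start_retry` and `make_fake_model` are hypothetical names, and the fake coroutine stands in for the real remote call to Modal.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


async def call_with_cold_start_retry(
    call: Callable[[], Awaitable[str]],
    timeout_s: float,
    retries: int = 1,
) -> Tuple[Optional[str], int]:
    """Await call() under a timeout, retrying on timeout up to `retries` times.

    Returns (result, attempts). A result of None means every attempt timed
    out, which is the caller's cue to show the deterministic fallback.
    """
    attempts = 0
    for _ in range(retries + 1):
        attempts += 1
        try:
            result = await asyncio.wait_for(call(), timeout=timeout_s)
            return result, attempts
        except asyncio.TimeoutError:
            # Assumed cause: the remote container is still cold-starting.
            continue
    return None, attempts


def make_fake_model(cold_delay_s: float, always_cold: bool = False):
    """Hypothetical stand-in for the remote model: slow on the first call only."""
    state = {"calls": 0}

    async def fake_generate() -> str:
        state["calls"] += 1
        if always_cold or state["calls"] == 1:
            await asyncio.sleep(cold_delay_s)  # simulate a cold start
        return "tool_call: build_trip_packages"

    return fake_generate


if __name__ == "__main__":
    reply, attempts = asyncio.run(
        call_with_cold_start_retry(make_fake_model(0.5), timeout_s=0.05)
    )
    print(reply, attempts)
```

Returning `None` instead of raising keeps the fallback decision in the caller, which matches the stated goal of a user-visible fallback instead of a silent one.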
## Tech stack

Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12 (`qwen3_coder` + `nemotron_3`) · **gradio.Server** bespoke frontend · SerpApi (Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass (nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.

## Try it

- **Demo video (X):** https://x.com/islandsofmoons/status/2066669685668811071
- **Live app:** https://build-small-hackathon-matchday.hf.space
- **Space:** https://huggingface.co/spaces/build-small-hackathon/matchday
- **Field Notes (architecture story):** `matchday/FIELD_NOTES.md`
- **Nemotron + SGLang verification:** `matchday/NEMOTRON_SGLANG_VERIFICATION.md`

**Example queries to try** (the app's example chips are labeled so you can tell a clean valid trip from the correction/refusal demos):

1. ✅ *From Seattle, want Canada vs Switzerland on June 24* → a **clean valid trip**: a real Group B match on its correct date (June 24, BC Place), with no correction, just ranked packages.
2. *Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me* → watch it correct the date to **June 18**.
3. *Toronto to see Brazil vs Germany, premium, July 12, 2 adults* → watch it **refuse** a nonexistent match honestly.
4. *From Halifax, Canada vs Morocco, June 18, couple, luxury* → refused with real Group B alternatives.

## Prizes we're competing for

| Prize | Why MatchDay qualifies |
| --- | --- |
| 🤖 **Best Agent** | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Covered by 9 zero-network loop tests. 30B < 32B. |
| 🎨 **Off-Brand** | Bespoke Layla-style UI on `gradio.Server`: custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. |
| 🟢 **NVIDIA Nemotron Quest** | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. |
| 🟣 **Modal** | A100 inference runtime, documented above (`matchday/modal_spike.py`). |
| 🎬 **Best Demo** | App + live demo video / social post ([X](https://x.com/islandsofmoons/status/2066669685668811071)) + demo script (`matchday/DEMO_VIDEO_SCRIPT.md`). |
| 🏆 **Bonus Quest Champion** | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. |
| 🗳️ **Judges' Wildcard** | A genuinely useful, honest, small-model trip planner that corrects its user. |

> **Honest note on Tiny Titan:** we are **not** claiming Tiny Titan. That prize
> requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE
> (only ~3B *active* per token, but 30B total weights). We'd rather flag this
> than over-claim.

## Built for Build Small

**Track: Backyard AI**, a focused, real-world Vancouver World Cup use case.

Sponsor tools used: **NVIDIA Nemotron-3-Nano-30B** (the Brain) + **Modal A100** (the runtime) + **Gradio `gradio.Server`** (the Off-Brand UI).

## Social & demo

- **Demo video / social post (X):** https://x.com/islandsofmoons/status/2066669685668811071
- A ready-to-post draft is in `matchday/SOCIAL_POST.md`.