Spaces:
Running
Running
| title: MatchDay | |
| emoji: ⚽ | |
| colorFrom: indigo | |
| colorTo: green | |
| sdk: gradio | |
| app_file: app.py | |
| pinned: true | |
| license: mit | |
| tags: | |
| - build-small-hackathon | |
| - backyard-ai | |
| - agents | |
| - react-agent | |
| - agentic | |
| - agent-loop | |
| - tool-use | |
| - tool-calling | |
| - multi-step-planning | |
| - nemotron | |
| - nvidia | |
| - nvidia-nemotron | |
| - nemotron-3-nano | |
| - modal | |
| - modal-labs | |
| - sglang | |
| - gradio | |
| - gradio-server | |
| - off-brand | |
| - fifa-world-cup-2026 | |
| - vancouver | |
| - travel-planning | |
| - trip-planner | |
| - leaflet | |
| models: | |
| - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | |
| # MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent | |
| > **A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan | |
| > one match-day trip with small-model reasoning, Gradio polish, and safe manual | |
| > booking links.** | |
| Type one sentence — *"Flying from Montreal, want Canada vs Qatar, mid-range, | |
| June 26-29, just me"* — and MatchDay's agent **grounds your request in the real | |
| schedule, searches live flights + hotels + weather, ranks 3 packages, and | |
| explains why each one won.** Every price is tagged `● live` or `example` so | |
| nothing is hallucinated, and every booking link is a safe **search** (never a | |
| fake "confirmed booking"). | |
| ## The idea | |
| The 2026 World Cup is in Vancouver. Fans have one chaotic question — *"how do I | |
| actually get to one match?"* — and existing tools split the answer across five | |
| tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it | |
| into a single agent turn: **understand intent → ground it against the real | |
| fixture list → call live data tools → rank → explain.** It's a focused, backyard | |
| trip planner that treats a small model as a genuine decision-maker, not a chatbot. | |
| The standout agent behavior: **MatchDay corrects you when you're wrong and | |
| refuses to plan when a match doesn't exist.** Ask for *"Canada vs Qatar, June | |
| 26"* and it tells you the real match is **June 18 at BC Place, 3:00 PM PT** and | |
| re-plans around it. Ask for *"Canada vs Morocco"* and it won't pretend — that | |
| match doesn't exist, so it offers the real alternatives instead. That grounding | |
| is the difference between an agent and a form. | |
| ## How it works — Brain + Hands | |
| - **🧠 Brain (decides + explains):** **NVIDIA Nemotron-3-Nano-30B-A3B** — a | |
| 30B-total / **3B-active** Mixture-of-Experts model — served on **Modal A100** | |
| via **SGLang**. It reads the request, picks tools, reasons about results, and | |
| writes the final comparison. **It never calls an API, fetches a URL, or states | |
| a price itself.** | |
| - **✋ Hands (execute + score):** deterministic **Python** calls every API | |
| (flights, hotels, weather, nearby spots), fans them out concurrently, and | |
| scores each package with a fixed formula (cost / arrival-buffer / | |
| stadium-proximity). Every value gets a provenance badge. | |
| - **🔁 Loop:** a bounded agent loop (**≤5 tool rounds**) with a tool allowlist, | |
| Pydantic argument validation, one malformed-call self-correction pass, | |
| per-tool timeouts, a **cold-start retry** (a round-1 Modal timeout is retried | |
| once so the agent actually runs instead of silently degrading to the parser), | |
| and an honest, user-visible deterministic fallback. Nemotron emits structured | |
| tool calls via SGLang's `qwen3_coder` + `nemotron_3` parsers. | |
| ## 🤖 Best Agent — multi-step tool use & planning (under the 32B cap) | |
| This is the category we care about most, so here's exactly what makes MatchDay | |
| an agent and not a pipeline: | |
| - **3 tools, picked autonomously:** `build_trip_packages` (the data/scoring tool), | |
| `web_search` (factual grounding — kickoff times, venue policy), and `clarify` | |
| (ask one question when origin/date is genuinely missing). | |
| - **Genuine multi-step turns:** Nemotron can `web_search` to ground a fact, read | |
| the result, *then* call `build_trip_packages` with corrected understanding — | |
| results threaded back into the conversation between rounds. Happy path is 2-3 | |
| rounds; the ceiling is 5 (`matchday/agent_loop.py`). | |
| - **Schedule grounding before planning** (`matchday/wc2026.py`): a verified | |
| fixture table is the ground truth. The agent re-centers the trip on the *real* | |
| match date (preserving the user's nights) and refuses nonexistent matchups | |
| with honest alternatives — proven by `tests/test_wc2026_grounding.py` | |
| (6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights; | |
| Brazil vs Germany and Canada vs Morocco refused). | |
| - **Guardrails that keep it honest:** tool allowlist, Pydantic arg validation, | |
| one malformed-call self-correction, per-tool timeouts, cold-start retry, and a | |
| user-visible fallback to deterministic parsing when Modal is cold-starting. | |
| The loop's agentic behavior — tool dispatch, self-correction, deterministic | |
| fallback, cold-start retry, and trace recording — is proven by | |
| `tests/test_agent_loop.py` (9 zero-network checks). | |
| - **Brain + Hands separation:** the model decides and explains; Python executes | |
| every external call and scores every price — so the model can't hallucinate a | |
| flight number or invent a rate. | |
| Nemotron-3-Nano-30B is **30B total parameters < the 32B cap.** | |
| ## 🎨 Off-Brand — a custom UI on `gradio.Server`, well past stock Gradio | |
| MatchDay does **not** use stock Gradio components. It runs on **`gradio.Server`** | |
| (`app.py`), which serves a fully bespoke `index.html` frontend at `/` while a | |
| single `@app.api("plan_trip")` async generator streams typed JSON events through | |
| Gradio's queue (SSE) — so the UI updates live as the agent decides → Python | |
| scores → Nemotron explains. `gr.Server` gives us Gradio's backend (queuing, | |
| concurrency, Spaces hosting) under a hand-built product UI: | |
| - Layla-style **photo-header package cards** with overlaid price + "★ Best match". | |
| - **Provenance pills** on every figure (`● live` vs `example`) — the | |
| anti-hallucination differentiator, visible right in the card. | |
| - An interactive **Leaflet map** (stadium + hotels + POIs, hotel→stadium lines, | |
| full-screen toggle) built in `matchday/render.py`. | |
| - A **day-by-day itinerary** with unique, date-aware roles (arrival / match day / | |
| local explore / departure) and a live **agent progress panel**. | |
| - Per-option **action buttons**: a real flight/hotel **search** and | |
| trip-specific **transit directions** (always with explicit origins) — never an | |
| over-claiming "Book" button. | |
| ## 🟢 NVIDIA Nemotron Quest — Nemotron is the Brain | |
| - **Model:** `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` (30B MoE, ~3B active/token). | |
| - **Served with SGLang** (`matchday/modal_spike.py`) using the NVIDIA-card | |
| recommended tool-calling config: `--tool-call-parser qwen3_coder` + | |
| `--reasoning-parser nemotron_3` + `--attention-backend flashinfer`. Verified | |
| live that SGLang returns *parsed* `tool_calls` (not raw text) — the whole | |
| Brain+Hands design depends on it. See `matchday/NEMOTRON_SGLANG_VERIFICATION.md`. | |
| - **Reasoning mode:** Nemotron-3-Nano's thinking toggle (`enable_thinking`) is | |
| wired end-to-end (`modal_spike.generate` → `matchday.agent.MatchDayAgent` → | |
| `app.py`) per the official Nemotron usage guide. Enable on the Space with | |
| `MATCHDAY_THINKING=1` to run the decide/ground/explain turns with | |
| chain-of-thought reasoning. | |
| - **Sampling** follows the model card: `temperature=0.6 / top_p=0.95` for tool | |
| routing, reasoning on for complex planning. | |
| ## 🟣 Modal — the runtime & inference layer | |
| - Nemotron runs **remotely on Modal** (`modal.App("matchday-spike")`) on an | |
| **A100-80GB** via a containerized SGLang server (`matchday/modal_spike.py`). | |
| - The Gradio Space calls it with `modal.Cls.from_name(...).generate.remote.aio` | |
| — the Space stays lightweight while the heavy 30B inference happens on | |
| sanctioned Modal GPU compute. | |
| - **Cold-start engineering:** a 60GB-model HF cache **Volume** (warm reload | |
| ~1-2 GB/s vs re-download), `startup_timeout=120 min` for first load, a | |
| server-side `warmup()`, and a Space-boot `_warm_nemotron()` task so the first | |
| user query isn't stuck behind a cold start. | |
| ## Tech stack | |
| Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12 | |
| (`qwen3_coder` + `nemotron_3`) · **gradio.Server** bespoke frontend · SerpApi | |
| (Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass | |
| (nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11. | |
| ## Try it | |
| - **Live app:** https://build-small-hackathon-matchday.hf.space | |
| - **Space:** https://huggingface.co/spaces/build-small-hackathon/matchday | |
| - **Field Notes (architecture story):** `matchday/FIELD_NOTES.md` | |
| - **Nemotron + SGLang verification:** `matchday/NEMOTRON_SGLANG_VERIFICATION.md` | |
| **Example queries to try:** | |
| 1. *Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me* → watch it correct the date to **June 18**. | |
| 2. *Toronto to see Brazil vs Germany, premium, July 12, 2 adults* → watch it **refuse** a nonexistent match honestly. | |
| 3. *From Halifax, Canada vs Morocco, June 18, couple, luxury* → refused with real Group B alternatives. | |
| ## Prizes we're competing for | |
| | Prize | Why MatchDay qualifies | | |
| | --- | --- | | |
| | 🤖 **Best Agent** | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. | | |
| | 🎨 **Off-Brand** | Bespoke Layla-style UI on `gradio.Server` — custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. | | |
| | 🟢 **NVIDIA Nemotron Quest** | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. | | |
| | 🟣 **Modal** | A100 inference runtime, documented above (`matchday/modal_spike.py`). | | |
| | 🎬 **Best Demo** | App + demo script (`matchday/DEMO_VIDEO_SCRIPT.md`) + social post (`matchday/SOCIAL_POST.md`). | | |
| | 🏆 **Bonus Quest Champion** | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. | | |
| | 🗳️ **Judges' Wildcard** | A genuinely useful, honest, small-model trip planner that corrects its user. | | |
| > **Honest note on Tiny Titan:** we are **not** claiming Tiny Titan. That prize | |
| > requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE | |
| > (only ~3B *active* per token, but 30B total weights). We'd rather flag this | |
| > than over-claim. | |
| ## Built for Build Small | |
| **Track: Backyard AI** — a focused, real-world Vancouver World Cup use case. | |
| Sponsor tools used: **NVIDIA Nemotron-3-Nano-30B** (the Brain) + **Modal A100** | |
| (the runtime) + **Gradio `gradio.Server`** (the Off-Brand UI). | |
| ## Social | |
| **Post:** _<paste your social post URL here, then redeploy>_ — a ready-to-post | |
| draft is in `matchday/SOCIAL_POST.md`. | |