title: MatchDay
emoji: ⚽
colorFrom: indigo
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: mit
tags:
- build-small-hackathon
- backyard-ai
- agents
- react-agent
- agentic
- agent-loop
- tool-use
- tool-calling
- multi-step-planning
- nemotron
- nvidia
- nvidia-nemotron
- nemotron-3-nano
- modal
- modal-labs
- sglang
- gradio
- gradio-server
- off-brand
- fifa-world-cup-2026
- vancouver
- travel-planning
- trip-planner
- leaflet
models:
- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent
A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan one match-day trip with small-model reasoning, Gradio polish, and safe manual booking links.
Type one sentence — "Flying from Montreal, want Canada vs Qatar, mid-range,
June 26-29, just me" — and MatchDay's agent grounds your request in the real
schedule, searches live flights + hotels + weather, ranks 3 packages, and
explains why each one won. Every price is tagged ● live or example so
nothing is hallucinated, and every booking link is a safe search (never a
fake "confirmed booking").
The idea
Vancouver is a host city for the 2026 World Cup. Fans have one chaotic question, "how do I actually get to one match?", and existing tools split the answer across five tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it into a single agent turn: understand intent → ground it against the real fixture list → call live data tools → rank → explain. It's a focused, backyard trip planner that treats a small model as a genuine decision-maker, not a chatbot.
The standout agent behavior: MatchDay corrects you when you're wrong and refuses to plan when a match doesn't exist. Ask for "Canada vs Qatar, June 26" and it tells you the real match is June 18 at BC Place, 3:00 PM PT and re-plans around it. Ask for "Canada vs Morocco" and it won't pretend — that match doesn't exist, so it offers the real alternatives instead. That grounding is the difference between an agent and a form.
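The correct-or-refuse behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in matchday/wc2026.py: the `Fixture` type, `find_fixture`, `ground_request`, and the "arrive one day before the match" rule are all assumptions made for the example. The only fixture in the table is the one this README quotes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Fixture:
    home: str
    away: str
    match_date: date
    venue: str
    kickoff: str


# Illustrative table. The single entry is the fixture quoted in this README;
# the real verified table is assumed to be larger.
FIXTURES = [
    Fixture("Canada", "Qatar", date(2026, 6, 18), "BC Place", "3:00 PM PT"),
]


def find_fixture(team_a: str, team_b: str) -> Optional[Fixture]:
    """Look up a matchup regardless of team order or letter case."""
    wanted = {team_a.lower(), team_b.lower()}
    for fixture in FIXTURES:
        if {fixture.home.lower(), fixture.away.lower()} == wanted:
            return fixture
    return None


def ground_request(team_a: str, team_b: str, start: date, end: date) -> dict:
    """Refuse unknown matchups; otherwise make sure the stay covers the match."""
    fixture = find_fixture(team_a, team_b)
    if fixture is None:
        alternatives = [f"{f.home} vs {f.away}" for f in FIXTURES]
        return {"status": "refused", "alternatives": alternatives}
    if start <= fixture.match_date <= end:
        return {"status": "ok", "match_date": fixture.match_date,
                "start": start, "end": end}
    nights = (end - start).days
    # Assumption: arrive the day before the match and keep the same number
    # of nights, but never leave before match day.
    new_start = fixture.match_date - timedelta(days=1)
    new_end = max(new_start + timedelta(days=nights), fixture.match_date)
    return {"status": "corrected", "match_date": fixture.match_date,
            "start": new_start, "end": new_end}
```

With this sketch, a request for Canada vs Qatar on June 26-29 comes back as `corrected` with the match on June 18 and a three-night stay, while Canada vs Morocco comes back as `refused`.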
How it works — Brain + Hands
- 🧠 Brain (decides + explains): NVIDIA Nemotron-3-Nano-30B-A3B, a 30B-total / 3B-active Mixture-of-Experts model, served on a Modal A100 via SGLang. It reads the request, picks tools, reasons about results, and writes the final comparison. It never calls an API, fetches a URL, or states a price itself.
- ✋ Hands (execute + score): deterministic Python calls every API (flights, hotels, weather, nearby spots), fans them out concurrently, and scores each package with a fixed formula (cost / arrival-buffer / stadium-proximity). Every value gets a provenance badge.
- 🔁 Loop: a bounded agent loop (≤5 tool rounds) with a tool allowlist,
Pydantic argument validation, one malformed-call self-correction pass,
per-tool timeouts, a cold-start retry (a round-1 Modal timeout is retried
once so the agent actually runs instead of silently degrading to the parser),
and an honest, user-visible deterministic fallback. Nemotron emits structured
tool calls via SGLang's qwen3_coder + nemotron_3 parsers.
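To make the "Hands" side concrete, here is one way a fixed scoring formula with provenance-tagged prices could look. The README names the three factors (cost, arrival buffer, stadium proximity) but not the formula, so the weights, the 24-hour and 10 km caps, and the `Priced` / `Package` types below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Priced:
    amount: float
    source: str  # provenance badge: "live" or "example"


@dataclass(frozen=True)
class Package:
    name: str
    flight: Priced
    hotel: Priced
    arrival_buffer_hours: float  # hours between landing and kickoff
    stadium_km: float            # hotel-to-stadium distance


# Hypothetical weights; the real formula is not given in this README.
WEIGHTS = {"cost": 0.5, "buffer": 0.3, "proximity": 0.2}


def score(pkg: Package, budget: float) -> float:
    """Deterministic 0..1 score: cheaper, earlier, and closer is better."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    total = pkg.flight.amount + pkg.hotel.amount
    cost_score = max(0.0, 1.0 - total / budget)
    buffer_score = min(max(pkg.arrival_buffer_hours, 0.0), 24.0) / 24.0
    proximity_score = max(0.0, 1.0 - pkg.stadium_km / 10.0)
    return round(
        WEIGHTS["cost"] * cost_score
        + WEIGHTS["buffer"] * buffer_score
        + WEIGHTS["proximity"] * proximity_score,
        4,
    )


def rank(packages: list[Package], budget: float) -> list[Package]:
    """Best package first; the model only explains this order, it never sets it."""
    return sorted(packages, key=lambda p: score(p, budget), reverse=True)
```

Because the score is a pure function of tool outputs, the same inputs always produce the same ranking, and each `Priced` value carries its own badge through to the UI.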
🤖 Best Agent — multi-step tool use & planning (under the 32B cap)
This is the category we care about most, so here's exactly what makes MatchDay an agent and not a pipeline:
- 3 tools, picked autonomously: build_trip_packages (the data/scoring tool), web_search (factual grounding: kickoff times, venue policy), and clarify (ask one question when origin/date is genuinely missing).
- Genuine multi-step turns: Nemotron can web_search to ground a fact, read the result, then call build_trip_packages with corrected understanding, with results threaded back into the conversation between rounds. Happy path is 2-3 rounds; the ceiling is 5 (matchday/agent_loop.py).
- Schedule grounding before planning (matchday/wc2026.py): a verified fixture table is the ground truth. The agent re-centers the trip on the real match date (preserving the user's nights) and refuses nonexistent matchups with honest alternatives, proven by tests/test_wc2026_grounding.py (6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights; Brazil vs Germany and Canada vs Morocco refused).
- Guardrails that keep it honest: tool allowlist, Pydantic arg validation, one malformed-call self-correction, per-tool timeouts, cold-start retry, and a user-visible fallback to deterministic parsing when Modal is cold-starting. The loop's agentic behavior (tool dispatch, self-correction, deterministic fallback, cold-start retry, and trace recording) is proven by tests/test_agent_loop.py (9 zero-network checks).
- Brain + Hands separation: the model decides and explains; Python executes every external call and scores every price, so the model can't hallucinate a flight number or invent a rate.
At 30B total parameters, Nemotron-3-Nano-30B is under the 32B cap.
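The guardrails listed above can be sketched as a small bounded loop. This is a simplified illustration, not matchday/agent_loop.py: it swaps Pydantic validation for a plain dictionary check to stay dependency-free, and `decide`, `execute`, and `fallback` are hypothetical callables standing in for the model, the Python tools, and the deterministic parser.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

MAX_ROUNDS = 5  # ceiling stated in this README
ALLOWED_TOOLS = {"build_trip_packages", "web_search", "clarify"}


@dataclass
class ToolCall:
    name: str
    args: dict


def validate(call: ToolCall) -> Optional[str]:
    """Return an error message, or None when the call is acceptable."""
    if call.name not in ALLOWED_TOOLS:
        return f"tool {call.name!r} is not on the allowlist"
    if not isinstance(call.args, dict):
        return "arguments must be a JSON object"
    return None


def run_agent(
    decide: Callable[[list], Union[ToolCall, str]],
    execute: Callable[[ToolCall], object],
    fallback: Callable[[], str],
) -> str:
    """Run up to MAX_ROUNDS; allow one self-correction, then fall back."""
    history: list = []
    corrected = False
    for _ in range(MAX_ROUNDS):
        step = decide(history)
        if isinstance(step, str):  # the model produced its final answer
            return step
        error = validate(step)
        if error is not None:
            if corrected:  # second malformed call: stop trusting the model
                return fallback()
            corrected = True
            history.append({"role": "tool_error", "content": error})
            continue
        result = execute(step)  # Python, not the model, touches the network
        history.append({"role": "tool", "name": step.name, "content": result})
    return fallback()  # round budget exhausted without a final answer
```

In this sketch a self-correction pass consumes one of the five rounds; whether the real loop counts it that way is an assumption.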
🎨 Off-Brand — a custom UI on gradio.Server, well past stock Gradio
MatchDay does not use stock Gradio components. It runs on gradio.Server
(app.py), which serves a fully bespoke index.html frontend at / while a
single @app.api("plan_trip") async generator streams typed JSON events through
Gradio's queue (SSE) — so the UI updates live as the agent decides → Python
scores → Nemotron explains. gr.Server gives us Gradio's backend (queuing,
concurrency, Spaces hosting) under a hand-built product UI:
- Layla-style photo-header package cards with overlaid price + "★ Best match".
- Provenance pills on every figure (● live vs example): the anti-hallucination differentiator, visible right in the card.
- An interactive Leaflet map (stadium + hotels + POIs, hotel→stadium lines, full-screen toggle) built in matchday/render.py.
- A day-by-day itinerary with unique, date-aware roles (arrival / match day / local explore / departure) and a live agent progress panel.
- Per-option action buttons: a real flight/hotel search and trip-specific transit directions (always with explicit origins), never an over-claiming "Book" button.
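The streaming pattern behind this UI, an async generator that yields typed JSON events as each stage finishes, can be sketched without any Gradio code. The event names and fields below are hypothetical, and the generator is a stand-in for the real plan_trip endpoint rather than a copy of it.

```python
import asyncio
import json
from typing import AsyncIterator


async def plan_trip_events(query: str) -> AsyncIterator[str]:
    """Yield one JSON event per stage so a frontend can update as work completes."""
    yield json.dumps({"type": "status", "stage": "decide", "query": query})
    await asyncio.sleep(0)  # placeholder for the model's tool decision
    yield json.dumps({"type": "status", "stage": "score"})
    await asyncio.sleep(0)  # placeholder for the Python tool fan-out and scoring
    yield json.dumps({"type": "packages", "items": []})
    yield json.dumps({"type": "final", "stage": "explain", "text": ""})


async def collect(query: str) -> list[dict]:
    """Drain the stream, as a test or a non-streaming client might."""
    return [json.loads(event) async for event in plan_trip_events(query)]


if __name__ == "__main__":
    for event in asyncio.run(collect("Montreal, Canada vs Qatar")):
        print(event["type"], event.get("stage", ""))
```

Giving every event a `type` field lets the frontend switch on it: status events drive the progress panel, and the packages event renders the cards.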
🟢 NVIDIA Nemotron Quest — Nemotron is the Brain
- Model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (30B MoE, ~3B active/token).
- Served with SGLang (matchday/modal_spike.py) using the tool-calling config recommended on the NVIDIA model card: --tool-call-parser qwen3_coder + --reasoning-parser nemotron_3 + --attention-backend flashinfer. Verified live that SGLang returns parsed tool_calls (not raw text); the whole Brain+Hands design depends on it. See matchday/NEMOTRON_SGLANG_VERIFICATION.md.
- Reasoning mode: Nemotron-3-Nano's thinking toggle (enable_thinking) is wired end-to-end (modal_spike.generate → matchday.agent.MatchDayAgent → app.py) per the official Nemotron usage guide. Enable on the Space with MATCHDAY_THINKING=1 to run the decide/ground/explain turns with chain-of-thought reasoning.
- Sampling follows the model card: temperature=0.6 / top_p=0.95 for tool routing, reasoning on for complex planning.
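For reference, the flags and sampling values listed above can be collected in one place. This is only a sketch of how a launcher might assemble them: the helper name `sglang_launch_argv` and the port are assumptions, and matchday/modal_spike.py may pass more or different options.

```python
import shlex

MODEL = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"

# Sampling values quoted in this README for tool routing.
SAMPLING = {"temperature": 0.6, "top_p": 0.95}


def sglang_launch_argv(model: str = MODEL, port: int = 30000) -> list[str]:
    """Build the SGLang server command line from the flags named above."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model,
        "--port", str(port),  # assumed port, not stated in this README
        "--tool-call-parser", "qwen3_coder",
        "--reasoning-parser", "nemotron_3",
        "--attention-backend", "flashinfer",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(sglang_launch_argv()))
```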
🟣 Modal — the runtime & inference layer
- Nemotron runs remotely on Modal (modal.App("matchday-spike")) on an A100-80GB via a containerized SGLang server (matchday/modal_spike.py).
- The Gradio Space calls it with modal.Cls.from_name(...).generate.remote.aio, so the Space stays lightweight while the heavy 30B inference happens on sanctioned Modal GPU compute.
- Cold-start engineering: a 60GB-model HF cache Volume (warm reload ~1-2 GB/s vs re-download), startup_timeout=120 min for first load, a server-side warmup(), and a Space-boot _warm_nemotron() task so the first user query isn't stuck behind a cold start.
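The cold-start retry can be pictured as a thin wrapper around the remote call. The sketch below uses a stub coroutine instead of Modal, and the helper name and timeout values are assumptions. It also retries any call, whereas the README describes retrying only a round-1 timeout.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_cold_start_retry(
    make_call: Callable[[], Awaitable[T]],
    timeout_s: float,
    retries: int = 1,
) -> T:
    """Await make_call(); on timeout, retry up to `retries` times, then re-raise."""
    attempt = 0
    while True:
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt >= retries:
                raise  # let the caller switch to the deterministic fallback
            attempt += 1


if __name__ == "__main__":
    state = {"calls": 0}

    async def fake_remote() -> str:
        """Stub: slow on the first call (cold container), fast afterwards."""
        state["calls"] += 1
        if state["calls"] == 1:
            await asyncio.sleep(1.0)
        return "warm"

    print(asyncio.run(call_with_cold_start_retry(fake_remote, timeout_s=0.05)))
```

Re-raising after the last attempt keeps the failure visible, so the caller can show the user-visible fallback rather than degrading silently.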
Tech stack
Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12
(qwen3_coder + nemotron_3) · gradio.Server bespoke frontend · SerpApi
(Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass
(nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.
Try it
- Live app: https://build-small-hackathon-matchday.hf.space
- Space: https://huggingface.co/spaces/build-small-hackathon/matchday
- Field Notes (architecture story): matchday/FIELD_NOTES.md
- Nemotron + SGLang verification: matchday/NEMOTRON_SGLANG_VERIFICATION.md
Example queries to try:
- Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me → watch it correct the date to June 18.
- Toronto to see Brazil vs Germany, premium, July 12, 2 adults → watch it refuse a nonexistent match honestly.
- From Halifax, Canada vs Morocco, June 18, couple, luxury → refused with real Group B alternatives.
Prizes we're competing for
| Prize | Why MatchDay qualifies |
|---|---|
| 🤖 Best Agent | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. |
| 🎨 Off-Brand | Bespoke Layla-style UI on gradio.Server — custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. |
| 🟢 NVIDIA Nemotron Quest | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. |
| 🟣 Modal | A100 inference runtime, documented above (matchday/modal_spike.py). |
| 🎬 Best Demo | App + demo script (matchday/DEMO_VIDEO_SCRIPT.md) + social post (matchday/SOCIAL_POST.md). |
| 🏆 Bonus Quest Champion | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. |
| 🗳️ Judges' Wildcard | A genuinely useful, honest, small-model trip planner that corrects its user. |
Honest note on Tiny Titan: we are not claiming Tiny Titan. That prize requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE (only ~3B active per token, but 30B total weights). We'd rather flag this than over-claim.
Built for Build Small
Track: Backyard AI — a focused, real-world Vancouver World Cup use case.
Sponsor tools used: NVIDIA Nemotron-3-Nano-30B (the Brain) + Modal A100
(the runtime) + Gradio gradio.Server (the Off-Brand UI).
Social
Post: <paste your social post URL here, then redeploy> — a ready-to-post
draft is in matchday/SOCIAL_POST.md.