---
title: MatchDay
emoji: ⚽
colorFrom: indigo
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: mit
tags:
  - build-small-hackathon
  - backyard-ai
  - agents
  - react-agent
  - agentic
  - agent-loop
  - tool-use
  - tool-calling
  - multi-step-planning
  - nemotron
  - nvidia
  - nvidia-nemotron
  - nemotron-3-nano
  - modal
  - modal-labs
  - sglang
  - gradio
  - gradio-server
  - off-brand
  - fifa-world-cup-2026
  - vancouver
  - travel-planning
  - trip-planner
  - leaflet
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
---

# MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent

> **A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan
> one match-day trip with small-model reasoning, Gradio polish, and safe manual
> booking links.**

Type one sentence — *"Flying from Montreal, want Canada vs Qatar, mid-range,
June 26-29, just me"* — and MatchDay's agent **grounds your request in the real
schedule, searches live flights + hotels + weather, ranks 3 packages, and
explains why each one won.** Every price is tagged `● live` or `example` so
nothing is hallucinated, and every booking link is a safe **search** (never a
fake "confirmed booking").

## The idea

The 2026 World Cup is in Vancouver. Fans have one chaotic question — *"how do I
actually get to one match?"* — and existing tools split the answer across five
tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it
into a single agent turn: **understand intent → ground it against the real
fixture list → call live data tools → rank → explain.** It's a focused, backyard
trip planner that treats a small model as a genuine decision-maker, not a chatbot.

The standout agent behavior: **MatchDay corrects you when you're wrong and
refuses to plan when a match doesn't exist.** Ask for *"Canada vs Qatar, June
26"* and it tells you the real match is **June 18 at BC Place, 3:00 PM PT** and
re-plans around it. Ask for *"Canada vs Morocco"* and it won't pretend — that
match doesn't exist, so it offers the real alternatives instead. That grounding
is the difference between an agent and a form.

## How it works — Brain + Hands

- **🧠 Brain (decides + explains):** **NVIDIA Nemotron-3-Nano-30B-A3B** — a
  30B-total / **3B-active** Mixture-of-Experts model — served on **Modal A100**
  via **SGLang**. It reads the request, picks tools, reasons about results, and
  writes the final comparison. **It never calls an API, fetches a URL, or states
  a price itself.**
- **✋ Hands (execute + score):** deterministic **Python** calls every API
  (flights, hotels, weather, nearby spots), fans them out concurrently, and
  scores each package with a fixed formula (cost / arrival-buffer /
  stadium-proximity). Every value gets a provenance badge.
- **🔁 Loop:** a bounded agent loop (**≤5 tool rounds**) with a tool allowlist,
  Pydantic argument validation, one malformed-call self-correction pass,
  per-tool timeouts, a **cold-start retry** (a round-1 Modal timeout is retried
  once so the agent actually runs instead of silently degrading to the parser),
  and an honest, user-visible deterministic fallback. Nemotron emits structured
  tool calls via SGLang's `qwen3_coder` + `nemotron_3` parsers.

## 🤖 Best Agent — multi-step tool use & planning (under the 32B cap)

This is the category we care about most, so here's exactly what makes MatchDay
an agent and not a pipeline:

- **3 tools, picked autonomously:** `build_trip_packages` (the data/scoring tool),
  `web_search` (factual grounding — kickoff times, venue policy), and `clarify`
  (ask one question when origin/date is genuinely missing).
- **Genuine multi-step turns:** Nemotron can `web_search` to ground a fact, read
  the result, *then* call `build_trip_packages` with corrected understanding —
  results threaded back into the conversation between rounds. Happy path is 2-3
  rounds; the ceiling is 5 (`matchday/agent_loop.py`).
- **Schedule grounding before planning** (`matchday/wc2026.py`): a verified
  fixture table is the ground truth. The agent re-centers the trip on the *real*
  match date (preserving the user's nights) and refuses nonexistent matchups
  with honest alternatives — proven by `tests/test_wc2026_grounding.py`
  (6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights;
  Brazil vs Germany and Canada vs Morocco refused).
- **Guardrails that keep it honest:** tool allowlist, Pydantic arg validation,
  one malformed-call self-correction, per-tool timeouts, cold-start retry, and a
  user-visible fallback to deterministic parsing when Modal is cold-starting.
  The loop's agentic behavior — tool dispatch, self-correction, deterministic
  fallback, cold-start retry, and trace recording — is proven by
  `tests/test_agent_loop.py` (9 zero-network checks).
- **Brain + Hands separation:** the model decides and explains; Python executes
  every external call and scores every price — so the model can't hallucinate a
  flight number or invent a rate.

Nemotron-3-Nano-30B is **30B total parameters < the 32B cap.**

## 🎨 Off-Brand — a custom UI on `gradio.Server`, well past stock Gradio

MatchDay does **not** use stock Gradio components. It runs on **`gradio.Server`**
(`app.py`), which serves a fully bespoke `index.html` frontend at `/` while a
single `@app.api("plan_trip")` async generator streams typed JSON events through
Gradio's queue (SSE) — so the UI updates live as the agent decides → Python
scores → Nemotron explains. `gr.Server` gives us Gradio's backend (queuing,
concurrency, Spaces hosting) under a hand-built product UI:

- Layla-style **photo-header package cards** with overlaid price + "★ Best match".
- **Provenance pills** on every figure (`● live` vs `example`) — the
  anti-hallucination differentiator, visible right in the card.
- An interactive **Leaflet map** (stadium + hotels + POIs, hotel→stadium lines,
  full-screen toggle) built in `matchday/render.py`.
- A **day-by-day itinerary** with unique, date-aware roles (arrival / match day /
  local explore / departure) and a live **agent progress panel**.
- Per-option **action buttons**: a real flight/hotel **search** and
  trip-specific **transit directions** (always with explicit origins) — never an
  over-claiming "Book" button.

## 🟢 NVIDIA Nemotron Quest — Nemotron is the Brain

- **Model:** `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` (30B MoE, ~3B active/token).
- **Served with SGLang** (`matchday/modal_spike.py`) using the NVIDIA-card
  recommended tool-calling config: `--tool-call-parser qwen3_coder` +
  `--reasoning-parser nemotron_3` + `--attention-backend flashinfer`. Verified
  live that SGLang returns *parsed* `tool_calls` (not raw text) — the whole
  Brain+Hands design depends on it. See `matchday/NEMOTRON_SGLANG_VERIFICATION.md`.
- **Reasoning mode:** Nemotron-3-Nano's thinking toggle (`enable_thinking`) is
  wired end-to-end (`modal_spike.generate` → `matchday.agent.MatchDayAgent` →
  `app.py`) per the official Nemotron usage guide. Enable on the Space with
  `MATCHDAY_THINKING=1` to run the decide/ground/explain turns with
  chain-of-thought reasoning.
- **Sampling** follows the model card: `temperature=0.6 / top_p=0.95` for tool
  routing, reasoning on for complex planning.

## 🟣 Modal — the runtime & inference layer

- Nemotron runs **remotely on Modal** (`modal.App("matchday-spike")`) on an
  **A100-80GB** via a containerized SGLang server (`matchday/modal_spike.py`).
- The Gradio Space calls it with `modal.Cls.from_name(...).generate.remote.aio`
  — the Space stays lightweight while the heavy 30B inference happens on
  sanctioned Modal GPU compute.
- **Cold-start engineering:** a 60GB-model HF cache **Volume** (warm reload
  ~1-2 GB/s vs re-download), `startup_timeout=120 min` for first load, a
  server-side `warmup()`, and a Space-boot `_warm_nemotron()` task so the first
  user query isn't stuck behind a cold start.

## Tech stack

Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12
(`qwen3_coder` + `nemotron_3`) · **gradio.Server** bespoke frontend · SerpApi
(Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass
(nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.

## Try it

- **Demo video (X):** https://x.com/islandsofmoons/status/2066669685668811071
- **Live app:** https://build-small-hackathon-matchday.hf.space
- **Space:** https://huggingface.co/spaces/build-small-hackathon/matchday
- **Field Notes (architecture story):** `matchday/FIELD_NOTES.md`
- **Nemotron + SGLang verification:** `matchday/NEMOTRON_SGLANG_VERIFICATION.md`

**Example queries to try** (the app's example chips are labeled so you can tell a
clean valid trip from the correction/refusal demos):

1. ✅ *From Seattle, want Canada vs Switzerland on June 24* → a **clean valid trip**: a real Group B match on its correct date (June 24, BC Place) — no correction, just ranked packages.
2. *Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me* → watch it correct the date to **June 18**.
3. *Toronto to see Brazil vs Germany, premium, July 12, 2 adults* → watch it **refuse** a nonexistent match honestly.
4. *From Halifax, Canada vs Morocco, June 18, couple, luxury* → refused with real Group B alternatives.

## Prizes we're competing for

| Prize | Why MatchDay qualifies |
| --- | --- |
| 🤖 **Best Agent** | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. |
| 🎨 **Off-Brand** | Bespoke Layla-style UI on `gradio.Server` — custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. |
| 🟢 **NVIDIA Nemotron Quest** | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. |
| 🟣 **Modal** | A100 inference runtime, documented above (`matchday/modal_spike.py`). |
| 🎬 **Best Demo** | App + live demo video / social post ([X](https://x.com/islandsofmoons/status/2066669685668811071)) + demo script (`matchday/DEMO_VIDEO_SCRIPT.md`). |
| 🏆 **Bonus Quest Champion** | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. |
| 🗳️ **Judges' Wildcard** | A genuinely useful, honest, small-model trip planner that corrects its user. |

> **Honest note on Tiny Titan:** we are **not** claiming Tiny Titan. That prize
> requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE
> (only ~3B *active* per token, but 30B total weights). We'd rather flag this
> than over-claim.

## Built for Build Small

**Track: Backyard AI** — a focused, real-world Vancouver World Cup use case.
Sponsor tools used: **NVIDIA Nemotron-3-Nano-30B** (the Brain) + **Modal A100**
(the runtime) + **Gradio `gradio.Server`** (the Off-Brand UI).

## Social & demo

- **Demo video / social post (X):** https://x.com/islandsofmoons/status/2066669685668811071
- A ready-to-post draft is in `matchday/SOCIAL_POST.md`.