Spaces:

build-small-hackathon
/

matchday

Running

App Files Files Community

matchday / README.md

mzidan000

Upload folder using huggingface_hub

b9f2ba1 verified 20 days ago

preview code

Raw

History Blame

10.7 kB

	---
	title: MatchDay
	emoji: ⚽
	colorFrom: indigo
	colorTo: green
	sdk: gradio
	app_file: app.py
	pinned: true
	license: mit
	tags:
	- build-small-hackathon
	- backyard-ai
	- agents
	- react-agent
	- agentic
	- agent-loop
	- tool-use
	- tool-calling
	- multi-step-planning
	- nemotron
	- nvidia
	- nvidia-nemotron
	- nemotron-3-nano
	- modal
	- modal-labs
	- sglang
	- gradio
	- gradio-server
	- off-brand
	- fifa-world-cup-2026
	- vancouver
	- travel-planning
	- trip-planner
	- leaflet
	models:
	- nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
	---

	# MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent

	> **A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan
	> one match-day trip with small-model reasoning, Gradio polish, and safe manual
	> booking links.**

	Type one sentence — *"Flying from Montreal, want Canada vs Qatar, mid-range,
	June 26-29, just me"* — and MatchDay's agent **grounds your request in the real
	schedule, searches live flights + hotels + weather, ranks 3 packages, and
	explains why each one won.** Every price is tagged `● live` or `example` so
	nothing is hallucinated, and every booking link is a safe search (never a
	fake "confirmed booking").

	## The idea

	The 2026 World Cup is in Vancouver. Fans have one chaotic question — *"how do I
	actually get to one match?"* — and existing tools split the answer across five
	tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it
	into a single agent turn: **understand intent → ground it against the real
	fixture list → call live data tools → rank → explain.** It's a focused, backyard
	trip planner that treats a small model as a genuine decision-maker, not a chatbot.

	The standout agent behavior: **MatchDay corrects you when you're wrong and
	refuses to plan when a match doesn't exist.** Ask for *"Canada vs Qatar, June
	26"* and it tells you the real match is June 18 at BC Place, 3:00 PM PT and
	re-plans around it. Ask for "Canada vs Morocco" and it won't pretend — that
	match doesn't exist, so it offers the real alternatives instead. That grounding
	is the difference between an agent and a form.

	## How it works — Brain + Hands

	- 🧠 Brain (decides + explains): NVIDIA Nemotron-3-Nano-30B-A3B — a
	30B-total / 3B-active Mixture-of-Experts model — served on Modal A100
	via SGLang. It reads the request, picks tools, reasons about results, and
	writes the final comparison. **It never calls an API, fetches a URL, or states
	a price itself.**
	- ✋ Hands (execute + score): deterministic Python calls every API
	(flights, hotels, weather, nearby spots), fans them out concurrently, and
	scores each package with a fixed formula (cost / arrival-buffer /
	stadium-proximity). Every value gets a provenance badge.
	- 🔁 Loop: a bounded agent loop (≤5 tool rounds) with a tool allowlist,
	Pydantic argument validation, one malformed-call self-correction pass,
	per-tool timeouts, a cold-start retry (a round-1 Modal timeout is retried
	once so the agent actually runs instead of silently degrading to the parser),
	and an honest, user-visible deterministic fallback. Nemotron emits structured
	tool calls via SGLang's `qwen3_coder` + `nemotron_3` parsers.

	## 🤖 Best Agent — multi-step tool use & planning (under the 32B cap)

	This is the category we care about most, so here's exactly what makes MatchDay
	an agent and not a pipeline:

	- 3 tools, picked autonomously: `build_trip_packages` (the data/scoring tool),
	`web_search` (factual grounding — kickoff times, venue policy), and `clarify`
	(ask one question when origin/date is genuinely missing).
	- Genuine multi-step turns: Nemotron can `web_search` to ground a fact, read
	the result, then call `build_trip_packages` with corrected understanding —
	results threaded back into the conversation between rounds. Happy path is 2-3
	rounds; the ceiling is 5 (`matchday/agent_loop.py`).
	- Schedule grounding before planning (`matchday/wc2026.py`): a verified
	fixture table is the ground truth. The agent re-centers the trip on the real
	match date (preserving the user's nights) and refuses nonexistent matchups
	with honest alternatives — proven by `tests/test_wc2026_grounding.py`
	(6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights;
	Brazil vs Germany and Canada vs Morocco refused).
	- Guardrails that keep it honest: tool allowlist, Pydantic arg validation,
	one malformed-call self-correction, per-tool timeouts, cold-start retry, and a
	user-visible fallback to deterministic parsing when Modal is cold-starting.
	The loop's agentic behavior — tool dispatch, self-correction, deterministic
	fallback, cold-start retry, and trace recording — is proven by
	`tests/test_agent_loop.py` (9 zero-network checks).
	- Brain + Hands separation: the model decides and explains; Python executes
	every external call and scores every price — so the model can't hallucinate a
	flight number or invent a rate.

	Nemotron-3-Nano-30B is 30B total parameters < the 32B cap.

	## 🎨 Off-Brand — a custom UI on `gradio.Server`, well past stock Gradio

	MatchDay does not use stock Gradio components. It runs on `gradio.Server`
	(`app.py`), which serves a fully bespoke `index.html` frontend at `/` while a
	single `@app.api("plan_trip")` async generator streams typed JSON events through
	Gradio's queue (SSE) — so the UI updates live as the agent decides → Python
	scores → Nemotron explains. `gr.Server` gives us Gradio's backend (queuing,
	concurrency, Spaces hosting) under a hand-built product UI:

	- Layla-style photo-header package cards with overlaid price + "★ Best match".
	- Provenance pills on every figure (`● live` vs `example`) — the
	anti-hallucination differentiator, visible right in the card.
	- An interactive Leaflet map (stadium + hotels + POIs, hotel→stadium lines,
	full-screen toggle) built in `matchday/render.py`.
	- A day-by-day itinerary with unique, date-aware roles (arrival / match day /
	local explore / departure) and a live agent progress panel.
	- Per-option action buttons: a real flight/hotel search and
	trip-specific transit directions (always with explicit origins) — never an
	over-claiming "Book" button.

	## 🟢 NVIDIA Nemotron Quest — Nemotron is the Brain

	- Model: `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` (30B MoE, ~3B active/token).
	- Served with SGLang (`matchday/modal_spike.py`) using the NVIDIA-card
	recommended tool-calling config: `--tool-call-parser qwen3_coder` +
	`--reasoning-parser nemotron_3` + `--attention-backend flashinfer`. Verified
	live that SGLang returns parsed `tool_calls` (not raw text) — the whole
	Brain+Hands design depends on it. See `matchday/NEMOTRON_SGLANG_VERIFICATION.md`.
	- Reasoning mode: Nemotron-3-Nano's thinking toggle (`enable_thinking`) is
	wired end-to-end (`modal_spike.generate` → `matchday.agent.MatchDayAgent` →
	`app.py`) per the official Nemotron usage guide. Enable on the Space with
	`MATCHDAY_THINKING=1` to run the decide/ground/explain turns with
	chain-of-thought reasoning.
	- Sampling follows the model card: `temperature=0.6 / top_p=0.95` for tool
	routing, reasoning on for complex planning.

	## 🟣 Modal — the runtime & inference layer

	- Nemotron runs remotely on Modal (`modal.App("matchday-spike")`) on an
	A100-80GB via a containerized SGLang server (`matchday/modal_spike.py`).
	- The Gradio Space calls it with `modal.Cls.from_name(...).generate.remote.aio`
	— the Space stays lightweight while the heavy 30B inference happens on
	sanctioned Modal GPU compute.
	- Cold-start engineering: a 60GB-model HF cache Volume (warm reload
	~1-2 GB/s vs re-download), `startup_timeout=120 min` for first load, a
	server-side `warmup()`, and a Space-boot `_warm_nemotron()` task so the first
	user query isn't stuck behind a cold start.

	## Tech stack

	Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12
	(`qwen3_coder` + `nemotron_3`) · gradio.Server bespoke frontend · SerpApi
	(Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass
	(nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.

	## Try it

	- Live app: https://build-small-hackathon-matchday.hf.space
	- Space: https://huggingface.co/spaces/build-small-hackathon/matchday
	- Field Notes (architecture story): `matchday/FIELD_NOTES.md`
	- Nemotron + SGLang verification: `matchday/NEMOTRON_SGLANG_VERIFICATION.md`

	Example queries to try:
	1. Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me → watch it correct the date to June 18.
	2. Toronto to see Brazil vs Germany, premium, July 12, 2 adults → watch it refuse a nonexistent match honestly.
	3. From Halifax, Canada vs Morocco, June 18, couple, luxury → refused with real Group B alternatives.

	## Prizes we're competing for

	\| Prize \| Why MatchDay qualifies \|
	\| --- \| --- \|
	\| 🤖 Best Agent \| Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. \|
	\| 🎨 Off-Brand \| Bespoke Layla-style UI on `gradio.Server` — custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. \|
	\| 🟢 NVIDIA Nemotron Quest \| Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. \|
	\| 🟣 Modal \| A100 inference runtime, documented above (`matchday/modal_spike.py`). \|
	\| 🎬 Best Demo \| App + demo script (`matchday/DEMO_VIDEO_SCRIPT.md`) + social post (`matchday/SOCIAL_POST.md`). \|
	\| 🏆 Bonus Quest Champion \| Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. \|
	\| 🗳️ Judges' Wildcard \| A genuinely useful, honest, small-model trip planner that corrects its user. \|

	> Honest note on Tiny Titan: we are not claiming Tiny Titan. That prize
	> requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE
	> (only ~3B active per token, but 30B total weights). We'd rather flag this
	> than over-claim.

	## Built for Build Small

	Track: Backyard AI — a focused, real-world Vancouver World Cup use case.
	Sponsor tools used: NVIDIA Nemotron-3-Nano-30B (the Brain) + Modal A100
	(the runtime) + Gradio `gradio.Server` (the Off-Brand UI).

	## Social

	Post: _<paste your social post URL here, then redeploy>_ — a ready-to-post
	draft is in `matchday/SOCIAL_POST.md`.