matchday / README.md
metadata
title: MatchDay
emoji: 
colorFrom: indigo
colorTo: green
sdk: gradio
app_file: app.py
pinned: true
license: mit
tags:
  - build-small-hackathon
  - backyard-ai
  - agents
  - react-agent
  - agentic
  - agent-loop
  - tool-use
  - tool-calling
  - multi-step-planning
  - nemotron
  - nvidia
  - nvidia-nemotron
  - nemotron-3-nano
  - modal
  - modal-labs
  - sglang
  - gradio
  - gradio-server
  - off-brand
  - fifa-world-cup-2026
  - vancouver
  - travel-planning
  - trip-planner
  - leaflet
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

MatchDay ⚽ — your 2026 FIFA World Cup trip, planned by a small-model agent

A Backyard AI app for a real Vancouver World Cup use case: helping a fan plan one match-day trip with small-model reasoning, Gradio polish, and safe manual booking links.

Type one sentence — "Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me" — and MatchDay's agent grounds your request in the real schedule, searches live flights + hotels + weather, ranks 3 packages, and explains why each one won. Every price is tagged ● live or example so nothing is hallucinated, and every booking link is a safe search (never a fake "confirmed booking").

The idea

Vancouver is one of the host cities of the 2026 World Cup. Fans have one chaotic question, "how do I actually get to one match?", and existing tools split the answer across five tabs (flights here, hotels there, weather somewhere else). MatchDay collapses it into a single agent turn: understand intent → ground it against the real fixture list → call live data tools → rank → explain. It's a focused, backyard trip planner that treats a small model as a genuine decision-maker, not a chatbot.

The standout agent behavior: MatchDay corrects you when you're wrong and refuses to plan when a match doesn't exist. Ask for "Canada vs Qatar, June 26" and it tells you the real match is June 18 at BC Place, 3:00 PM PT and re-plans around it. Ask for "Canada vs Morocco" and it won't pretend — that match doesn't exist, so it offers the real alternatives instead. That grounding is the difference between an agent and a form.
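
The correct-or-refuse behavior described above can be sketched in a few lines. This is a minimal illustration, not the code in matchday/wc2026.py: the `Fixture` class, the `FIXTURES` table, and `ground_request` are hypothetical names, the table holds only the one fixture this README states, and the "arrive the day before the match" rule is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Fixture:
    home: str
    away: str
    match_date: date
    kickoff: str
    venue: str


# Illustrative table: only the fixture quoted in this README.
FIXTURES = {
    frozenset({"canada", "qatar"}): Fixture(
        "Canada", "Qatar", date(2026, 6, 18), "3:00 PM PT", "BC Place"
    ),
}


def ground_request(team_a: str, team_b: str, start: date, end: date) -> dict:
    """Check a requested matchup and stay against the fixture table."""
    fixture = FIXTURES.get(frozenset({team_a.lower(), team_b.lower()}))
    if fixture is None:
        # Unknown matchup: refuse and list real fixtures instead of inventing one.
        return {
            "status": "refused",
            "alternatives": [f"{f.home} vs {f.away}" for f in FIXTURES.values()],
        }
    nights = (end - start).days
    if start <= fixture.match_date <= end:
        # The requested stay already covers the match date.
        return {"status": "ok", "fixture": fixture, "nights": nights,
                "check_in": start, "check_out": end}
    # Assumption: re-center by arriving one day before the match,
    # keeping the number of nights the user asked for.
    check_in = fixture.match_date - timedelta(days=1)
    return {"status": "corrected", "fixture": fixture, "nights": nights,
            "check_in": check_in, "check_out": check_in + timedelta(days=nights)}
```

Under these assumptions, a "June 26-29" request for Canada vs Qatar keeps its 3 nights but moves to a stay around June 18, while Canada vs Morocco returns a refusal with alternatives.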

How it works — Brain + Hands

  • 🧠 Brain (decides + explains): NVIDIA Nemotron-3-Nano-30B-A3B — a 30B-total / 3B-active Mixture-of-Experts model — served on Modal A100 via SGLang. It reads the request, picks tools, reasons about results, and writes the final comparison. It never calls an API, fetches a URL, or states a price itself.
  • ✋ Hands (execute + score): deterministic Python calls every API (flights, hotels, weather, nearby spots), fans them out concurrently, and scores each package with a fixed formula (cost / arrival-buffer / stadium-proximity). Every value gets a provenance badge.
  • 🔁 Loop: a bounded agent loop (≤5 tool rounds) with a tool allowlist, Pydantic argument validation, one malformed-call self-correction pass, per-tool timeouts, a cold-start retry (a round-1 Modal timeout is retried once so the agent actually runs instead of silently degrading to the parser), and an honest, user-visible deterministic fallback. Nemotron emits structured tool calls via SGLang's qwen3_coder + nemotron_3 parsers.
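
As a rough picture of the Hands side, the sketch below scores packages with a fixed formula over cost, arrival buffer, and stadium proximity, and carries a provenance tag on every price. The weights, the normalization constants, and the names `Price`, `Package`, `score`, and `rank` are assumptions made for illustration; the README does not state the actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    amount: float
    source: str  # provenance tag: "live" or "example"


@dataclass(frozen=True)
class Package:
    name: str
    flight: Price
    hotel: Price
    arrival_buffer_hours: float  # hours between landing and kickoff
    stadium_km: float            # hotel-to-stadium distance


# Hypothetical weights; the real formula is not given in this README.
WEIGHTS = {"cost": 0.5, "buffer": 0.3, "proximity": 0.2}


def score(pkg: Package, budget: float) -> float:
    """Deterministic score in [0, 1]; higher is better."""
    total = pkg.flight.amount + pkg.hotel.amount
    cost_score = max(0.0, 1.0 - total / budget)
    buffer_score = min(pkg.arrival_buffer_hours, 24.0) / 24.0
    proximity_score = max(0.0, 1.0 - pkg.stadium_km / 10.0)
    return (WEIGHTS["cost"] * cost_score
            + WEIGHTS["buffer"] * buffer_score
            + WEIGHTS["proximity"] * proximity_score)


def rank(packages: list[Package], budget: float, top_n: int = 3) -> list[Package]:
    """Return the best top_n packages, highest score first."""
    return sorted(packages, key=lambda p: score(p, budget), reverse=True)[:top_n]
```

Because the score is plain arithmetic on tagged inputs, the model only ever explains a ranking; it never produces a number itself.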

🤖 Best Agent — multi-step tool use & planning (under the 32B cap)

This is the category we care about most, so here's exactly what makes MatchDay an agent and not a pipeline:

  • 3 tools, picked autonomously: build_trip_packages (the data/scoring tool), web_search (factual grounding — kickoff times, venue policy), and clarify (ask one question when origin/date is genuinely missing).
  • Genuine multi-step turns: Nemotron can web_search to ground a fact, read the result, then call build_trip_packages with corrected understanding — results threaded back into the conversation between rounds. Happy path is 2-3 rounds; the ceiling is 5 (matchday/agent_loop.py).
  • Schedule grounding before planning (matchday/wc2026.py): a verified fixture table is the ground truth. The agent re-centers the trip on the real match date (preserving the user's nights) and refuses nonexistent matchups with honest alternatives — proven by tests/test_wc2026_grounding.py (6/6 zero-network checks: Canada vs Qatar → Jun 18 / 3:00 PM PT / 3 nights; Brazil vs Germany and Canada vs Morocco refused).
  • Guardrails that keep it honest: tool allowlist, Pydantic arg validation, one malformed-call self-correction, per-tool timeouts, cold-start retry, and a user-visible fallback to deterministic parsing when Modal is cold-starting. The loop's agentic behavior — tool dispatch, self-correction, deterministic fallback, cold-start retry, and trace recording — is proven by tests/test_agent_loop.py (9 zero-network checks).
  • Brain + Hands separation: the model decides and explains; Python executes every external call and scores every price — so the model can't hallucinate a flight number or invent a rate.

Nemotron-3-Nano-30B has 30B total parameters, which is under the 32B cap.

🎨 Off-Brand — a custom UI on gradio.Server, well past stock Gradio

MatchDay does not use stock Gradio components. It runs on gradio.Server (app.py), which serves a fully bespoke index.html frontend at / while a single @app.api("plan_trip") async generator streams typed JSON events through Gradio's queue (SSE) — so the UI updates live as the agent decides → Python scores → Nemotron explains. gr.Server gives us Gradio's backend (queuing, concurrency, Spaces hosting) under a hand-built product UI:

  • Layla-style photo-header package cards with overlaid price + "★ Best match".
  • Provenance pills on every figure (● live vs example) — the anti-hallucination differentiator, visible right in the card.
  • An interactive Leaflet map (stadium + hotels + POIs, hotel→stadium lines, full-screen toggle) built in matchday/render.py.
  • A day-by-day itinerary with unique, date-aware roles (arrival / match day / local explore / departure) and a live agent progress panel.
  • Per-option action buttons: a real flight/hotel search and trip-specific transit directions (always with explicit origins) — never an over-claiming "Book" button.
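
The streaming described above boils down to an async generator that yields typed JSON events as each stage finishes. The sketch below shows only that shape with the standard library; it does not use the gradio.Server API, and the event names, fields, and the `plan_trip_events` and `collect` helpers are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator


async def plan_trip_events(query: str) -> AsyncIterator[str]:
    """Yield one JSON-encoded event per stage so a UI can update live."""
    yield json.dumps({"type": "status", "stage": "deciding", "query": query})
    await asyncio.sleep(0)  # placeholder for the model's tool decision
    yield json.dumps({"type": "status", "stage": "scoring"})
    await asyncio.sleep(0)  # placeholder for the Python data and scoring calls
    yield json.dumps({
        "type": "packages",
        "items": [{"name": "Option A", "price": 1000, "source": "example"}],
    })
    yield json.dumps({"type": "done"})


async def collect(query: str) -> list[dict]:
    """Consume the stream the way a client would, decoding each event."""
    return [json.loads(event) async for event in plan_trip_events(query)]
```

A frontend would switch on the `type` field of each event to decide which panel to update.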

🟢 NVIDIA Nemotron Quest — Nemotron is the Brain

  • Model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (30B MoE, ~3B active/token).
  • Served with SGLang (matchday/modal_spike.py) using the tool-calling config recommended on the NVIDIA model card: --tool-call-parser qwen3_coder + --reasoning-parser nemotron_3 + --attention-backend flashinfer. We verified live that SGLang returns parsed tool_calls (not raw text), which the whole Brain+Hands design depends on. See matchday/NEMOTRON_SGLANG_VERIFICATION.md.
  • Reasoning mode: Nemotron-3-Nano's thinking toggle (enable_thinking) is wired end-to-end (modal_spike.generate → matchday.agent.MatchDayAgent → app.py) per the official Nemotron usage guide. Enable on the Space with MATCHDAY_THINKING=1 to run the decide/ground/explain turns with chain-of-thought reasoning.
  • Sampling follows the model card: temperature=0.6 / top_p=0.95 for tool routing, reasoning on for complex planning.
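
To make the sampling and thinking settings concrete, here is one way a chat request could be assembled. It assumes an OpenAI-style chat-completions payload and that the server forwards `chat_template_kwargs` to the chat template; the project itself calls the model through a Modal method, so `build_chat_request`, `thinking_enabled`, and the `WEB_SEARCH_TOOL` schema are illustrative assumptions, not its code.

```python
import os

MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"

# Hypothetical schema for one of the three tools, in OpenAI function format.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Look up a fact such as a kickoff time or venue policy.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def thinking_enabled(env=None) -> bool:
    """Read the MATCHDAY_THINKING switch; only the value "1" turns it on."""
    env = os.environ if env is None else env
    return env.get("MATCHDAY_THINKING") == "1"


def build_chat_request(messages: list, thinking: bool) -> dict:
    """Assemble a request body with the sampling values quoted above."""
    return {
        "model": MODEL_ID,
        "messages": messages,
        "tools": [WEB_SEARCH_TOOL],
        "temperature": 0.6,
        "top_p": 0.95,
        # Assumption: the server passes this through to the chat template.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
```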

🟣 Modal — the runtime & inference layer

  • Nemotron runs remotely on Modal (modal.App("matchday-spike")) on an A100-80GB via a containerized SGLang server (matchday/modal_spike.py).
  • The Gradio Space calls it with modal.Cls.from_name(...).generate.remote.aio — the Space stays lightweight while the heavy 30B inference happens on sanctioned Modal GPU compute.
  • Cold-start engineering: a 60GB-model HF cache Volume (warm reload ~1-2 GB/s vs re-download), startup_timeout=120 min for first load, a server-side warmup(), and a Space-boot _warm_nemotron() task so the first user query isn't stuck behind a cold start.
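
The cold-start retry mentioned earlier (a round-1 timeout is retried once) can be sketched with `asyncio.wait_for`. The helper name `call_with_cold_start_retry` and its parameters are hypothetical, and `call` stands in for whatever coroutine reaches the remote model.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_cold_start_retry(
    call: Callable[[], Awaitable[T]],
    timeout_s: float,
    round_no: int,
) -> T:
    """Retry a timed-out call once, but only on the first round."""
    try:
        return await asyncio.wait_for(call(), timeout_s)
    except asyncio.TimeoutError:
        if round_no != 1:
            raise  # later rounds are not cold starts, so let the caller fall back
        # Round 1 most likely hit a cold container; give it one more chance.
        return await asyncio.wait_for(call(), timeout_s)
```

If the second attempt also times out, the exception propagates and the caller can switch to the user-visible deterministic fallback.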

Tech stack

Nemotron-3-Nano-30B-A3B (3B-active MoE) · Modal A100-80GB + SGLang v0.5.12 (qwen3_coder + nemotron_3) · gradio.Server bespoke frontend · SerpApi (Google Flights / Hotels / Search) · Open-Meteo (weather) · OpenStreetMap/Overpass (nearby spots) · Leaflet + CARTO map · httpx + Pydantic v2 · Python 3.11.

Try it

Example queries to try:

  1. Flying from Montreal, want Canada vs Qatar, mid-range, June 26-29, just me → watch it correct the date to June 18.
  2. Toronto to see Brazil vs Germany, premium, July 12, 2 adults → watch it refuse a nonexistent match honestly.
  3. From Halifax, Canada vs Morocco, June 18, couple, luxury → refused with real Group B alternatives.

Prizes we're competing for

| Prize | Why MatchDay qualifies |
| --- | --- |
| 🤖 Best Agent | Bounded agent loop (≤5 rounds), 3 tools chosen autonomously, genuine multi-step turns (search → build), schedule grounding + honest refusal, guardrails. Proven by 9 zero-network loop tests. 30B < 32B. |
| 🎨 Off-Brand | Bespoke Layla-style UI on gradio.Server: custom HTML/CSS/JS, photo cards, Leaflet map, provenance pills. Not stock Gradio. |
| 🟢 NVIDIA Nemotron Quest | Nemotron-3-Nano-30B is the Brain; SGLang tool-calling verified live; reasoning mode wired. |
| 🟣 Modal | A100 inference runtime, documented above (matchday/modal_spike.py). |
| 🎬 Best Demo | App + demo script (matchday/DEMO_VIDEO_SCRIPT.md) + social post (matchday/SOCIAL_POST.md). |
| 🏆 Bonus Quest Champion | Nemotron + Modal + Gradio + agent + custom UI, all in one focused app. |
| 🗳️ Judges' Wildcard | A genuinely useful, honest, small-model trip planner that corrects its user. |

Honest note on Tiny Titan: we are not claiming Tiny Titan. That prize requires a model of ≤4B parameters; Nemotron-3-Nano-30B is a 30B-total MoE (only ~3B active per token, but 30B total weights). We'd rather flag this than over-claim.

Built for Build Small

Track: Backyard AI — a focused, real-world Vancouver World Cup use case. Sponsor tools used: NVIDIA Nemotron-3-Nano-30B (the Brain) + Modal A100 (the runtime) + Gradio gradio.Server (the Off-Brand UI).

Social

Post: <paste your social post URL here, then redeploy> — a ready-to-post draft is in matchday/SOCIAL_POST.md.