---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
tags:
  - track:backyard
  - sponsor:nvidia
  - achievement:offbrand
  - achievement:welltuned
  - achievement:fieldnotes
---

<!--
ZeroGPU is selected in the Space's Settings (the README can't set it). The
model brick (/api/ask) only loads the LLM when SPACES_ZERO_GPU is set, so
local `python app.py` stays instant.
-->

# FitCheck

**What AI can your computer actually run?** And the other way round: **what
computer do you need for the AI you want to run?**

Tell FitCheck about your machine in plain words. It answers honestly — real
models, real memory figures, real licenses, real copy-paste commands — from
chatbots to object detection, image generation, speech, and robotics.

## Demo and social post

- Demo video: https://www.youtube.com/watch?v=nz6sBVwA7N8
- Social post: https://x.com/chandran0303/status/2066644768914837994
- Build log and write-up: https://huggingface.co/blog/build-small-hackathon/fitcheck

## Why it's trustworthy

- **A deterministic engine does the math, not an AI.** Verdicts come from a
  transparent rules engine over `catalogue.json` (110 models, generated from
  `scripts/curation.json` and verified against the Hugging Face API). Nothing in
  the verdict can be hallucinated.
- **Model sizes are exact where they can be.** For qualifying GGUF files the
  weights figure is the real Hub file size (displayed rounded to two decimals);
  other entries are parameter-count estimates, labelled as such. Chat memory
  uses each model's real architecture (GQA-aware) where available. Estimates add
  a small fixed safety buffer (a margin borrowed from community load data, not a
  confidence figure measured by FitCheck).
- **Provenance on every number.** The UI says whether a figure is an exact
  file size, a vendor/Hub figure, third-party aggregate data, community-reported,
  or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled
  on every card — before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts
  decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and
  shows where your machine lands among **real community benchmark runs**
  ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart.
  A gradient-boosted predictor (following IBM's
  [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425)) is cross-validated
  with folds grouped by accelerator label; it beats the analytical baseline on
  median error but not on every metric, so the bands are honest, not precise.
  Vision and diffusion are compute-bound: FitCheck shows a calibrated
  compute-roofline estimate (a ceiling) for detection and diffusion, with proxy
  limits flagged, and stays silent where it has no calibration.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works
  / Won't fit) that would rather under-promise than over-promise.
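
The memory math, banding, and bandwidth-bound decode estimate described above can be sketched roughly as follows. This is a minimal illustration, not FitCheck's actual engine: `ModelSpec`, `kv_cache_bytes`, `band`, `decode_tokens_per_sec_ceiling`, the 0.5 GiB buffer, and the 80% comfort threshold are all assumptions made for the example. It uses the standard KV-cache size formula for grouped-query attention and treats decode speed as memory bandwidth divided by the bytes read per token, which gives a ceiling rather than a prediction.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelSpec:
    """Hypothetical record; the field names are assumptions, not catalogue.json keys."""
    weights_bytes: int          # size of the weights file
    n_layers: int
    n_kv_heads: int             # with GQA this is smaller than the attention head count
    head_dim: int
    kv_bytes_per_elem: int = 2  # assumes an fp16 KV cache


def kv_cache_bytes(model: ModelSpec, context_tokens: int) -> int:
    # One K and one V tensor per layer, each n_kv_heads * head_dim values per token.
    per_token = 2 * model.n_layers * model.n_kv_heads * model.head_dim * model.kv_bytes_per_elem
    return per_token * context_tokens


def band(model: ModelSpec, context_tokens: int, available_bytes: int,
         buffer_bytes: int = GIB // 2, comfort: float = 0.8) -> str:
    # buffer_bytes and comfort are illustrative values, not FitCheck's real margins.
    need = model.weights_bytes + kv_cache_bytes(model, context_tokens) + buffer_bytes
    if need > available_bytes:
        return "Won't fit"
    if need > comfort * available_bytes:
        return "Tight, but works"
    return "Runs great"


def decode_tokens_per_sec_ceiling(model: ModelSpec, bandwidth_bytes_per_sec: float) -> float:
    # Decode reads (roughly) every weight once per generated token,
    # so bandwidth / weights size is an upper bound on tokens per second.
    return bandwidth_bytes_per_sec / model.weights_bytes


if __name__ == "__main__":
    # A made-up 4 GiB model, not an entry from the catalogue.
    demo = ModelSpec(weights_bytes=4 * GIB, n_layers=32, n_kv_heads=8, head_dim=128)
    print(band(demo, context_tokens=4096, available_bytes=8 * GIB))
    print(decode_tokens_per_sec_ceiling(demo, bandwidth_bytes_per_sec=100 * GIB))
```

Because every input is a plain number, the same inputs always give the same band, which is the property the first bullet relies on.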

## What's inside

1. **The catalogue** — `scripts/curation.json` (hand-picked models across
   LLM, vision-language, vision, image/video generation, speech, music,
   embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from
   public Hub endpoints into `catalogue.json`. Re-run the script to refresh;
   the result is baked in at build time, so the *deterministic advice* needs no
   network (the parser, narrator, and live lookup do).
2. **The engine** (`engine/`) — pure Python memory math and honest banding.
   Also answers the reverse question: minimum vs comfortable hardware tiers
   for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running
   in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels),
   explaining the engine's numbers in plain words. It never does the math; a
   faithfulness check LOGS any figure it states that isn't in the engine's facts
   (observability only — it does not block or rewrite the answer).
4. **The frontend** (`static/`) — hand-built HTML/CSS/JS, no framework, served
   by Gradio server mode (`gr.Server`). Optional extra: paste any Hugging Face
   model id and FitCheck walks its finetune/quantized lineage to a known base
   ("if the base runs, your finetune runs") — the one clearly-labelled online
   feature.
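
The faithfulness check in item 3 could look something like the sketch below. It is an assumption about the general approach, not the code in `model_brick.py`: the function name `unsupported_figures`, the logger name, and the shape of the `facts` dictionary are hypothetical. It extracts every number from the narrator's answer, compares it against the numbers present in the engine's facts, and logs the ones it cannot find without changing the answer.

```python
import logging
import re

logger = logging.getLogger("fitcheck.faithfulness")  # hypothetical logger name

# Matches integers and simple decimals such as "4096" or "2.49".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_figures(answer: str, facts: dict) -> list[str]:
    """Return numbers stated in `answer` that do not appear in any fact value."""
    allowed: set[str] = set()
    for value in facts.values():
        allowed.update(_NUMBER.findall(str(value)))

    missing = [n for n in _NUMBER.findall(answer) if n not in allowed]
    for n in missing:
        # Observability only: the answer itself is returned to the user unchanged.
        logger.warning("narrator stated %s, which is not in the engine facts", n)
    return missing


if __name__ == "__main__":
    facts = {"weights_gb": 2.49, "context_tokens": 4096, "band": "Runs great"}
    answer = "The weights are 2.49 GB and it needs about 3.1 GB at 4096 tokens."
    print(unsupported_figures(answer, facts))
```

The comparison here is on exact strings, so `2.5` and `2.50` would count as different figures; a real check would likely normalise rounding first.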

## Run it locally

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

The activation line above is for Windows; on macOS or Linux, use
`source .venv/bin/activate` instead.

Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). Locally,
the explainer reports that the model isn't loaded (it only loads on the Space).
The deterministic advice works without network; the live Hugging Face lookup,
the spec parser, the narrator, web fonts, and the Gradio client need
connectivity.

Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon)
(Backyard AI track).