---
title: FitCheck
emoji: ✅
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
python_version: "3.12"
pinned: false
license: mit
short_description: Honest, plain answers about what AI your computer can run
models:
  - nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
tags:
  - track:backyard
  - sponsor:nvidia
  - achievement:offbrand
  - achievement:welltuned
  - achievement:fieldnotes
---

# FitCheck

**What AI can your computer actually run?** And the other way round: **what computer do you need for the AI you want to run?**

Tell FitCheck about your machine in plain words. It answers honestly — real models, real memory figures, real licenses, real copy-paste commands — from chatbots to object detection, image generation, speech, and robotics.

## Demo and social post

- Demo video: https://www.youtube.com/watch?v=nz6sBVwA7N8
- Social post: https://x.com/chandran0303/status/2066644768914837994
- Build log and write-up: https://huggingface.co/blog/build-small-hackathon/fitcheck

## Why it's trustworthy

- **A deterministic engine does the math, not an AI.** Verdicts come from a transparent rules engine over `catalogue.json` (110 models, generated from `scripts/curation.json` and verified against the Hugging Face API). Nothing in the verdict can be hallucinated.
- **Model sizes are exact where they can be.** For qualifying GGUF files the weights figure is the real Hub file size (displayed rounded to two decimals); other entries are parameter-count estimates, labelled as such. Chat memory uses each model's real architecture (GQA-aware) where available. Estimates add a small fixed safety buffer (a borrowed margin from community load data, not a FitCheck-measured confidence).
- **Provenance on every number.** The UI says whether a figure is an exact file size, a vendor/Hub figure, third-party aggregate data, community-reported, or estimated.
- **Licenses up front.** AGPL, non-commercial, and gated models are labelled on every card — before you build your project on one.
- **Speed estimates with receipts, not vibes.** For LLMs, FitCheck predicts decode tokens/sec from your memory bandwidth (decode is bandwidth-bound) and shows where your machine lands among **real community benchmark runs** ([LocalScore](https://www.localscore.ai)) on an interactive roofline chart. A gradient-boosted predictor (following IBM's [LLM-Pilot methodology](https://arxiv.org/abs/2410.02425)) is grouped by accelerator label in cross-validation; it beats the analytical baseline on median error but not on every metric, so the bands are honest, not precise. Vision and diffusion are compute-bound: FitCheck shows a calibrated compute-roofline estimate (a ceiling) for detection and diffusion, with proxy limits flagged, and stays silent where it has no calibration.
- **Conservative by design.** Three plain bands (Runs great / Tight, but works / Won't fit) that would rather under-promise than over-promise.

## What's inside

1. **The catalogue** — `scripts/curation.json` (hand-picked models across LLM, vision-language, vision, image/video generation, speech, music, embeddings, forecasting) enriched by `scripts/refresh_catalogue.py` from public Hub endpoints into `catalogue.json`. Re-run the script to refresh; the result is baked in at build time, so the *deterministic advice* needs no network (the parser, narrator, and live lookup do).
2. **The engine** (`engine/`) — pure Python memory math and honest banding. Also answers the reverse question: minimum vs comfortable hardware tiers for a goal ("Help me pick one" mode).
3. **The model brick** (`model_brick.py`) — NVIDIA Nemotron 3 Nano 4B running in-Space on ZeroGPU (hybrid Mamba-2, accelerated by prebuilt hub kernels), explaining the engine's numbers in plain words.
   It never does the math; a faithfulness check LOGS any figure it states that isn't in the engine's facts (observability only — it does not block or rewrite the answer).
4. **The frontend** (`static/`) — hand-built HTML/CSS/JS, no framework, served by Gradio server mode (`gr.Server`).

Optional extra: paste any Hugging Face model id and FitCheck walks its finetune/quantized lineage to a known base ("if the base runs, your finetune runs") — the one clearly-labelled online feature.

## Run it locally

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
python app.py
```

Open http://127.0.0.1:7860/ (add `?go` for an instant sample result). Locally the explainer reports the model isn't loaded (it only loads on the Space). The deterministic advice works without network; the live Hugging Face lookup, the spec parser, the narrator, web fonts, and the Gradio client need connectivity.

Built for the [Build Small hackathon](https://huggingface.co/build-small-hackathon) (Backyard AI track).
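
The memory math, banding, and bandwidth-bound speed estimate described under "Why it's trustworthy" can be sketched roughly as follows. This is a minimal illustration, not the code in `engine/`: the `ModelSpec` fields, the 0.5 GiB buffer, the 80% "comfortable" threshold, and all function names are assumptions made up for the example. It assumes a plain transformer with a grouped-query-attention KV cache, where memory is weights plus cache plus a fixed buffer, and where decode speed is capped by how fast the weights can be streamed from memory.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelSpec:
    """Hypothetical record; field names are illustrative, not FitCheck's schema."""
    weights_bytes: int          # e.g. an exact GGUF file size
    n_layers: int
    n_kv_heads: int             # with GQA this is smaller than the attention head count
    head_dim: int
    kv_bytes_per_elem: int = 2  # assumes an fp16 KV cache


def kv_cache_bytes(model: ModelSpec, context_tokens: int) -> int:
    # One K and one V tensor per layer, each n_kv_heads * head_dim values per token.
    per_token = 2 * model.n_layers * model.n_kv_heads * model.head_dim * model.kv_bytes_per_elem
    return per_token * context_tokens


def required_bytes(model: ModelSpec, context_tokens: int, buffer_bytes: int = GIB // 2) -> int:
    # Weights + chat memory + a fixed safety buffer (0.5 GiB is a placeholder value).
    return model.weights_bytes + kv_cache_bytes(model, context_tokens) + buffer_bytes


def band(required: int, available: int, comfort: float = 0.8) -> str:
    # The band names come from the README; the 80% threshold is an assumption.
    if required <= comfort * available:
        return "Runs great"
    if required <= available:
        return "Tight, but works"
    return "Won't fit"


def decode_tokens_per_sec_ceiling(weights_bytes: int, bandwidth_bytes_per_sec: float) -> float:
    # Bandwidth-bound decode: each generated token reads the full weights once,
    # so this is an upper bound, not a prediction.
    return bandwidth_bytes_per_sec / weights_bytes


if __name__ == "__main__":
    model = ModelSpec(weights_bytes=4 * GIB, n_layers=32, n_kv_heads=8, head_dim=128)
    need = required_bytes(model, context_tokens=8192)
    for ram_gib in (8, 6, 4):
        print(f"{ram_gib} GiB -> {band(need, ram_gib * GIB)}")
    print(f"{decode_tokens_per_sec_ceiling(model.weights_bytes, 100e9):.1f} tokens/sec ceiling")
```

The banding is deliberately one-sided: the buffer and the comfort threshold both push borderline machines toward the more cautious verdict, which matches the "under-promise" stance described above.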