--- title: UX Crime Scene emoji: π colorFrom: red colorTo: gray sdk: gradio sdk_version: 6.16.0 app_file: app.py pinned: true license: mit short_description: A film-noir detective investigates your UI as a crime scene. models: - Qwen/Qwen2.5-VL-7B-Instruct - black-forest-labs/FLUX.2-klein-4B - nvidia/Nemotron-Mini-4B-Instruct - hexgrad/Kokoro-82M datasets: - build-small-hackathon/ux-crime-scene-traces tags: - track:wood - sponsor:nvidia - sponsor:modal - achievement:offbrand - achievement:sharing - achievement:fieldnotes --- # π UX Crime Scene ### *Every interface hides a crime.* [](https://youtu.be/6u58YIEPrkA)
βΆοΈ Watch the trailer
Drop a screenshot of **any** website or app. **THE INSPECTOR** β a hard-boiled, film-noir detective β works the scene, circles every UX flaw as evidence, and files a verdict with a letter grade. It's a UX audit that plays like a detective thriller. > β³ **Worth the wait β please run a real scan.** The whole experience is the *live* case: > drop your own screenshot and watch the Inspector work it end-to-end β scan β verdict β > **The Trial** β **FLUX reconstruction** β the **voice** β the **Most Wanted** board. To keep > this **solo, self-funded** project affordable, the GPU backends (vision Β· FLUX Β· voice) run on > **Modal scale-to-zero**, so the **first** scan after an idle spell takes **~1β2 min** to wake > them β **the app is NOT broken**, it's just warming up; once warm, each scan is **~20β30s**. > *In a hurry?* A **cold case** or the **Precinct Archive** load pre-rendered verdicts instantly β > but those are only a preview. The real magic is a live investigation, and it's worth the minute. π΅οΈ --- ## π§ββοΈ TL;DR for judges - **Track:** π **Thousand Token Wood** β the AI *is* the detective; remove the model and there's no case. - **What it does:** drop any UI screenshot β a film-noir detective investigates it, **circles each UX flaw on the real pixels**, names the charge, suggests a fix, and files a verdict + letter grade. Every result gets a unique shareable case file. - **Small-model bet:** runs on **`Qwen2.5-VL-7B`** (8.3B β well under the 32B cap), pushed above its weight with a multi-step agent + ~4 MP vision + tiling. - **Measured, not vibes:** a senior UX designer graded every charge from 16 live pages β **84% of circles land on the exact named element, 92% of charges are real design issues** (N=38, evidence public). Full methodology + failure analysis: [EVAL.md](EVAL.md). - **Then it goes further β powers no other entry has:** - βοΈ **The Trial β two small models argue one case.** The verdict opens a courtroom: a **separate NVIDIA Nemotron model** steps in as **THE PROSECUTION** and presses the filed charges, while the guilty UI elements **take the stand and defend themselves** (Qwen again) and the Inspector rules from the grounded evidence. Press a move (On the Stand Β· Cross-Examine Β· Confess Β· The Verdict) or speak freely. *Qwen sees and defends Β· Nemotron prosecutes Β· the Inspector judges* β agentic, not a script. - πΌοΈ **The Reconstruction** β one click rebuilds the worst exhibit *fixed*, rendered live by **FLUX.2 Klein** (Black Forest Labs). Before/after, on the real element. - π **The Inspector's Voice** β hear the verdict read aloud by an 82M-param **Kokoro** voice running locally. No API, no keys. - π¨ **Most Wanted** β a public, multiplayer **rogues' gallery**: opt-in to book your case onto a shared board where the city's worst interfaces are ranked by their crimes. *Booked by the public.* Seeded with real archive cases, with a live Inspector's city report. - **Why the badges:** - π¨ **Off-Brand** β a fully bespoke noir world (cinematic alley intro, evidence desk, live laptop investigation, typewritten case file, a *Precinct Archive* of famous booked sites). Nothing looks like default Gradio. - π€ **Best Agent** β a real visual agent: **sweep β zoom into each suspect β verify/clear β file**, then **answers follow-up interrogation** β all under 32B. - π¬ **Best Demo** β cinematic trailer + social post + polished app. - π’ **Best Use of Modal** β four GPU backends (vision, FLUX, voice, prosecutor) all run on Modal. - π© **NVIDIA Nemotron** β `Nemotron-Mini-4B` is THE PROSECUTION: a distinct small model arguing the case against the interface, not a generic text call. - **Try it in one click:** open the desk β *"grab a cold case"* (HuggingFace / NYTimes) or *"browse the precinct archive"* β no screenshot needed. - **Links:** [βΆοΈ trailer](https://youtu.be/6u58YIEPrkA) Β· [πΉ full walkthrough](https://youtu.be/WyQbY0XJ_9E) Β· [π± social post](https://x.com/p36649/status/2066277845567930447) Β· [π Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene) Β· [π‘ traces dataset](https://huggingface.co/datasets/build-small-hackathon/ux-crime-scene-traces) - **Author:** [@kasbsquall](https://huggingface.co/kasbsquall) β solo. --- ## βΆοΈ The Trailer **[βΆοΈ Watch the trailer on YouTube](https://youtu.be/6u58YIEPrkA)** β the case file, on film. **[πΉ The full walkthrough](https://youtu.be/WyQbY0XJ_9E)** β the complete flow, uncut: scan β verdict β The Trial β FLUX reconstruction β the voice β Most Wanted. **[π± The social post (X)](https://x.com/p36649/status/2066277845567930447)** β the case goes public. --- ## π Submission | | | | --- | --- | | **Track** | π **Thousand Token Wood** β the AI *is* the detective; remove the model and there's no case | | **Demo video** | [βΆοΈ trailer](https://youtu.be/6u58YIEPrkA) Β· [πΉ full walkthrough](https://youtu.be/WyQbY0XJ_9E) | | **Social post** | https://x.com/p36649/status/2066277845567930447 | | **Badges** | π¨ **Off-Brand** (fully custom noir frontend) Β· π€ **Best Agent** (sweep β zoom β verify β file β a two-model Trial: Nemotron prosecutes, Qwen defends) Β· π¬ **Best Demo** (app + cinematic trailer + social post) | | **Sponsor awards** | π’ **Modal β Best Use of Modal** β four GPU backends (vLLM vision, FLUX.2 Klein, Kokoro voice, Nemotron prosecutor), all scale-to-zero on Modal, + case storage on a Modal Volume Β· π© **NVIDIA β Nemotron** β `Nemotron-Mini-4B` argues **THE PROSECUTION** in The Trial (a distinct model with a real job, not a generic text call) | | **Extra** | π [Quantitative eval β 84% grounding / 92% validity, human-graded](EVAL.md) Β· π [Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene) Β· π‘ [public traces dataset](https://huggingface.co/datasets/build-small-hackathon/ux-crime-scene-traces) | --- ## π΅οΈ How to use it 1. **Drop the evidence** β a UI screenshot onto the detective's desk (or grab a cold case in one click). 2. **Watch The Inspector investigate** β the scene gets worked, live. 3. **Read the case file** β every *"crime against the user"* circled on the image, each charge explained, with a final **verdict and grade**. 4. **Put it on trial** βοΈ β open **The Trial**: a separate **NVIDIA Nemotron** model opens as **THE PROSECUTION** and presses the charges; press a move (On the Stand Β· Cross-Examine Β· Confess Β· The Verdict) or speak freely, and the guilty element defends itself while the Inspector rules from the evidence. *Two small models arguing one case.* 5. **See the Reconstruction** πΌοΈ β one click and **FLUX.2 Klein** rebuilds the worst exhibit the way it *should* look. Before/after, on the real element. 6. **Hear the verdict** π β let the Inspector read his closing statement aloud. 7. **Share the case** β every investigation gets a unique, shareable link. 8. **Book it onto Most Wanted** π¨ β opt-in to publish your case to the public board, where the city's worst interfaces are ranked by their crimes β booked by the public. 9. **Browse the Precinct Archive** ποΈ β a records-room drawer of famous interfaces the Inspector has already booked (NYT, eBay, GitHub, NASA, even Hugging Face itself). Every folder reopens its real case file. > π‘ Best experienced on **desktop, with sound on**. π§ --- ## π« The crimes it catches - Weak or confusing calls-to-action - Buried, hidden, or unreachable actions - Visual overload & broken hierarchy - Dark patterns & ambiguous labels - β¦and whatever else is hiding in plain sight Every charge points to a **real element on the screen** β coordinates grounded by the vision model, not guessed. --- ## π§ Under the hood | | | | --- | --- | | ποΈ **Vision** | `Qwen2.5-VL-7B-Instruct` (8.3B β comfortably under the 32B cap) on **Modal** (vLLM, L40S, scale-to-zero) | | π΅οΈ **Agentic** | Multi-step: **sweep** the scene β **zoom into each suspect** β **verify or clear** the charge β file the verdict β **hold the Trial** (the guilty elements defend themselves; the Inspector rules) grounded in the same image | | βοΈ **The Prosecution** | `nvidia/Nemotron-Mini-4B-Instruct` on **Modal** (L40S, scale-to-zero) β a **separate** small model reads the filed charges and argues the case for the State. Qwen sees & defends, **Nemotron prosecutes**, the Inspector judges: two small models in one courtroom | | πΌοΈ **Reconstruction** | `FLUX.2-klein-4B` (Black Forest Labs, Apache-2.0) on **Modal** β crops the guilty element and rebuilds it *fixed* from the Inspector's own remedy | | π **Voice** | `Kokoro-82M` local TTS on **Modal** β narrates the verdict in the Inspector's gravel (no external API, no keys) | | π¨ **Most Wanted** | Public, multiplayer board β opt-in `POST /publish` books a case summary onto a shared **Modal Volume**; the board merges real archive cases + community submissions, ranked by crimes, with a live city report | | π¨ **Frontend** | **Gradio** app on **Hugging Face Spaces** (CPU only) | | π **Grounding** | High-res vision (~4 MP) + `bbox_2d` rescaled to original pixels; panoramic shots are **tiled** so the whole page is scanned, not just the top strip | | π¬ **Craft** | Custom noir / forensic UI β cinematic intro, evidence desk, live investigation, case file | **How it actually works** β four things make the Inspector more than a prompt: 1. **A real visual agent.** A full-scene *sweep* flags suspects; the app then **crops and zooms into each suspect region** and re-examines it with a focused prompt to confirm or clear the charge and tighten the evidence box, before filing the verdict (plan β act/zoom β verify β synthesize). 2. **It defends its work.** After the verdict you can **interrogate the Inspector** β the same model is re-prompted with the screenshot + the filed case and answers follow-up questions in character, conceding or defending each charge from the visible evidence. 3. **It rebuilds the evidence.** One click crops the worst guilty element and sends it to **FLUX.2 Klein** with the Inspector's own remedy as the instruction β a live, before/after *reconstruction* of how the element should look. 4. **Honest coordinates, and it survives the worst case.** `bbox_2d` is rescaled from Qwen's smart-resized space back to original pixels so every circle sits on the real element; and a salvage parser recovers every complete piece of evidence from a truncated JSON response, so a dense page never crashes the investigation. Built for the **Build Small Hackathon** (Gradio Γ Hugging Face) β *Thousand Token Wood* track. π **[Read the Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene)** β how it was built, and what I learned ([short version](FIELD_NOTES.md)). ---