---
title: UX Crime Scene
emoji: 🔎
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 6.16.0
app_file: app.py
pinned: true
license: mit
short_description: A film-noir detective investigates your UI as a crime scene.
models:
  - Qwen/Qwen2.5-VL-7B-Instruct
  - black-forest-labs/FLUX.2-klein-4B
  - nvidia/Nemotron-Mini-4B-Instruct
  - hexgrad/Kokoro-82M
datasets:
  - build-small-hackathon/ux-crime-scene-traces
tags:
  - track:wood
  - sponsor:nvidia
  - sponsor:modal
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
---

# 🔎 UX Crime Scene

### *Every interface hides a crime.*

[![UX Crime Scene: watch the trailer](assets/poster.jpg)](https://youtu.be/6u58YIEPrkA)

▶️ Watch the trailer

Drop a screenshot of **any** website or app. **THE INSPECTOR**, a hard-boiled film-noir detective, works the scene, circles every UX flaw as evidence, and files a verdict with a letter grade. It's a UX audit that plays like a detective thriller.

> ⏳ **Worth the wait: please run a real scan.** The whole experience is the *live* case:
> drop your own screenshot and watch the Inspector work it end-to-end: scan → verdict →
> **The Trial** → **FLUX reconstruction** → the **voice** → the **Most Wanted** board. To keep
> this **solo, self-funded** project affordable, the GPU backends (vision · FLUX · voice) run on
> **Modal scale-to-zero**, so the **first** scan after an idle spell takes **~1–2 min** to wake
> them. **The app is NOT broken**, it's just warming up; once warm, each scan takes **~20–30s**.
> *In a hurry?* A **cold case** or the **Precinct Archive** loads pre-rendered verdicts instantly,
> but those are only a preview. The real magic is a live investigation, and it's worth the minute. 🕵️

---

## 🧑‍⚖️ TL;DR for judges

- **Track:** 🍄 **Thousand Token Wood**: the AI *is* the detective; remove the model and there's no case.
- **What it does:** drop any UI screenshot → a film-noir detective investigates it, **circles each UX flaw on the real pixels**, names the charge, suggests a fix, and files a verdict + letter grade. Every result gets a unique shareable case file.
- **Small-model bet:** runs on **`Qwen2.5-VL-7B`** (8.3B, well under the 32B cap), pushed above its weight with a multi-step agent + ~4 MP vision + tiling.
- **Measured, not vibes:** a senior UX designer graded every charge from 16 live pages: **84% of circles land on the exact named element, 92% of charges are real design issues** (N=38, evidence public). Full methodology + failure analysis: [EVAL.md](EVAL.md).
- **Then it goes further, with powers no other entry has:**
  - ⚖️ **The Trial: two small models argue one case.** The verdict opens a courtroom: a **separate NVIDIA Nemotron model** steps in as **THE PROSECUTION** and presses the filed charges, while the guilty UI elements **take the stand and defend themselves** (Qwen again) and the Inspector rules from the grounded evidence. Press a move (On the Stand · Cross-Examine · Confess · The Verdict) or speak freely. *Qwen sees and defends · Nemotron prosecutes · the Inspector judges.* Agentic, not a script.
  - 🖼️ **The Reconstruction:** one click rebuilds the worst exhibit *fixed*, rendered live by **FLUX.2 Klein** (Black Forest Labs). Before/after, on the real element.
  - 🔊 **The Inspector's Voice:** hear the verdict read aloud by an 82M-param **Kokoro** voice running locally. No API, no keys.
  - 🚨 **Most Wanted:** a public, multiplayer **rogues' gallery**. Opt in to book your case onto a shared board where the city's worst interfaces are ranked by their crimes. *Booked by the public.* Seeded with real archive cases, with a live Inspector's city report.
- **Why the badges:**
  - 🎨 **Off-Brand:** a fully bespoke noir world (cinematic alley intro, evidence desk, live laptop investigation, typewritten case file, a *Precinct Archive* of famous booked sites). Nothing looks like default Gradio.
  - 🤖 **Best Agent:** a real visual agent that runs **sweep → zoom into each suspect → verify/clear → file**, then **answers follow-up interrogation**, all under 32B.
  - 🎬 **Best Demo:** cinematic trailer + social post + polished app.
  - 🟢 **Best Use of Modal:** four GPU backends (vision, FLUX, voice, prosecutor) all run on Modal.
  - 🟩 **NVIDIA Nemotron:** `Nemotron-Mini-4B` is THE PROSECUTION, a distinct small model arguing the case against the interface, not a generic text call.
- **Try it in one click:** open the desk → *"grab a cold case"* (HuggingFace / NYTimes) or *"browse the precinct archive"*. No screenshot needed.
- **Links:** [▶️ trailer](https://youtu.be/6u58YIEPrkA) · [📹 full walkthrough](https://youtu.be/WyQbY0XJ_9E) · [📱 social post](https://x.com/p36649/status/2066277845567930447) · [📓 Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene) · [📡 traces dataset](https://huggingface.co/datasets/build-small-hackathon/ux-crime-scene-traces)
- **Author:** [@kasbsquall](https://huggingface.co/kasbsquall), solo.

---

## ▶️ The Trailer

**[▶️ Watch the trailer on YouTube](https://youtu.be/6u58YIEPrkA)**: the case file, on film.

**[📹 The full walkthrough](https://youtu.be/WyQbY0XJ_9E)**: the complete flow, uncut: scan → verdict → The Trial → FLUX reconstruction → the voice → Most Wanted.

**[📱 The social post (X)](https://x.com/p36649/status/2066277845567930447)**: the case goes public.

---

## 🏆 Submission

| | |
| --- | --- |
| **Track** | 🍄 **Thousand Token Wood**: the AI *is* the detective; remove the model and there's no case |
| **Demo video** | [▶️ trailer](https://youtu.be/6u58YIEPrkA) · [📹 full walkthrough](https://youtu.be/WyQbY0XJ_9E) |
| **Social post** | https://x.com/p36649/status/2066277845567930447 |
| **Badges** | 🎨 **Off-Brand** (fully custom noir frontend) · 🤖 **Best Agent** (sweep → zoom → verify → file → a two-model Trial: Nemotron prosecutes, Qwen defends) · 🎬 **Best Demo** (app + cinematic trailer + social post) |
| **Sponsor awards** | 🟢 **Modal, Best Use of Modal:** four GPU backends (vLLM vision, FLUX.2 Klein, Kokoro voice, Nemotron prosecutor), all scale-to-zero on Modal, plus case storage on a Modal Volume · 🟩 **NVIDIA, Nemotron:** `Nemotron-Mini-4B` argues **THE PROSECUTION** in The Trial (a distinct model with a real job, not a generic text call) |
| **Extra** | 📊 [Quantitative eval: 84% grounding / 92% validity, human-graded](EVAL.md) · 📓 [Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene) · 📡 [public traces dataset](https://huggingface.co/datasets/build-small-hackathon/ux-crime-scene-traces) |

---

## 🕵️ How to use it

1. **Drop the evidence:** a UI screenshot onto the detective's desk (or grab a cold case in one click).
2. **Watch The Inspector investigate:** the scene gets worked, live.
3. **Read the case file:** every *"crime against the user"* circled on the image, each charge explained, with a final **verdict and grade**.
4. **Put it on trial** ⚖️: open **The Trial**. A separate **NVIDIA Nemotron** model opens as **THE PROSECUTION** and presses the charges; press a move (On the Stand · Cross-Examine · Confess · The Verdict) or speak freely, and the guilty element defends itself while the Inspector rules from the evidence. *Two small models arguing one case.*
5. **See the Reconstruction** 🖼️: one click and **FLUX.2 Klein** rebuilds the worst exhibit the way it *should* look. Before/after, on the real element.
6. **Hear the verdict** 🔊: let the Inspector read his closing statement aloud.
7. **Share the case:** every investigation gets a unique, shareable link.
8. **Book it onto Most Wanted** 🚨: opt in to publish your case to the public board, where the city's worst interfaces are ranked by their crimes, booked by the public.
9. **Browse the Precinct Archive** 🗄️: a records-room drawer of famous interfaces the Inspector has already booked (NYT, eBay, GitHub, NASA, even Hugging Face itself). Every folder reopens its real case file.

> 💡 Best experienced on **desktop, with sound on**.
🎧

---

## 🔫 The crimes it catches

- Weak or confusing calls-to-action
- Buried, hidden, or unreachable actions
- Visual overload & broken hierarchy
- Dark patterns & ambiguous labels
- …and whatever else is hiding in plain sight

Every charge points to a **real element on the screen**, with coordinates grounded by the vision model, not guessed.

---

## 🧠 Under the hood

| | |
| --- | --- |
| 👁️ **Vision** | `Qwen2.5-VL-7B-Instruct` (8.3B, comfortably under the 32B cap) on **Modal** (vLLM, L40S, scale-to-zero) |
| 🕵️ **Agentic** | Multi-step: **sweep** the scene → **zoom into each suspect** → **verify or clear** the charge → file the verdict → **hold the Trial** (the guilty elements defend themselves; the Inspector rules) grounded in the same image |
| ⚖️ **The Prosecution** | `nvidia/Nemotron-Mini-4B-Instruct` on **Modal** (L40S, scale-to-zero): a **separate** small model reads the filed charges and argues the case for the State. Qwen sees & defends, **Nemotron prosecutes**, the Inspector judges: two small models in one courtroom |
| 🖼️ **Reconstruction** | `FLUX.2-klein-4B` (Black Forest Labs, Apache-2.0) on **Modal**: crops the guilty element and rebuilds it *fixed* from the Inspector's own remedy |
| 🔊 **Voice** | `Kokoro-82M` local TTS on **Modal**: narrates the verdict in the Inspector's gravel (no external API, no keys) |
| 🚨 **Most Wanted** | Public, multiplayer board: opt-in `POST /publish` books a case summary onto a shared **Modal Volume**; the board merges real archive cases + community submissions, ranked by crimes, with a live city report |
| 🎨 **Frontend** | **Gradio** app on **Hugging Face Spaces** (CPU only) |
| 📐 **Grounding** | High-res vision (~4 MP) + `bbox_2d` rescaled to original pixels; panoramic shots are **tiled** so the whole page is scanned, not just the top strip |
| 🎬 **Craft** | Custom noir / forensic UI: cinematic intro, evidence desk, live investigation, case file |

**How it actually
works.** Four things make the Inspector more than a prompt:

1. **A real visual agent.** A full-scene *sweep* flags suspects; the app then **crops and zooms into each suspect region** and re-examines it with a focused prompt to confirm or clear the charge and tighten the evidence box, before filing the verdict (plan → act/zoom → verify → synthesize).
2. **It defends its work.** After the verdict you can **interrogate the Inspector**: the same model is re-prompted with the screenshot + the filed case and answers follow-up questions in character, conceding or defending each charge from the visible evidence.
3. **It rebuilds the evidence.** One click crops the worst guilty element and sends it to **FLUX.2 Klein** with the Inspector's own remedy as the instruction, producing a live before/after *reconstruction* of how the element should look.
4. **Honest coordinates, and it survives the worst case.** `bbox_2d` is rescaled from Qwen's smart-resized space back to original pixels so every circle sits on the real element; and a salvage parser recovers every complete piece of evidence from a truncated JSON response, so a dense page never crashes the investigation.

Built for the **Build Small Hackathon** (Gradio × Hugging Face), *Thousand Token Wood* track.

📓 **[Read the Field Notes article](https://huggingface.co/blog/kasbsquall/ux-crime-scene)**: how it was built, and what I learned ([short version](FIELD_NOTES.md)).

---
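
A footnote to point 4 above: the salvage idea can be sketched in a few lines. This is an illustration only, not the parser the app ships. It assumes each piece of evidence is a JSON object carrying a `bbox_2d` key; the function name `salvage_evidence` and the `charge` field in the usage note are hypothetical.

```python
import json
from typing import Any, Dict, List


def salvage_evidence(raw: str, required_key: str = "bbox_2d") -> List[Dict[str, Any]]:
    """Recover every complete evidence object from a possibly truncated JSON string.

    Hypothetical helper: it tries to decode a JSON value at each opening brace
    and keeps the dicts that contain `required_key`.
    """
    decoder = json.JSONDecoder()
    found: List[Dict[str, Any]] = []
    pos = raw.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            # Truncated or invalid object starting here: try the next brace.
            pos = raw.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and required_key in obj:
            found.append(obj)
            # Skip past the object we just recovered.
            pos = raw.find("{", end)
        else:
            # A valid wrapper without the key: look for evidence inside it.
            pos = raw.find("{", pos + 1)
    return found
```

Fed a response that was cut off in the middle of its third item, such as `'{"evidence": [{"charge": "A", "bbox_2d": [1, 2, 3, 4]}, {"charge": "B", "bbox_2d": [5, 6, 7, 8]}, {"charge": "C'`, the helper returns the two complete objects and drops the unfinished one.
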
βš™οΈ Tech & local setup This Space talks to GPU endpoints on Modal. Set the **Space secrets**: | Secret | Value | | --- | --- | | `MODAL_ENDPOINT_URL` | Vision endpoint β€” the URL `modal deploy modal_backend/serve_qwen.py` printed. | | `FLUX_ENDPOINT_URL` | *(optional)* Reconstruction endpoint β€” from `modal deploy modal_backend/serve_flux.py`. Enables "The Reconstruction". | | `TTS_ENDPOINT_URL` | *(optional)* Voice endpoint β€” from `modal deploy modal_backend/serve_tts.py`. Enables "The Inspector's Voice". | Both optional endpoints degrade gracefully β€” leave them unset and those panels simply don't appear. ```bash pip install -r requirements.txt export MODAL_ENDPOINT_URL="https://--ux-crime-scene-qwen-web.modal.run" python app.py # -> http://127.0.0.1:7860 ``` The backend (`modal_backend/serve_qwen.py`) serves Qwen2.5-VL-7B via vLLM behind a FastAPI endpoint, returns `bbox_2d` evidence per crime, and the frontend rescales + draws the markers. Cases are stored on a Modal volume so each verdict gets a unique shareable `?case=ID` link.