---
title: Karate Wiener
emoji: 🥋
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
python_version: "3.10"
suggested_hardware: zero-a10g
pinned: true
license: apache-2.0
hf_oauth: true
short_description: Create your own karate moves with a sausage man
tags:
- track:wood
- sponsor:nvidia
- sponsor:openbmb
- achievement:offbrand
- achievement:welltuned
- achievement:fieldnotes
- achievement:tinytitan
- achievement:bestdemo
models:
- nvidia/Kimodo-SMPLX-RP-v1
- McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
- McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
- openbmb/VoxCPM-0.5B
- black-forest-labs/FLUX.2-klein-4B
- nvidia/nemotron-3-nano-30b-a3b
- polats/weiner-klein-lora
---

# Karate Wiener 🥋

A whimsical voice-chat dojo where **Karate Wiener** — a wise hotdog sensei — teaches you kata. Talk to him by voice, and he generates and demonstrates karate moves on a 3D character in real time, then builds you a custom dojo to train in.

**Build Small Hackathon** · Track: *Thousand Token Wood* · Sponsors: NVIDIA (Nemotron), OpenBMB (VoxCPM) · Achievements: Off Brand, Well Tuned, Field Notes, Tiny Titan, Best Demo

🚀 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/karate-wiener

📹 **Demo video:** https://www.youtube.com/watch?v=t1j7ZGs03ps

📣 **Social post:** https://www.linkedin.com/feed/update/urn:li:activity:7472190937620951040/

## Tracks & badges

What each tag in the YAML block above means:

- **`track:wood`** — Thousand Token Wood (whimsical, fun).
- **`sponsor:nvidia`** — Nemotron (Skill Forge); powers Karate Wiener's chat as the fallback text model via NVIDIA NIM.
- **`sponsor:openbmb`** — VoxCPM, the MiniCPM-based voice-cloning TTS for Wiener's replies.
- **`achievement:offbrand`** — fully custom Three.js + hand-written CSS/HTML UI, no default Gradio styling.
- **`achievement:welltuned`** — the Karate Wiener LoRA (`polats/weiner-klein-lora`, trained on FLUX.2-klein-4B) is published on HF.
- **`achievement:fieldnotes`** — build write-up in this README (see [Build notes](#build-notes)).
- **`achievement:tinytitan`** — built on ≤4B models (VoxCPM ~0.5B, FLUX.2-klein-4B).
- **`achievement:bestdemo`** — demo video + social post (linked above).

## Tech stack

Custom Gradio + Three.js UI (no default Gradio styling) over a set of small models, all **< 32B parameters**:

| Role | Model / Space |
|---|---|
| Text-to-motion (kata + moves) | `nvidia/Kimodo-SMPLX-RP-v1` |
| Karate Wiener chat persona + dojo prompt suggestion/refinement | Cohere **Tiny Aya** (sidecar Space), with hosted **Nemotron** (`nvidia/nemotron-3-nano-30b-a3b`, NVIDIA NIM) as fallback |
| Voice-cloning TTS for Wiener's replies | **VoxCPM** ~0.5B, MiniCPM-based (sidecar Space) |
| Prompt→motion cache embeddings | `McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp` |
| Dojo scene image generation | **`black-forest-labs/FLUX.2-klein-4B`** isometric render (via the Klein sidecar Space) |
| Dojo scene image→3D | **TripoSplat** Gaussian-splat conversion |

The image/3D/TTS/LLM models run as remote ZeroGPU "sidecar" Spaces reached via `gradio_client`, so the same code path runs locally and on the Space.

## Generation model

The Space generates on the **SMPL-X 22-joint** skeleton via `kimodo-smplx-rp` (set `KIMODO_MODEL` to override). This matches the skeleton used by the kimodo web/kata viewers, so generated clips can be retargeted directly onto skinned display characters with no joint remapping.

Generated animations are saved into the configured Hugging Face Dataset store, so they survive Space rebuilds and can be reused by the kata maker. Animation IDs are derived deterministically from the normalized prompt, model, duration in seconds, denoising steps, seed, and generation schema; regenerating with the same settings returns the cached animation instead of spending GPU time again.
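A minimal sketch of how such deterministic IDs could be derived, assuming a SHA-256 digest over a canonical JSON encoding of the settings and a plain `dict` standing in for the HF Dataset store. The helpers `normalize_prompt`, `animation_id`, and `get_or_generate`, the normalization rule, and the `schema` default are hypothetical illustrations, not the Space's actual code.

```python
import hashlib
import json


def normalize_prompt(prompt: str) -> str:
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return " ".join(prompt.lower().split())


def animation_id(
    prompt: str,
    model: str,
    seconds: float,
    steps: int,
    seed: int,
    schema: str = "v1",  # hypothetical generation-schema tag
) -> str:
    """Derive a stable ID from every setting that affects the generated clip."""
    key = {
        "prompt": normalize_prompt(prompt),
        "model": model,
        "seconds": float(seconds),  # so 4 and 4.0 map to the same ID
        "steps": int(steps),
        "seed": int(seed),
        "schema": schema,
    }
    # sort_keys + compact separators give one canonical serialization,
    # so identical settings always hash to the same digest.
    payload = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def get_or_generate(cache: dict, generate, **settings) -> dict:
    """Return the cached clip if its ID exists; otherwise generate and store it."""
    anim_id = animation_id(**settings)
    if anim_id not in cache:
        cache[anim_id] = generate(**settings)  # only spend GPU on a cache miss
    return cache[anim_id]
```

Serializing with `sort_keys=True` makes the key independent of argument order, and bumping the `schema` value is a simple way to invalidate every cached clip when the output format changes.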
> Note: `nvidia/Kimodo-SMPLX-RP-v1` is distributed under NVIDIA's research/R&D
> model license, which is more restrictive than the NVIDIA Open Model license
> used by the SOMA models. Review it before public/commercial use.

## Dojo scenes

The dojo a clip plays in is generated on demand: **Tiny Aya** expands a few keywords into a full scene description, **FLUX** (via the Klein Space) renders an isometric image of that dojo, and **TripoSplat** lifts the image into a 3D Gaussian splat you can orbit around your character.

## Display characters

The 3D viewer can play a clip as the procedural skeleton or retarget it onto a skinned rig (an s&box Citizen ships in `assets/`). The picker in the viewer HUD switches between them; retargeting is rest-pose alignment driven by each clip's `global_quats_xyzw`. Additional rigs can be added to `_CHARACTER_CATALOG` in `app.py` as entries of the form `{ id, label, mapping, glb_b64 }`.

## Build notes

A short field report from building Karate Wiener under the < 32B constraint:

- **Everything heavy is a sidecar.** Rather than load every model in one Space, the GPU work (FLUX image gen, TripoSplat, VoxCPM TTS, Tiny Aya / Nemotron text) lives in separate ZeroGPU Spaces reached over `gradio_client`. The exact same code path runs locally and in production, which made iterating far faster and kept this Space's own footprint tiny.
- **Small models, composed.** No single model does the whole job — a 4B image model (FLUX.2-klein), a ~0.5B voice model (VoxCPM), an 8B embedding model (LLM2Vec) for the prompt→motion cache, and a small text model for the persona all cooperate. Staying under 32B was never a limitation once the work was split by modality.
- **Graceful fallback beats a single dependency.** The Tiny Aya persona Space cold-starts and occasionally runs out of ZeroGPU quota, so Wiener's chat falls back to hosted **Nemotron** (NVIDIA NIM) before any token is emitted — the user never sees the seam.
- **Deterministic caching saves GPU.** Generated animations are keyed by a hash of the normalized prompt + settings and stored in an HF Dataset, so a repeated kata returns instantly instead of re-spending GPU time.
- **A custom look for the mascot.** A LoRA (`polats/weiner-klein-lora`, trained on FLUX.2-klein-4B) gives Karate Wiener a consistent look across generated art.

See [ATTRIBUTIONS.MD](ATTRIBUTIONS.MD) for third-party model, character, and asset credits.
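The "graceful fallback" described in the build notes can be sketched as a generator that only commits to the primary model once its first token arrives. This is a minimal sketch under assumptions: `primary` and `fallback` are hypothetical callables returning token iterables (for example, wrappers around the Tiny Aya sidecar and the Nemotron NIM endpoint), and the real client code in `app.py` may differ.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical signature: a model wrapper takes a prompt and returns tokens.
Streamer = Callable[[str], Iterable[str]]


def stream_with_fallback(
    prompt: str, primary: Streamer, fallback: Streamer
) -> Iterator[str]:
    """Stream from `primary`, switching to `fallback` only before the first token."""
    try:
        tokens = iter(primary(prompt))
        # Cold starts and quota errors typically surface while waiting for
        # the first token; an empty reply yields None here.
        first = next(tokens, None)
    except Exception:
        first = None
    if first is None:
        # Primary failed or returned nothing. Nothing has been emitted yet,
        # so switching models is invisible to the user.
        yield from fallback(prompt)
        return
    yield first
    # After the first token we are committed; later errors propagate.
    yield from tokens
```

Errors raised after the first token still propagate, so the switch can only happen before anything has been shown to the user, which is what keeps the seam invisible.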