---
title: Karate Wiener
emoji: 🥋
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
python_version: "3.10"
suggested_hardware: zero-a10g
pinned: true
license: apache-2.0
hf_oauth: true
short_description: Create your own karate moves with a sausage man
tags:
  - track:wood
  - sponsor:nvidia
  - sponsor:openbmb
  - achievement:offbrand
  - achievement:welltuned
  - achievement:fieldnotes
  - achievement:tinytitan
  - achievement:bestdemo
models:
  - nvidia/Kimodo-SMPLX-RP-v1
  - McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp
  - McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
  - openbmb/VoxCPM-0.5B
  - black-forest-labs/FLUX.2-klein-4B
  - nvidia/nemotron-3-nano-30b-a3b
  - polats/weiner-klein-lora
---

# Karate Wiener 🥋

A whimsical voice-chat dojo where **Karate Wiener** — a wise hotdog sensei —
teaches you kata. Talk to him by voice, and he generates and demonstrates
karate moves on a 3D character in real time, then builds you a custom dojo to
train in.

**Build Small Hackathon** · Track: *Thousand Token Wood* ·
Sponsors: NVIDIA (Nemotron), OpenBMB (VoxCPM) ·
Achievements: Off Brand, Well Tuned, Field Notes, Tiny Titan, Best Demo

- 🚀 **Live Space:** https://huggingface.co/spaces/build-small-hackathon/karate-wiener
- 📹 **Demo video:** https://www.youtube.com/watch?v=t1j7ZGs03ps
- 📣 **Social post:** https://www.linkedin.com/feed/update/urn:li:activity:7472190937620951040/

## Tracks & badges

What each tag in the YAML block above means:

- **`track:wood`** — Thousand Token Wood (whimsical, fun).
- **`sponsor:nvidia`** — Nemotron (Skill Forge); powers Karate Wiener's chat as the fallback text model via NVIDIA NIM.
- **`sponsor:openbmb`** — VoxCPM, the MiniCPM-based voice-cloning TTS for Wiener's replies.
- **`achievement:offbrand`** — fully custom Three.js + hand-written CSS/HTML UI, no default Gradio styling.
- **`achievement:welltuned`** — the Karate Wiener LoRA (`polats/weiner-klein-lora`, trained on FLUX.2-klein-4B) is published on HF.
- **`achievement:fieldnotes`** — build write-up in this README (see [Build notes](#build-notes)).
- **`achievement:tinytitan`** — built on ≤4B models (VoxCPM ~0.5B, FLUX.2-klein-4B).
- **`achievement:bestdemo`** — demo video + social post (linked above).

## Tech stack

Custom Gradio + Three.js UI (no default Gradio styling) over a set of small
models, all **< 32B parameters**:

| Role | Model / Space |
|---|---|
| Text-to-motion (kata + moves) | `nvidia/Kimodo-SMPLX-RP-v1` |
| Karate Wiener chat persona + dojo prompt suggestion/refinement | Cohere **Tiny Aya** (sidecar Space), with hosted **Nemotron** (`nvidia/nemotron-3-nano-30b-a3b`, NVIDIA NIM) as fallback |
| Voice-cloning TTS for Wiener's replies | **VoxCPM** ~0.5B, MiniCPM-based (sidecar Space) |
| Prompt→motion cache embeddings | `McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp` |
| Dojo scene image generation | **`black-forest-labs/FLUX.2-klein-4B`** isometric render (via the Klein sidecar Space) |
| Dojo scene image→3D | **TripoSplat** Gaussian-splat conversion |

The image/3D/TTS/LLM models run as remote ZeroGPU "sidecar" Spaces reached via
`gradio_client`, so the same code path runs locally and on the Space.
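
As a rough illustration of the sidecar pattern, the sketch below wraps a single remote call. `Client(...)` and `Client.predict(..., api_name=...)` are real `gradio_client` calls, but the helper name `call_sidecar`, the `client_factory` hook, and any Space ID or endpoint name passed to it are assumptions for illustration, not the actual code in `app.py`.

```python
from typing import Any, Callable, Optional


def call_sidecar(
    space_id: str,
    api_name: str,
    *args: Any,
    client_factory: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Call one endpoint on a remote Space and return its result.

    `client_factory` is a hypothetical hook so the call can be stubbed in
    tests; by default the real gradio_client.Client is used.
    """
    if client_factory is None:
        # Imported lazily so this module loads even without gradio_client.
        from gradio_client import Client

        client_factory = Client

    client = client_factory(space_id)
    # The same call works whether this process runs locally or on the Space.
    return client.predict(*args, api_name=api_name)
```

A call might look like `call_sidecar("your-org/your-tts-space", "/synthesize", "Osu!")`, where both the Space ID and the endpoint name are placeholders.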

## Generation model

The Space generates on the **SMPL-X 22-joint** skeleton via `kimodo-smplx-rp`
(set `KIMODO_MODEL` to override). This matches the skeleton used by the kimodo
web/kata viewers, so generated clips can be retargeted directly onto skinned
display characters with no joint remapping.
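
One plausible way to read the `KIMODO_MODEL` override is sketched below. The environment variable name and the default come from the text above; the helper `resolve_kimodo_model` and its blank-value handling are assumptions, not the Space's actual implementation.

```python
import os
from typing import Mapping, Optional

DEFAULT_KIMODO_MODEL = "kimodo-smplx-rp"


def resolve_kimodo_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the model name, preferring KIMODO_MODEL when it is set.

    Hypothetical helper: an unset or blank variable falls back to the default.
    """
    env = os.environ if env is None else env
    value = env.get("KIMODO_MODEL", "").strip()
    return value or DEFAULT_KIMODO_MODEL
```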

Generated animations are saved into the configured Hugging Face Dataset store so
they survive Space rebuilds and can be reused by the kata maker. Animation IDs
are deterministic from the normalized prompt, model, seconds, denoising steps,
seed, and generation schema; regenerating with the same settings returns the
cached animation instead of spending GPU time again.
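
A minimal sketch of how such a deterministic ID could be derived is shown below. The fields match the list above, but the normalization rule, the SHA-256 hash, the 16-character truncation, and the names `normalize_prompt`, `animation_id`, and `SCHEMA_VERSION` are all assumptions rather than the Space's actual scheme.

```python
import hashlib
import json

SCHEMA_VERSION = 1  # hypothetical: bump when the stored clip format changes


def normalize_prompt(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivial variants share an ID."""
    return " ".join(prompt.lower().split())


def animation_id(
    prompt: str,
    model: str,
    seconds: float,
    steps: int,
    seed: int,
    schema: int = SCHEMA_VERSION,
) -> str:
    """Hash every setting that affects the output into a short stable ID."""
    payload = {
        "prompt": normalize_prompt(prompt),
        "model": model,
        "seconds": round(float(seconds), 3),
        "steps": int(steps),
        "seed": int(seed),
        "schema": int(schema),
    }
    # sort_keys and fixed separators make the JSON byte-for-byte stable.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
```

Before generating, the app can look this ID up in the dataset store and only spend GPU time on a miss; changing any one setting, such as the seed, yields a different ID.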

> Note: `nvidia/Kimodo-SMPLX-RP-v1` is distributed under NVIDIA's research/R&D
> model license, which is more restrictive than the NVIDIA Open Model license
> used by the SOMA models. Review it before public/commercial use.

## Dojo scenes

The dojo a clip plays in is generated on demand: **Tiny Aya** expands a few
keywords into a full scene description, **FLUX** (via the Klein Space) renders an
isometric image of that dojo, and **TripoSplat** lifts the image into a 3D
Gaussian splat you can orbit around your character.

## Display characters

The 3D viewer can play a clip as the procedural skeleton or retarget it onto a
skinned rig (an s&box Citizen ships in `assets/`). The picker in the viewer HUD
switches between them; retargeting is rest-pose alignment driven by each clip's
`global_quats_xyzw`. Additional rigs can be added to `_CHARACTER_CATALOG` in
`app.py` as `{ id, label, mapping, glb_b64 }`.
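
The sketch below shows one way to build an entry in that shape. Only the four keys come from the text above; the helper name `make_character_entry`, the assumption that `mapping` is a dict from skeleton joint names to rig bone names, and the example joint and bone names are illustrative guesses.

```python
import base64
from typing import Dict


def make_character_entry(
    char_id: str, label: str, mapping: Dict[str, str], glb_bytes: bytes
) -> dict:
    """Build a catalog entry of the form { id, label, mapping, glb_b64 }.

    Assumption: `mapping` maps source joint names to the rig's bone names.
    """
    return {
        "id": char_id,
        "label": label,
        "mapping": dict(mapping),
        # GLB is binary, so it is embedded as base64 text.
        "glb_b64": base64.b64encode(glb_bytes).decode("ascii"),
    }


# Hypothetical usage; in practice glb_bytes would be read from a .glb file.
entry = make_character_entry(
    "my_rig", "My Rig", {"pelvis": "Hips", "left_knee": "LeftLeg"}, b"glTF-demo"
)
```

How the entry is then registered depends on whether `_CHARACTER_CATALOG` is a list or a dict, which the text above does not specify.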

## Build notes

A short field report from building Karate Wiener under the < 32B constraint:

- **Everything heavy is a sidecar.** Rather than load every model in one Space,
  the GPU work (FLUX image gen, TripoSplat, VoxCPM TTS, Tiny Aya / Nemotron text)
  lives in separate ZeroGPU Spaces reached over `gradio_client`. The exact same
  code path runs locally and in production, which made iterating far faster and
  kept this Space's own footprint tiny.
- **Small models, composed.** No single model does the whole job — a 4B image
  model (FLUX.2-klein), a ~0.5B voice model (VoxCPM), an 8B embedding model
  (LLM2Vec) for the prompt→motion cache, and a small text model for persona all
  cooperate. Staying under 32B was never a limitation once the work was split by
  modality.
- **Graceful fallback beats a single dependency.** The Tiny Aya persona Space
  cold-starts and occasionally runs out of ZeroGPU quota, so Wiener's chat falls
  back to hosted **Nemotron** (NVIDIA NIM) before any token is emitted — the user
  never sees the seam.
- **Deterministic caching saves GPU.** Generated animations are keyed by a hash
  of the normalized prompt + settings and stored in an HF Dataset, so a repeated
  kata returns instantly instead of re-spending GPU time.
- **A custom look for the mascot.** A LoRA (`polats/weiner-klein-lora`,
  trained on FLUX.2-klein-4B) keeps Karate Wiener's appearance consistent
  across generated art.
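
The "fall back before any token is emitted" idea from the graceful-fallback note can be sketched as a small generator. This is an illustration under assumptions, not the Space's actual code: `stream_with_fallback` is a hypothetical name, and `primary` and `fallback` stand in for whatever callables stream tokens from Tiny Aya and Nemotron.

```python
from typing import Callable, Iterable, Iterator


def stream_with_fallback(
    primary: Callable[[str], Iterable[str]],
    fallback: Callable[[str], Iterable[str]],
    prompt: str,
) -> Iterator[str]:
    """Stream from `primary`, switching to `fallback` only before token one."""
    try:
        tokens = iter(primary(prompt))
        # Pull the first token eagerly: cold starts and quota errors
        # typically surface here, before anything has reached the user.
        first = next(tokens)
    except Exception:
        # Also covers StopIteration, i.e. the primary produced no tokens.
        yield from fallback(prompt)
        return

    yield first
    # Past this point errors propagate; we never switch models mid-reply.
    yield from tokens
```

Because the switch happens before the first `yield`, the caller sees one uninterrupted stream regardless of which backend answered.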

See [ATTRIBUTIONS.MD](ATTRIBUTIONS.MD) for third-party model, character, and
asset credits.