+---
+license: apache-2.0
+library_name: hftrainer
+pipeline_tag: text-to-motion
+tags:
+- motion-generation
+- text-to-motion
+- diffusion
+- humanml3d
+- mdm
+---
+# MDM — Human Motion Diffusion Model (hftrainer reproduction)
+A self-contained reproduction of **MDM** (Tevet et al., ICLR 2023) that does not
+depend on `ref_repo`, packaged as an [hftrainer](https://github.com/) `ModelBundle`
+artifact. The vendored network and Gaussian diffusion are **bit-identical** to the
+released checkpoint.
+- **Task:** Text-to-Motion · **Representation:** HumanML3D-263 (20 fps, 22 joints)
+- **Text encoder:** CLIP ViT-B/32 (frozen, reloaded by name — not stored here)
+- **Paper:** [arXiv:2209.14916](https://arxiv.org/abs/2209.14916) · **Code:** https://github.com/GuyTevet/motion-diffusion-model
+## Usage
+```python
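+from hftrainer.models.mdm import MDMBundle
+from hftrainer.pipelines.mdm import MDMPipeline
+
+# Load the bundle from the Hugging Face Hub repo.
+# Note: the frozen CLIP ViT-B/32 text encoder is reloaded by name, not stored here.
+bundle = MDMBundle.from_pretrained("ZeyuLing/hftrainer-mdm-humanml3d")
+pipe = MDMPipeline(bundle, device="cuda")
+# One prompt with a requested length of 120 (presumably frames, i.e. 6 s at 20 fps).
+motions = pipe.infer_t2m(["a person walks forward then sits down"], [120])  # list of (T, 263)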
+```
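The second argument to `infer_t2m` appears to be a per-prompt motion length in frames (the `[120]` above, consistent with the `(T, 263)` shape comment). Assuming that reading, and using the 20 fps rate stated for HumanML3D-263, a small helper can turn a duration in seconds into a frame count. `seconds_to_frames` and `FPS` are hypothetical names introduced for illustration; they are not part of hftrainer.

```python
FPS = 20  # HumanML3D-263 frame rate stated in this card


def seconds_to_frames(seconds: float, fps: int = FPS) -> int:
    """Convert a clip duration in seconds to a whole number of frames.

    Hypothetical helper, not part of hftrainer. Assumes the pipeline
    expects motion lengths expressed in frames.
    """
    if seconds <= 0:
        raise ValueError("duration must be positive")
    # Round to the nearest frame, but never return fewer than one frame.
    return max(1, round(seconds * fps))


durations_s = [6.0, 3.5]
lengths = [seconds_to_frames(s) for s in durations_s]
print(lengths)  # [120, 70]
```

If the frame-unit assumption holds, a list built this way could be passed as the second argument to `pipe.infer_t2m`, with one entry per prompt.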
+## Metrics (official HumanML3D-263 protocol, n=3970)
+| FID ↓ | Diversity → | R-Prec Top-3 ↑ | MM-Dist ↓ |
+|---|---|---|---|
+| **0.509** (paper 0.544) | **9.563** (paper 9.559) | 0.711 | 3.681 |
+## Files
+- `model.safetensors` (no CLIP)
+- `mdm_config.json`
+- `Mean.npy` / `Std.npy` (HumanML3D training stats, embedded so the checkpoint is self-contained)
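
Because `model.safetensors` is listed as containing no CLIP weights, one way to check this on a local copy is to read the tensor names from the file header. The sketch below uses only the Python standard library and the safetensors on-disk layout (an 8-byte little-endian header length followed by a JSON header). The local path and the `"clip"` substring filter are assumptions; the actual parameter names in this checkpoint are not documented in this card.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    Layout: 8 bytes (little-endian u64) giving the header size N,
    followed by N bytes of UTF-8 JSON mapping tensor names to
    dtype, shape, and data offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def tensor_names_containing(header, needle):
    """List tensor names that contain `needle` (case-insensitive)."""
    return sorted(name for name in header if needle.lower() in name.lower())


if __name__ == "__main__":
    path = Path("model.safetensors")  # assumed local download location
    if path.exists():
        header = read_safetensors_header(path)
        print(f"{len(header)} tensors")
        # Assumption: CLIP weights, if present, would have "clip" in their names.
        print("possible CLIP tensors:", tensor_names_containing(header, "clip"))
```

Reading only the header avoids loading any tensor data, so the check stays fast even for large checkpoints.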