FlowMDM - Seamless Human Motion Composition with Blended Positional Encodings
Text-to-motion and multi-prompt motion-composition baseline integrated into the
hftrainer Model Zoo. The runtime is self-contained under
hftrainer.models.motion.flowmdm.network and does not import the original
repository at inference time.
| Item | Details |
|---|---|
| Task | Text-to-Motion (T2M), sequential / multi-prompt T2M |
| Bundle / Pipeline | FlowMDMBundle / FlowMDMPipeline |
| Processed HF artifact | ZeyuLing/hftrainer-flowmdm-humanml3d |
| Motion representation | HumanML3D-263 (263-dim, 20 fps, 22 joints) |
| Model family | MDM-style diffusion with blended positional encodings |
| Paper | Seamless Human Motion Composition with Blended Positional Encodings, Barquero et al., CVPR 2024 - arXiv:2402.15509 |
| Original code | https://github.com/BarqueroGerman/FlowMDM |
Weights
Self-contained hftrainer artifact:
| Artifact | Location | Contents | Status |
|---|---|---|---|
| FlowMDM HumanML3D | ZeyuLing/hftrainer-flowmdm-humanml3d | model000500000.pt + args.json + Mean.npy / Std.npy + model_index.json | public Hub artifact |
| local mirror | checkpoints/baselines/flowmdm | same layout | optional local cache |
Use directly from the Hub:
from hftrainer.pipelines.flowmdm import FlowMDMPipeline

pipe = FlowMDMPipeline.from_pretrained(
    "ZeyuLing/hftrainer-flowmdm-humanml3d",
    device="cuda",
)
motions = pipe.infer_t2m(
    ["a person walks forward then sits down"],
    [120],
)  # list of (T, 263)
For a local mirror:
pipe = FlowMDMPipeline.from_pretrained("checkpoints/baselines/flowmdm", device="cuda")
Sequential multi-prompt generation is exposed as:
motions = pipe.infer_sequential_t2m(
    [["a person walks forward", "then turns around"]],
    [[80, 80]],
)
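The length arguments above appear to be frame counts, as the (T, 263) output shape suggests, so at the 20 fps HumanML3D rate a value of 80 would correspond to 4 seconds. The sketch below shows one way to derive per-segment lengths from durations in seconds. `seconds_to_frames` and `HML3D_FPS` are hypothetical names for illustration and are not part of hftrainer.

```python
HML3D_FPS = 20  # HumanML3D frame rate listed in the summary table


def seconds_to_frames(seconds: float, fps: int = HML3D_FPS) -> int:
    """Convert a duration in seconds to a frame count (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    # Round to the nearest whole frame and never return fewer than one frame.
    return max(1, round(seconds * fps))


# One sequence made of two prompts, each lasting 4 seconds.
prompts = [["a person walks forward", "then turns around"]]
durations_s = [[4.0, 4.0]]

# Nested list with the same structure as `prompts`, expressed in frames.
lengths = [[seconds_to_frames(s) for s in seq] for seq in durations_s]
print(lengths)
```

Under that assumption, `prompts` and `lengths` could be passed to `pipe.infer_sequential_t2m` in place of the literal lists shown above.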
Motion Representation
FlowMDM natively generates HumanML3D-263 at 20 fps. For shared SMPL and MotionStreamer-272 evaluation, use the validated bridge:
HumanML3D-263 -> SMPL motion_135 via IK refine-80 -> MotionStreamer-272
The bridge is a representation-conversion diagnostic. Native HumanML3D quality should be assessed in the 263-dim evaluator when paper-comparable numbers are needed.
HumanML3D Leaderboard Metrics
The row below uses the shared HumanML3D official-test caption protocol and the HML263 round-trip GT reference for SMPL-based evaluators.
| Evaluator | R1 ↑ | R2 ↑ | R3 ↑ | FID ↓ | MM ↓ | Div ↑ |
|---|---|---|---|---|---|---|
| MotionStreamer-272 | 0.4737 | 0.6496 | 0.7312 | 36.3767 | 20.0018 | 25.1783 |
| MotionCLIP-135 no-L2 | 0.3317 | 0.4795 | 0.5737 | 131.9653 | 43.0012 | 22.9482 |
Physical metrics:
| Slide ↓ | Float ↓ | Jitter ↓ | Dynamic ↓ |
|---|---|---|---|
| 3.0452 | 7.4055 | 5.0130 | 22.3205 |
Implementation Notes
- Artifact inference imports only hftrainer.models.motion.flowmdm.network.
- The SMPL visualizer path from the original implementation is stubbed for T2M inference because the released HumanML3D checkpoint predicts HML263 features.
- Mean.npy and Std.npy are packaged with the artifact to avoid the recurring wrong-statistics failure mode.
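
To illustrate why the statistics matter, here is a minimal sketch of the z-score convention commonly used with HumanML3D features: features are normalized as (x - mean) / std and mapped back with x * std + mean. It assumes Mean.npy and Std.npy each hold one value per feature (263 entries). Whether FlowMDMPipeline applies this step internally is not stated here. The `denormalize` helper is hypothetical, and synthetic arrays stand in for the packaged files.

```python
import numpy as np

FEATURE_DIM = 263  # HumanML3D-263 feature width


def denormalize(motion_norm, mean, std):
    """Map z-scored features of shape (T, 263) back to raw feature space (hypothetical helper)."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    # Statistics from a different dataset or representation often have a different width.
    if mean.shape != (FEATURE_DIM,) or std.shape != (FEATURE_DIM,):
        raise ValueError(f"expected statistics of shape ({FEATURE_DIM},)")
    return np.asarray(motion_norm, dtype=np.float32) * std + mean


# Synthetic stand-ins; in practice these would come from np.load on Mean.npy / Std.npy.
rng = np.random.default_rng(0)
mean = rng.normal(size=FEATURE_DIM).astype(np.float32)
std = rng.uniform(0.5, 2.0, size=FEATURE_DIM).astype(np.float32)

motion = rng.normal(size=(120, FEATURE_DIM)).astype(np.float32)
normalized = (motion - mean) / std
restored = denormalize(normalized, mean, std)
print(restored.shape)
```

The shape check only catches statistics of the wrong width. Statistics with the right shape but from a different preprocessing run would pass silently, which is presumably why shipping them alongside the checkpoint is the safer choice.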