FlowMDM - Seamless Human Motion Composition with Blended Positional Encodings

Text-to-motion and multi-prompt motion-composition baseline integrated into the hftrainer Model Zoo. The runtime is self-contained under hftrainer.models.motion.flowmdm.network and does not import the original repository at inference time.

Task: Text-to-Motion (T2M), sequential / multi-prompt T2M
Bundle / Pipeline: FlowMDMBundle / FlowMDMPipeline
Processed HF artifact: ZeyuLing/hftrainer-flowmdm-humanml3d
Motion representation: HumanML3D-263 (263-dim, 20 fps, 22 joints)
Model family: MDM-style diffusion with blended positional encodings
Paper: Seamless Human Motion Composition with Blended Positional Encodings, Barquero et al., CVPR 2024 - arXiv:2402.15509
Original code: https://github.com/BarqueroGerman/FlowMDM

Weights

Self-contained hftrainer artifact:

| Artifact | Location | Contents | Status |
|---|---|---|---|
| FlowMDM HumanML3D | ZeyuLing/hftrainer-flowmdm-humanml3d | model000500000.pt + args.json + Mean.npy / Std.npy + model_index.json | public Hub artifact |
| local mirror | checkpoints/baselines/flowmdm | same layout | optional local cache |

Use directly from the Hub:

from hftrainer.pipelines.flowmdm import FlowMDMPipeline

pipe = FlowMDMPipeline.from_pretrained(
    "ZeyuLing/hftrainer-flowmdm-humanml3d",
    device="cuda",
)
motions = pipe.infer_t2m(
    ["a person walks forward then sits down"],
    [120],
)  # returns a list with one (T, 263) motion per prompt

For a local mirror:

pipe = FlowMDMPipeline.from_pretrained("checkpoints/baselines/flowmdm", device="cuda")
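
Before pointing the pipeline at a local mirror, it can help to confirm that the directory holds the files listed in the Weights table. The sketch below is a hypothetical helper, not part of hftrainer, and it assumes the files sit directly in the mirror's top-level directory, which this card does not state.

```python
from pathlib import Path

# File names taken from the Weights table in this card.
EXPECTED_FILES = (
    "model000500000.pt",
    "args.json",
    "Mean.npy",
    "Std.npy",
    "model_index.json",
)


def missing_artifact_files(root):
    """Return the expected file names that are absent from `root`.

    Hypothetical helper: assumes a flat layout (all files directly under `root`).
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_artifact_files("checkpoints/baselines/flowmdm")
if missing:
    print("local mirror is incomplete, missing:", ", ".join(missing))
else:
    print("local mirror contains all expected files")
```

If the real mirror nests files in subdirectories, the check would need to be adapted accordingly.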

Sequential multi-prompt generation is exposed as:

motions = pipe.infer_sequential_t2m(
    [["a person walks forward", "then turns around"]],
    [[80, 80]],
)
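
The length arguments in the calls above appear to be frame counts; that reading is an assumption, since the card does not state the unit. Given the 20 fps rate of HumanML3D-263, durations in seconds could be converted to frame counts with a small helper such as the one sketched here. The function names are hypothetical and not part of hftrainer.

```python
FPS = 20  # HumanML3D-263 frame rate stated in this card


def seconds_to_frames(seconds, fps=FPS):
    """Convert a duration in seconds to a whole number of frames (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return max(1, round(seconds * fps))


def build_sequential_lengths(durations_per_sample, fps=FPS):
    """Map nested durations in seconds to nested frame counts, one inner list per sample."""
    return [
        [seconds_to_frames(s, fps) for s in sample]
        for sample in durations_per_sample
    ]


# Two 4-second segments for a single sample, mirroring the [[80, 80]] example above.
lengths = build_sequential_lengths([[4.0, 4.0]])
total_frames = [sum(sample) for sample in lengths]
print(lengths, total_frames)
```

Under that assumption, the single-prompt example with 120 frames corresponds to 6 seconds of motion.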

Motion Representation

FlowMDM natively generates HumanML3D-263 at 20 fps. For shared SMPL and MotionStreamer-272 evaluation, use the validated bridge:

HumanML3D-263 -> SMPL motion_135 via IK refine-80 -> MotionStreamer-272

The bridge is a representation-conversion diagnostic. Native HumanML3D quality should be assessed in the 263-dim evaluator when paper-comparable numbers are needed.

HumanML3D Leaderboard Metrics

The rows below use the shared HumanML3D official-test caption protocol and the HML263 round-trip GT reference for SMPL-based evaluators.

| Evaluator | R1 ↑ | R2 ↑ | R3 ↑ | FID ↓ | MM ↓ | Div ↑ |
|---|---|---|---|---|---|---|
| MotionStreamer-272 | 0.4737 | 0.6496 | 0.7312 | 36.3767 | 20.0018 | 25.1783 |
| MotionCLIP-135 no-L2 | 0.3317 | 0.4795 | 0.5737 | 131.9653 | 43.0012 | 22.9482 |

Physical metrics:

| Slide ↓ | Float ↓ | Jitter ↓ | Dynamic ↓ |
|---|---|---|---|
| 3.0452 | 7.4055 | 5.0130 | 22.3205 |

Implementation Notes

  • Artifact inference imports only hftrainer.models.motion.flowmdm.network.
  • The SMPL visualizer path from the original implementation is stubbed for T2M inference because the released HumanML3D checkpoint predicts HML263 features.
  • Mean.npy and Std.npy are packaged with the artifact to avoid the recurring failure mode of loading mismatched normalization statistics.
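
To illustrate why the packaged statistics matter, the sketch below shows the per-feature z-score convention commonly used with HumanML3D features. It is an assumption that this artifact follows the same convention and that Mean.npy and Std.npy each hold a 263-element vector; the pipeline may already apply this step internally. The function names are hypothetical, and synthetic arrays stand in for the real files.

```python
import numpy as np

FEATURE_DIM = 263  # HumanML3D-263 feature size stated in this card


def _check_stats(mean, std):
    """Validate that both statistics are vectors of length FEATURE_DIM."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    if mean.shape != (FEATURE_DIM,) or std.shape != (FEATURE_DIM,):
        raise ValueError(
            f"expected statistics of shape ({FEATURE_DIM},), "
            f"got {mean.shape} and {std.shape}"
        )
    return mean, std


def normalize(motion, mean, std):
    """Map raw (T, 263) features to normalized space (assumed z-score convention)."""
    mean, std = _check_stats(mean, std)
    return (np.asarray(motion, dtype=np.float32) - mean) / std


def denormalize(motion_norm, mean, std):
    """Map normalized (T, 263) features back to raw feature space."""
    mean, std = _check_stats(mean, std)
    return np.asarray(motion_norm, dtype=np.float32) * std + mean


# Synthetic stand-ins; with the real artifact these would come from
# np.load("Mean.npy") and np.load("Std.npy").
rng = np.random.default_rng(0)
mean = rng.normal(size=FEATURE_DIM).astype(np.float32)
std = rng.uniform(0.5, 2.0, size=FEATURE_DIM).astype(np.float32)
motion = rng.normal(size=(120, FEATURE_DIM)).astype(np.float32)

restored = denormalize(normalize(motion, mean, std), mean, std)
```

Denormalizing with statistics from a different dataset split or feature layout would still produce an array of the right shape, which is why this kind of mismatch tends to go unnoticed until the motion is visualized or evaluated.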