FlowMDM - Seamless Human Motion Composition with Blended Positional Encodings

FlowMDM is a text-to-motion and multi-prompt motion-composition baseline integrated into the hftrainer Model Zoo. The runtime is self-contained under `hftrainer.models.motion.flowmdm.network` and does not import the original repository at inference time.


| | |
|---|---|
| Task | Text-to-Motion (T2M), sequential / multi-prompt T2M |
| Bundle / Pipeline | `FlowMDMBundle` / `FlowMDMPipeline` |
| Processed HF artifact | `ZeyuLing/hftrainer-flowmdm-humanml3d` |
| Motion representation | HumanML3D-263 (263-dim, 20 fps, 22 joints) |
| Model family | MDM-style diffusion with blended positional encodings |
| Paper | Seamless Human Motion Composition with Blended Positional Encodings, Barquero et al., CVPR 2024 - arXiv:2402.15509 |
| Original code | https://github.com/BarqueroGerman/FlowMDM |

Weights

Self-contained hftrainer artifact:

| Artifact | Location | Contents | Status |
|---|---|---|---|
| FlowMDM HumanML3D | `ZeyuLing/hftrainer-flowmdm-humanml3d` | `model000500000.pt` + `args.json` + `Mean.npy` / `Std.npy` + `model_index.json` | public Hub artifact |
| local mirror | `checkpoints/baselines/flowmdm` | same layout | optional local cache |

Use directly from the Hub:

from hftrainer.pipelines.flowmdm import FlowMDMPipeline

pipe = FlowMDMPipeline.from_pretrained(
    "ZeyuLing/hftrainer-flowmdm-humanml3d",
    device="cuda",
)
motions = pipe.infer_t2m(
    ["a person walks forward then sits down"],
    [120],
)  # list of (T, 263)
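
The second argument above reads as one target length per prompt. Assuming those lengths are frame counts at the card's stated 20 fps (an assumption, since the call signature is not documented here), a duration in seconds can be converted as in this minimal sketch. `seconds_to_frames` is a hypothetical helper, not part of hftrainer.

```python
FPS = 20  # HumanML3D-263 frame rate stated in this card


def seconds_to_frames(seconds: float, fps: int = FPS) -> int:
    """Convert a duration in seconds to a whole number of frames (at least 1)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return max(1, round(seconds * fps))


prompts = ["a person walks forward then sits down"]
lengths = [seconds_to_frames(6.0)]  # 6 s * 20 fps = 120 frames

# Assumed usage, mirroring the call shown above:
# motions = pipe.infer_t2m(prompts, lengths)
```

Keeping durations in seconds at the call site and converting once avoids mixing frame counts from datasets with different frame rates.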

For a local mirror:

pipe = FlowMDMPipeline.from_pretrained("checkpoints/baselines/flowmdm", device="cuda")
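
Before pointing the pipeline at a local mirror, it can help to confirm that the directory holds the files listed in the Weights table. The sketch below assumes all files sit at the top level of the directory (the card only says "same layout"); `missing_artifact_files` is a hypothetical helper, not an hftrainer function.

```python
from pathlib import Path

# File names taken from the Weights table in this card.
EXPECTED_FILES = (
    "model000500000.pt",
    "args.json",
    "Mean.npy",
    "Std.npy",
    "model_index.json",
)


def missing_artifact_files(root):
    """Return the expected artifact files that are absent from `root`.

    Assumes a flat layout: every file directly under `root`.
    """
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_artifact_files("checkpoints/baselines/flowmdm")
if missing:
    print("local mirror is incomplete, missing:", missing)
```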

Sequential multi-prompt generation is exposed as:

motions = pipe.infer_sequential_t2m(
    [["a person walks forward", "then turns around"]],
    [[80, 80]],
)
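
The call above pairs a nested list of prompts with a nested list of lengths of the same shape. A small pre-flight check can catch mismatched nesting before any sampling starts. This is a sketch under the assumption that each inner prompt needs exactly one positive integer length; `check_sequential_inputs` is a hypothetical helper and not part of the pipeline.

```python
def check_sequential_inputs(prompt_groups, length_groups):
    """Validate nested prompt/length lists and return the summed length per group.

    Raises ValueError when the nesting of prompts and lengths does not match
    or when a length is not a positive integer.
    """
    if len(prompt_groups) != len(length_groups):
        raise ValueError("need one list of lengths per list of prompts")
    totals = []
    for i, (prompts, lengths) in enumerate(zip(prompt_groups, length_groups)):
        if len(prompts) != len(lengths):
            raise ValueError(f"group {i}: {len(prompts)} prompts but {len(lengths)} lengths")
        if any(not isinstance(n, int) or n <= 0 for n in lengths):
            raise ValueError(f"group {i}: lengths must be positive integers")
        totals.append(sum(lengths))
    return totals


prompt_groups = [["a person walks forward", "then turns around"]]
length_groups = [[80, 80]]
totals = check_sequential_inputs(prompt_groups, length_groups)  # [160]

# Assumed usage, mirroring the call shown above:
# motions = pipe.infer_sequential_t2m(prompt_groups, length_groups)
```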

Motion Representation

FlowMDM natively generates HumanML3D-263 at 20 fps. For shared SMPL and MotionStreamer-272 evaluation, use the validated bridge:

HumanML3D-263 -> SMPL motion_135 via IK refine-80 -> MotionStreamer-272

The bridge is a representation-conversion diagnostic, not a substitute for native evaluation. When paper-comparable numbers are needed, assess native HumanML3D quality with the 263-dim evaluator.
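
For readers inspecting the native output, the 263 dimensions can be split into their component groups. The sketch below follows the feature order used by the public HumanML3D preprocessing (root terms, root-relative joint positions, 6D joint rotations, joint velocities, foot contacts). The group names are labels chosen here for illustration, and `split_hml263` is a hypothetical helper.

```python
import numpy as np

NUM_JOINTS = 22  # joint count stated in this card

# (name, width) pairs in feature order; widths sum to 263.
HML263_LAYOUT = [
    ("root_rot_velocity", 1),
    ("root_linear_velocity", 2),
    ("root_height", 1),
    ("joint_positions", (NUM_JOINTS - 1) * 3),      # 63, root excluded
    ("joint_rotations_6d", (NUM_JOINTS - 1) * 6),   # 126, root excluded
    ("joint_velocities", NUM_JOINTS * 3),           # 66
    ("foot_contacts", 4),
]


def split_hml263(motion):
    """Split a (..., 263) array into named views along the last axis."""
    motion = np.asarray(motion)
    if motion.shape[-1] != 263:
        raise ValueError(f"expected last dimension 263, got {motion.shape[-1]}")
    parts, start = {}, 0
    for name, width in HML263_LAYOUT:
        parts[name] = motion[..., start:start + width]
        start += width
    return parts


# Placeholder input with the shape of one 120-frame sample.
parts = split_hml263(np.zeros((120, 263), dtype=np.float32))
```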

HumanML3D Leaderboard Metrics

The rows below use the shared HumanML3D official-test caption protocol and the HML263 round-trip ground-truth (GT) reference for SMPL-based evaluators.

| Evaluator | R@1 ↑ | R@2 ↑ | R@3 ↑ | FID ↓ | MM-Dist ↓ | Diversity ↑ |
|---|---|---|---|---|---|---|
| MotionStreamer-272 | 0.4737 | 0.6496 | 0.7312 | 36.3767 | 20.0018 | 25.1783 |
| MotionCLIP-135 no-L2 | 0.3317 | 0.4795 | 0.5737 | 131.9653 | 43.0012 | 22.9482 |

Physical metrics:

| Slide ↓ | Float ↓ | Jitter ↓ | Dynamic ↓ |
|---|---|---|---|
| 3.0452 | 7.4055 | 5.0130 | 22.3205 |

Implementation Notes

- Artifact inference imports only `hftrainer.models.motion.flowmdm.network`.
- The SMPL visualizer path from the original implementation is stubbed for T2M inference because the released HumanML3D checkpoint predicts HML263 features.
- `Mean.npy` and `Std.npy` are packaged with the artifact so that features are always normalized and de-normalized with the statistics that match the checkpoint, which avoids the recurring wrong-statistics failure mode.
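
To illustrate why the statistics matter, here is a minimal sketch of the usual z-score de-normalization for 263-dim features, `x = x_norm * std + mean`. Whether `FlowMDMPipeline` already returns de-normalized features is not stated in this card, so treat this as an illustration of the failure mode rather than a required step. `denormalize` is a hypothetical helper, and the file paths in the comment assume a flat local mirror.

```python
import numpy as np


def denormalize(motion_norm, mean, std):
    """Map z-normalized (..., 263) features back to raw HumanML3D-263 units."""
    motion_norm = np.asarray(motion_norm)
    mean = np.asarray(mean)
    std = np.asarray(std)
    if mean.shape != (263,) or std.shape != (263,):
        raise ValueError("mean and std must both have shape (263,)")
    if motion_norm.shape[-1] != 263:
        raise ValueError("motion must have 263 features in the last dimension")
    return motion_norm * std + mean


# With a local mirror, the statistics would be loaded like this (assumed paths):
# mean = np.load("checkpoints/baselines/flowmdm/Mean.npy")
# std = np.load("checkpoints/baselines/flowmdm/Std.npy")

# Synthetic stand-ins so the sketch runs without the artifact.
rng = np.random.default_rng(0)
mean = rng.normal(size=263)
std = rng.uniform(0.5, 2.0, size=263)
raw = rng.normal(size=(10, 263))
restored = denormalize((raw - mean) / std, mean, std)
```

The shape checks fail fast when statistics from a different representation (for example a 272-dim one) are loaded by mistake.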
