---
license: cc-by-nc-4.0
tags:
- text-to-motion
- bimanual-hands
- diffusion
library_name: pytorch
---

# HandX — Diffusion Text-to-Motion Checkpoints

Diffusion checkpoints for **HandX: Scaling Bimanual Motion and Interaction Generation** (CVPR 2026). They generate two-hand motion from text (separate text branches for the left hand, right hand, and their interaction), using an MDM-style diffusion model with a frozen T5-base text encoder.

- 📄 Paper: https://arxiv.org/abs/2603.28766
- 📦 Dataset: https://huggingface.co/datasets/alexzhang598/HandX

## Checkpoints

| Folder | Decoder layers | latent_dim | Notes |
|--------|----------------|------------|-------|
| `layers4` | 4 | 256 | |
| `layers8` | 8 | 512 | |
| `layers12` | 12 | 512 | best model in the paper |

Each folder has `model.pt` (weights) and `config.yaml`.

## Loading

```python
import torch
from huggingface_hub import hf_hub_download
from omegaconf import OmegaConf

# run from the `diffusion/` directory of the HandX repo
from src.diffusion.utils.model_utils import create_model_and_diffusion

repo_id = "alexzhang598/HandX-diffusion"
variant = "layers12"

# Build the model and diffusion process from the variant's config
cfg = OmegaConf.load(hf_hub_download(repo_id, f"{variant}/config.yaml"))
model, diffusion = create_model_and_diffusion(cfg.model)

# Load the weights on CPU
ckpt_path = hf_hub_download(repo_id, f"{variant}/model.pt")
sd = torch.load(ckpt_path, map_location="cpu")["state_dict"]

# missing keys are the frozen T5 encoder (loaded from t5-base)
model.load_state_dict(sd, strict=False)
```

The checkpoints load with a standard `load_state_dict(..., strict=False)`; the only missing keys are the frozen T5 weights, restored from `t5-base` at construction.
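
Because `strict=False` also hides genuine mismatches, it can be worth confirming that the missing keys really are limited to the text encoder. The helper below is a minimal sketch of such a check, not part of the HandX code: the function name `verify_missing_keys` and the prefix `"text_encoder."` are hypothetical, and the real prefix depends on how the model names its T5 submodule (print the missing keys once to find it).

```python
def verify_missing_keys(missing_keys, unexpected_keys, allowed_prefixes):
    """Check the result of a non-strict state-dict load.

    Raises RuntimeError if any missing key falls outside `allowed_prefixes`
    or if the checkpoint contained keys the model does not have.
    Returns the number of (allowed) missing keys otherwise.
    """
    prefixes = tuple(allowed_prefixes)
    stray = [k for k in missing_keys if not k.startswith(prefixes)]
    if stray or unexpected_keys:
        raise RuntimeError(
            f"unexpected load result: stray missing keys={stray}, "
            f"unexpected keys={list(unexpected_keys)}"
        )
    return len(missing_keys)


# Stand-in key lists for illustration. With the real model, the lists come
# from the value returned by PyTorch's load_state_dict:
#   result = model.load_state_dict(sd, strict=False)
#   verify_missing_keys(result.missing_keys, result.unexpected_keys,
#                       ("text_encoder.",))  # hypothetical prefix
example_missing = ["text_encoder.block.0.weight", "text_encoder.block.1.weight"]
n_skipped = verify_missing_keys(example_missing, [], ("text_encoder.",))
```

If the check raises, the listed keys point to a mismatch between the checkpoint and the config used to build the model, for example a `config.yaml` from a different variant folder.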