How to use MIN-Lab/minWM with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video
# device_map already places the pipeline on the GPU; switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("MIN-Lab/minWM", dtype=torch.bfloat16, device_map="cuda")
prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
A full-stack framework and tutorial for newcomers, rather than a specific model.
minWM is our contribution to the world-model community: a full-stack open-source framework that walks you end-to-end through turning a bidirectional T2V foundation model into an action-conditioned video world model, with example data, runnable scripts, Claude skills capturing our hands-on experience, and onboarding knowledge for newcomers. We hope more researchers and developers join us in growing the community together.
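The Diffusers snippet above hard-codes "cuda" and notes that Apple devices should use "mps". As a minimal sketch (not part of minWM itself), the device string could be chosen at runtime instead. The helper name `pick_device` is hypothetical, and the CPU fallback is an assumption for machines without an accelerator or without PyTorch installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" based on what PyTorch reports.

    Hypothetical helper for illustration; not part of minWM or Diffusers.
    """
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is no accelerator to pick.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is absent in older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


device = pick_device()
print(f"Selected device: {device}")
```

The returned string could then stand in for the literal "cuda" in the snippet above. Whether the pipeline runs acceptably on "mps" or "cpu" is not something this sketch checks.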
If you find this work useful, please cite:
@article{zhu2026causal,
title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},
author={Zhu, Hongzhou and Zhao, Min and He, Guande and Su, Hang and Li, Chongxuan and Zhu, Jun},
journal={arXiv preprint arXiv:2602.02214},
year={2026}
}
@article{zhao2026causal,
title={Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation},
author={Zhao, Min and Zhu, Hongzhou and Zheng, Kaiwen and Zhou, Zihan and Yan, Bokai and Li, Xinyuan and Yang, Xiao and Li, Chongxuan and Zhu, Jun},
journal={arXiv preprint arXiv:2605.15141},
year={2026}
}