OutDreamer checkpoint for video outpainting

This repository provides the model checkpoint for the paper OutDreamer: Video Outpainting with a Diffusion Transformer.

OutDreamer is a DiT-based video outpainting framework that extends video content beyond the original frame boundaries while maintaining spatial and temporal consistency. The model introduces an efficient video control branch, a conditional outpainting branch, mask-driven self-attention, a latent alignment loss, and a cross-video-clip refiner for long video outpainting.

The method and its results are detailed in the arXiv paper: OutDreamer: Video Outpainting with a Diffusion Transformer.

How to Use

Important: This checkpoint is intended to be used with the OutDreamer codebase and is not a standalone Hugging Face pipeline.

For project details, please refer to the OutDreamer GitHub repository: zhongzero/OutDreamer

For setup and inference scripts compatible with this checkpoint, please refer to the reproduction repository: zhongzero/OutDreamer-unofficial
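Because the checkpoint is not a standalone pipeline, the first step is usually just fetching the files locally so the OutDreamer scripts can point at them. The following is a minimal sketch of that step using `snapshot_download` from the `huggingface_hub` library. The repository id is the one shown on this model card; the local directory name `checkpoints/outdreamer` and the helper `download_checkpoint` are illustrative assumptions, not paths or functions defined by the OutDreamer codebase.

```python
from pathlib import Path

# Repository id as shown on this model card.
REPO_ID = "zhongzero/outdreamer_model"


def download_checkpoint(local_dir: str = "checkpoints/outdreamer") -> Path:
    """Download every file in the checkpoint repository into local_dir.

    The default local_dir is a hypothetical location chosen for this sketch;
    use whatever path the inference scripts you run expect.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)


if __name__ == "__main__":
    ckpt_dir = download_checkpoint()
    print(f"Checkpoint files downloaded to: {ckpt_dir}")
```

How the downloaded files are then passed to inference depends on the scripts in the reproduction repository, so consult its setup instructions for the expected checkpoint location.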

Citation

If you find this work helpful for your research, please cite:

@article{zhong2026outdreamer,
  title={Outdreamer: Video outpainting with a diffusion transformer},
  author={Zhong, Linhao and Li, Fan and Huang, Yi and Liu, Jianzhuang and Pei, Renjing and Song, Fenglong},
  journal={IEEE Transactions on Image Processing},
  year={2026},
  publisher={IEEE}
}

Base model

This checkpoint (zhongzero/outdreamer_model) is fine-tuned from LanguageBind/Open-Sora-Plan-v1.2.0.

Paper

OutDreamer: Video Outpainting with a Diffusion Transformer (arXiv:2506.22298, published Jun 27, 2025)