Add files using upload-large-folder tool
- README.md +37 -0
- config.json +41 -0
- diffusion_pytorch_model.safetensors +3 -0
README.md
CHANGED
@@ -1,3 +1,40 @@
 ---
 license: bsd-2-clause
+base_model:
+- LanguageBind/Open-Sora-Plan-v1.2.0
+library_name: diffusers
+tags:
+- OutDreamer
+- video-outpainting
+- diffusion-transformer
+- DiT
 ---
+# OutDreamer checkpoint for video outpainting
+
+This repository provides the OutDreamer checkpoint for **OutDreamer: Video Outpainting with a Diffusion Transformer**.
+
+OutDreamer is a DiT-based video outpainting framework designed to extend video content beyond the original frame boundaries while maintaining spatial and temporal consistency. The model introduces an efficient video control branch, a conditional outpainting branch, mask-driven self-attention, latent alignment loss, and a cross-video-clip refiner for long video outpainting.
+
+The method and its results are detailed in the arXiv paper: [OutDreamer: Video Outpainting with a Diffusion Transformer](https://arxiv.org/abs/2506.22298).
+
+## How to Use
+
+**Important:** This checkpoint is intended to be used with the OutDreamer codebase and is not a standalone Hugging Face pipeline.
+
+For project details, please refer to the OutDreamer GitHub repository: [zhongzero/OutDreamer](https://github.com/zhongzero/OutDreamer)
+
+For setup and inference scripts compatible with this checkpoint, please refer to the reproduction repository: [zhongzero/OutDreamer-unofficial](https://github.com/zhongzero/OutDreamer-unofficial)
+
+## Citation
+
+If you find this work helpful for your research, please cite:
+
+```BibTeX
+@article{zhong2026outdreamer,
+title={Outdreamer: Video outpainting with a diffusion transformer},
+author={Zhong, Linhao and Li, Fan and Huang, Yi and Liu, Jianzhuang and Pei, Renjing and Song, Fenglong},
+journal={IEEE Transactions on Image Processing},
+year={2026},
+publisher={IEEE}
+}
+```
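The README defers setup and inference to the external repositories and does not show how to fetch the weights themselves. As a minimal sketch, assuming the `huggingface_hub` package is installed, the two checkpoint files could be downloaded as follows. The helper name `fetch_checkpoint` and the target directory `checkpoints/outdreamer` are hypothetical; the location the OutDreamer scripts actually expect should be taken from their own documentation.

```python
from pathlib import Path

REPO_ID = "zhongzero/outdreamer_model"
# The two files this commit adds besides the README.
CHECKPOINT_FILES = ["config.json", "diffusion_pytorch_model.safetensors"]


def fetch_checkpoint(local_dir: str = "checkpoints/outdreamer") -> Path:
    """Download only the checkpoint files and return the local folder.

    `local_dir` is a hypothetical default, not a path required by OutDreamer.
    """
    # Imported lazily so the module can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    folder = snapshot_download(
        repo_id=REPO_ID,
        local_dir=local_dir,
        allow_patterns=CHECKPOINT_FILES,  # skip README and other files
    )
    return Path(folder)


# Usage (downloads roughly 6.3 GB, so it is not run automatically):
# checkpoint_dir = fetch_checkpoint()
```

Restricting the download with `allow_patterns` keeps the transfer to the files the model loader needs; dropping that argument would fetch the whole repository instead.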
config.json
ADDED
@@ -0,0 +1,41 @@
+{
+  "_class_name": "OpenSoraCNext",
+  "_diffusers_version": "0.28.0",
+  "activation_fn": "gelu-approximate",
+  "attention_bias": true,
+  "attention_head_dim": 96,
+  "attention_mode": "xformers",
+  "attention_type": "default",
+  "caption_channels": 4096,
+  "control_in_channels": 8,
+  "cross_attention_dim": 2304,
+  "double_self_attention": false,
+  "downsampler": null,
+  "dropout": 0.0,
+  "in_channels": 4,
+  "interpolation_scale_h": 1.0,
+  "interpolation_scale_t": 1.0,
+  "interpolation_scale_w": 1.0,
+  "norm_elementwise_affine": false,
+  "norm_eps": 1e-06,
+  "norm_num_groups": 32,
+  "norm_type": "ada_norm_single",
+  "num_attention_heads": 24,
+  "num_embeds_ada_norm": 1000,
+  "num_layers": 32,
+  "num_vector_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "patch_size": 2,
+  "patch_size_t": 1,
+  "sample_size": [
+    60,
+    80
+  ],
+  "sample_size_t": 8,
+  "upcast_attention": false,
+  "use_additional_conditions": null,
+  "use_linear_projection": false,
+  "use_rope": true,
+  "use_stable_fp32": false
+}
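A few of the config values are easier to read once combined. The sketch below works from a subset of the fields in the config.json diff and rests on assumptions that this commit does not confirm: that the transformer width equals `num_attention_heads * attention_head_dim` (the usual diffusers convention), that `sample_size` is the latent (height, width) grid, that `sample_size_t` counts latent frames, and that patches do not overlap. The helper name `summarize` is hypothetical.

```python
import json

# Subset of the config.json fields from the diff.
CONFIG_TEXT = """
{
  "num_attention_heads": 24,
  "attention_head_dim": 96,
  "cross_attention_dim": 2304,
  "num_layers": 32,
  "patch_size": 2,
  "patch_size_t": 1,
  "sample_size": [60, 80],
  "sample_size_t": 8
}
"""


def summarize(config: dict) -> dict:
    """Derive width and token counts under the assumptions stated above."""
    inner_dim = config["num_attention_heads"] * config["attention_head_dim"]
    height, width = config["sample_size"]
    # Non-overlapping spatial patches per latent frame (assumed).
    tokens_per_frame = (height // config["patch_size"]) * (width // config["patch_size"])
    # Temporal patches across the latent clip (assumed).
    frames = config["sample_size_t"] // config["patch_size_t"]
    return {
        "inner_dim": inner_dim,
        "matches_cross_attention_dim": inner_dim == config["cross_attention_dim"],
        "tokens_per_frame": tokens_per_frame,
        "total_tokens": tokens_per_frame * frames,
    }


summary = summarize(json.loads(CONFIG_TEXT))
print(summary)
```

Under these assumptions the width works out to 24 × 96 = 2304, which agrees with `cross_attention_dim`, and the sequence length is 30 × 40 × 8 = 9600 tokens.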
diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5c873bd3fc6a5efcc70fa2e4134214285cea5537780f6d36ad23c15d3b40ecdc
+size 6278569664
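The three lines added for the safetensors file are a Git LFS pointer, not the weights: they record the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded copy could be checked against that pointer, using only the standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer contents as committed for diffusion_pytorch_model.safetensors.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:5c873bd3fc6a5efcc70fa2e4134214285cea5537780f6d36ad23c15d3b40ecdc
size 6278569664
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into typed fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Stream in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("diffusion_pytorch_model.safetensors", pointer)` on a local download would then confirm that the roughly 6.28 GB file arrived intact.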