zhongzero committed on
Commit 54b9f75 · verified
1 Parent(s): 6e48a7a

Add files using upload-large-folder tool

Files changed (3)
  1. README.md +37 -0
  2. config.json +41 -0
  3. diffusion_pytorch_model.safetensors +3 -0
README.md CHANGED
@@ -1,3 +1,40 @@
  ---
  license: bsd-2-clause
+ base_model:
+ - LanguageBind/Open-Sora-Plan-v1.2.0
+ library_name: diffusers
+ tags:
+ - OutDreamer
+ - video-outpainting
+ - diffusion-transformer
+ - DiT
  ---
+ # OutDreamer checkpoint for video outpainting
+
+ This repository provides the OutDreamer checkpoint for **OutDreamer: Video Outpainting with a Diffusion Transformer**.
+
+ OutDreamer is a DiT-based video outpainting framework designed to extend video content beyond the original frame boundaries while maintaining spatial and temporal consistency. The model introduces an efficient video control branch, a conditional outpainting branch, mask-driven self-attention, latent alignment loss, and a cross-video-clip refiner for long video outpainting.
+
+ The method and its results are detailed in the arXiv paper: [OutDreamer: Video Outpainting with a Diffusion Transformer](https://arxiv.org/abs/2506.22298).
+
+ ## How to Use
+
+ **Important:** This checkpoint is intended to be used with the OutDreamer codebase and is not a standalone Hugging Face pipeline.
+
+ For project details, please refer to the OutDreamer GitHub repository: [zhongzero/OutDreamer](https://github.com/zhongzero/OutDreamer)
+
+ For setup and inference scripts compatible with this checkpoint, please refer to the reproduction repository: [zhongzero/OutDreamer-unofficial](https://github.com/zhongzero/OutDreamer-unofficial)
+
+ ## Citation
+
+ If you find this work helpful for your research, please cite:
+
+ ```BibTeX
+ @article{zhong2026outdreamer,
+ title={Outdreamer: Video outpainting with a diffusion transformer},
+ author={Zhong, Linhao and Li, Fan and Huang, Yi and Liu, Jianzhuang and Pei, Renjing and Song, Fenglong},
+ journal={IEEE Transactions on Image Processing},
+ year={2026},
+ publisher={IEEE}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "_class_name": "OpenSoraCNext",
+   "_diffusers_version": "0.28.0",
+   "activation_fn": "gelu-approximate",
+   "attention_bias": true,
+   "attention_head_dim": 96,
+   "attention_mode": "xformers",
+   "attention_type": "default",
+   "caption_channels": 4096,
+   "control_in_channels": 8,
+   "cross_attention_dim": 2304,
+   "double_self_attention": false,
+   "downsampler": null,
+   "dropout": 0.0,
+   "in_channels": 4,
+   "interpolation_scale_h": 1.0,
+   "interpolation_scale_t": 1.0,
+   "interpolation_scale_w": 1.0,
+   "norm_elementwise_affine": false,
+   "norm_eps": 1e-06,
+   "norm_num_groups": 32,
+   "norm_type": "ada_norm_single",
+   "num_attention_heads": 24,
+   "num_embeds_ada_norm": 1000,
+   "num_layers": 32,
+   "num_vector_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "patch_size": 2,
+   "patch_size_t": 1,
+   "sample_size": [
+     60,
+     80
+   ],
+   "sample_size_t": 8,
+   "upcast_attention": false,
+   "use_additional_conditions": null,
+   "use_linear_projection": false,
+   "use_rope": true,
+   "use_stable_fp32": false
+ }
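
A few quantities follow directly from the values in config.json above. The sketch below is illustrative only and is not part of the commit or the OutDreamer codebase: the helper name `derived_shapes` is hypothetical, the config text is a hand-copied subset of the fields, and the token counts assume plain DiT-style patchification (non-overlapping patches of `patch_size` spatially and `patch_size_t` temporally, in whatever units `sample_size` is expressed), which the actual model code may handle differently.

```python
import json

# Hand-copied subset of the config.json fields shown in the diff above.
CONFIG_TEXT = """
{
  "attention_head_dim": 96,
  "num_attention_heads": 24,
  "cross_attention_dim": 2304,
  "patch_size": 2,
  "patch_size_t": 1,
  "sample_size": [60, 80],
  "sample_size_t": 8
}
"""


def derived_shapes(cfg):
    """Derive basic transformer sizes from the config (hypothetical helper)."""
    # Width of the attention layers: heads times per-head dimension.
    inner_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]

    # Assumption: non-overlapping spatial patches of side `patch_size`.
    height, width = cfg["sample_size"]
    patch = cfg["patch_size"]
    tokens_per_frame = (height // patch) * (width // patch)

    # Assumption: temporal axis is split into groups of `patch_size_t`.
    temporal_steps = cfg["sample_size_t"] // cfg["patch_size_t"]

    return {
        "inner_dim": inner_dim,
        "tokens_per_frame": tokens_per_frame,
        "tokens_total": tokens_per_frame * temporal_steps,
    }


cfg = json.loads(CONFIG_TEXT)
shapes = derived_shapes(cfg)
print(shapes)
```

Under these assumptions the attention width (24 heads of 96 dimensions each) comes out equal to the `cross_attention_dim` value of 2304 recorded in the config.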
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c873bd3fc6a5efcc70fa2e4134214285cea5537780f6d36ad23c15d3b40ecdc
+ size 6278569664
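
The three added lines are a Git LFS pointer, not the weights themselves: the pointer records the SHA-256 digest and byte size of the real file. A minimal sketch of how a downloaded copy could be checked against it follows. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path passed to `matches_pointer` is whatever location the file was downloaded to; nothing here comes from the OutDreamer repositories.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:5c873bd3fc6a5efcc70fa2e4134214285cea5537780f6d36ad23c15d3b40ecdc
size 6278569664
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch means hashing is unnecessary.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a multi-gigabyte file never sits in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size"] / 2**30
print(f"{pointer['algo']} digest, {size_gib:.2f} GiB")
```

The recorded size of 6278569664 bytes is roughly 5.85 GiB, so the chunked read matters in practice.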