MochunniaN1 and nielsr (HF Staff) committed
Commit 80cf6e6 · verified · 1 parent: 450a8f5

Improve model card: Add pipeline tag, library name, links, and usage (#1)

- Improve model card: Add pipeline tag, library name, links, and usage (62e2e6aeefd89d8f74b6cd1a823470c9467afeff)


Co-authored-by: Niels Rogge <nielsr@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +237 -3
README.md CHANGED
@@ -1,3 +1,237 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-video
+ library_name: diffusers
+ ---
+
+ # One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer
+
+ This repository contains the model and code for the paper [One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer](https://huggingface.co/papers/2511.22940).
+
+ This project provides a unified framework for high-fidelity character animation and image pose transfer from references with arbitrary layouts. It targets a limitation of existing diffusion models: handling reference-pose pairs that are spatially misaligned.
+
+ - 📄 [Paper](https://huggingface.co/papers/2511.22940)
+ - 🌐 [Project Page](https://ssj9596.github.io/one-to-all-animation-project/)
+ - 💻 [Code on GitHub](https://github.com/ssj9596/One-to-All-Animation)
+
+ ## 🌟 Highlights
+
+ We provide a **complete and reproducible** training and evaluation pipeline:
+
+ - ✅ **Full Training Code**: Three-stage progressive training from scratch
+ - ✅ **Complete Benchmarks**: Reproduction code and pre-trained checkpoints
+ - ✅ **Flexible Training Codebase**: Multi-resolution, multi-aspect-ratio, and multi-frame training
+ - ✅ **Datasets**: Pre-processed open-source datasets plus self-collected cartoon data
+
+ <br>
+
+ ## 🎭 Showcase - 1.3B Model Results
+
+ <p align="center">
+ <img src="https://github.com/ssj9596/One-to-All-Animation/raw/main/assets/combined_video1.gif" height="300"/> &nbsp;&nbsp; <img src="https://github.com/ssj9596/One-to-All-Animation/raw/main/assets/combined_video2.gif" height="300"/>
+ </p>
+
+ <br>
+
+ ## 🔥 Update
+
+ - [2025.11] Paper reproduction and evaluation code released.
+ - [2025.11] [Sample training data and benchmark](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub) released on Hugging Face.
+ - [2025.11] Inference and training code released.
+ - [2025.11] [1.3B-v1](https://huggingface.co/MochunniaN1/One-to-All-1.3b_1), [1.3B-v2](https://huggingface.co/MochunniaN1/One-to-All-1.3b_2) and [14B](https://huggingface.co/MochunniaN1/One-to-All-14b) checkpoints released.
+
+ <br>
+
+ ## 🔧 Dependencies and Installation
+
+ 1. Clone Repo
+ ```bash
+ git clone https://github.com/ssj9596/One-to-All-Animation.git
+ cd One-to-All-Animation
+ ```
+
+ 2. Create Conda Environment and Install Dependencies
+ ```bash
+ # create a new conda env
+ conda create -n one-to-all python=3.12
+ conda activate one-to-all
+
+ # install PyTorch (CUDA 12.4 wheels)
+ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
+ # or, using the Aliyun PyPI mirror
+ pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 -i https://mirrors.aliyun.com/pypi/simple/
+
+ # install Python dependencies
+ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
+
+
+ # (Recommended) install FlashAttention 3 (or 2) from source:
+ # https://github.com/Dao-AILab/flash-attention
+ ```
+
+ 3. Download Models
+
+ - Download pretrained models
+ ```bash
+ cd ./pretrained_models
+ python download_pretrained_models.py
+ ```
+
+ - Download checkpoints
+ ```bash
+ cd ./checkpoints  # assumes the repository root as the working directory
+ python download_checkpoints.py
+ ```
+
+ > 💡 **Tip**: Edit the script and uncomment the specific models you want to download.
+ > - **1.3B_1**: Best performance on the video benchmark among 1.3B models (paper results).
+ > - **1.3B_2**: Further trained from 1.3B_1 on data with large camera movement and an increased image ratio. Better for dynamic video generation. Best on the image benchmark (paper results).
+ > - **14B**: Best overall performance among 14B models (paper results).
+
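As a rough alternative to the download script, a single released checkpoint can in principle be fetched directly from the Hub. The sketch below is not part of the repository: `download_checkpoint` and the `DRY_RUN` switch are hypothetical, it assumes the `huggingface-cli` tool from `huggingface_hub` is installed, and the `./checkpoints/<name>` target layout is a guess that may differ from what `download_checkpoints.py` produces. The repository names are the ones linked in the Update section.

```bash
# Hypothetical helper: fetch one released checkpoint from the Hugging Face Hub.
# Usage: download_checkpoint <name> [dest_dir]
download_checkpoint() {
  local name="$1"
  local dest="${2:-./checkpoints}"   # assumed target directory
  local repo="MochunniaN1/${name}"   # account name taken from the checkpoint links

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be run.
    echo "huggingface-cli download ${repo} --local-dir ${dest}/${name}"
  else
    huggingface-cli download "${repo}" --local-dir "${dest}/${name}"
  fi
}
```

Running `DRY_RUN=1 download_checkpoint One-to-All-1.3b_1` prints the command without touching the network, which is a cheap way to check the target path before a large download.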
+ <br>
+
+ ## ☕️ Quick Inference
+
+ We provide several examples in the [`examples`](https://github.com/ssj9596/One-to-All-Animation/tree/main/examples) folder.
+ Run the following commands to try them out:
+
+ ```bash
+ # Step 1: Prepare model input
+ cd video-generation
+ python infer_preprocess.py
+
+ # Step 2: Run inference with your preferred model
+ python inference_1.3b.py # For 1.3B model
+ # or
+ python inference_14b.py # For 14B model
+ ```
+ To use your own inputs, edit the input path inside the script.
+
+ <br>
+
+ ## 🎬 Training from scratch
+
+ > 💡 **Data Collection Required**: We find that current open-source datasets are insufficient for training from scratch. We strongly recommend collecting *at least 3,000 additional high-quality video samples* for better results.
+
+ We divide the training process into several steps to help you reproduce our results from scratch (using 1.3B as an example).
+
+ 1. Download Pretrained Models
+
+ Download the base model from Hugging Face: [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
+
+ 2. Download Training Datasets and Pose Pool
+
+ ```bash
+ cd datasets
+ bash setup_datasets.sh
+ ```
+
+ This will download and prepare:
+ - Training datasets (open-source + cartoon): `datasets/opensource_dataset/`
+ - Pose pool for face enhancement: `datasets/opensource_pose_pool/`
+
+ <details>
+ <summary>Manual Download Links</summary>
+
+ - [opensource_dataset](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_dataset)
+ - [opensource_pose_pool](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_pose_pool)
+
+ </details>
+
+ 3. Training
+
+ We provide three-stage training scripts:
+ * Stage 1: Reference Extractor
+
+ ```bash
+ cd video-generation
+ bash training_scripts/train1.3b_only_refextractor_2d.sh
+ # Convert the DeepSpeed ZeRO checkpoint to FP32 (xxx is a placeholder for your checkpoint)
+ cd outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx
+ mkdir fp32_model_xxx
+ python zero_to_fp32.py . fp32_model_xxx --safe_serialization
+ # Return to video-generation/ before running inference
+ cd ../../../
+ # First edit inference_refextractor.py and set ckpt_path to:
+ # ./outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx/fp32_model_xxx
+ python inference_refextractor.py
+ ```
+
+ * Stage 2: Pose Control
+ ```bash
+ bash training_scripts/train1.3b_posecontrol_prefix_2d.sh
+ ```
+ * Stage 3: Token Replace for long video generation
+ ```bash
+ bash training_scripts/train1.3b_posecontrol_prefix_2d_tokenreplace.sh
+ ```
+ > 💡 **Training Notes**:
+ > - **Each stage uses a different training resolution**: check the scripts for the specific resolution settings.
+ > - **Fine-tuning from our checkpoints**: To continue training from our pre-trained models, use the *Stage 3 script* directly and modify the checkpoint path.
+
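The `checkpoint-xxx` placeholder in Stage 1 has to be filled in by hand. A small helper can pick the newest checkpoint automatically. This is a minimal sketch, not code from the repository: `latest_checkpoint` is a hypothetical name, and it assumes checkpoint directories are named `checkpoint-<integer step>`, which the README does not state explicitly.

```bash
# Hypothetical helper: print the checkpoint directory with the highest step.
# Assumes directories are named checkpoint-<integer step>.
# Usage: latest_checkpoint <training_output_dir>
latest_checkpoint() {
  local root="$1"
  local best=""
  local best_step=-1
  local dir step

  for dir in "$root"/checkpoint-*; do
    [ -d "$dir" ] || continue
    step="${dir##*checkpoint-}"
    case "$step" in
      ''|*[!0-9]*) continue ;;   # skip suffixes that are not plain integers
    esac
    if [ "$step" -gt "$best_step" ]; then
      best_step="$step"
      best="$dir"
    fi
  done

  # Print the result; return non-zero if no checkpoint was found.
  [ -n "$best" ] && echo "$best"
}
```

For example, from `video-generation/`, `cd "$(latest_checkpoint outputs_wanx1.3b/train1.3b_only_refextractor_2d)"` would enter the newest Stage 1 checkpoint before running `zero_to_fp32.py`.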
+ <br>
+
+ ## 📊 Reproduce Paper Results
+
+ We provide scripts to reproduce the quantitative results reported in our paper.
+
+ 1. Download Benchmark
+ ```bash
+ cd benchmark
+ bash setup_datasets.sh
+ ```
+ 2. Prepare Model Input
+ ```bash
+ cd ../video-generation
+ python reproduce/infer_preprocess.py
+ ```
+ 3. Run Inference
+
+ We provide inference scripts for different model sizes and datasets:
+ ```bash
+ # TikTok dataset
+ python reproduce/inference_tiktok1.3b.py # 1.3B model
+ python reproduce/inference_tiktok14b.py # 14B model
+
+ # Cartoon dataset
+ python reproduce/inference_cartoon1.3b.py # 1.3B model
+ python reproduce/inference_cartoon14b.py # 14B model
+ ```
+ 4. Prepare GT/prediction pairs for the judge
+ ```bash
+ cd ../benchmark
+ # TikTok dataset
+ python prepare_eval_frames_tiktok.py
+ # Cartoon dataset
+ python prepare_eval_frames_cartoon.py
+ ```
+
+ 5. Run the judge
+ ```bash
+ # first set up the DisCo environment and the LPIPS and FVD checkpoints for the judge
+ cd DisCo
+ # TikTok dataset
+ bash eval_tiktok.sh
+ python summary.py
+ ```
+
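The four inference scripts in step 3 follow a `reproduce/inference_<dataset><size>.py` naming pattern, so the full dataset-by-model matrix can be enumerated with a small loop. This is only a convenience sketch: `reproduce_scripts` is a hypothetical helper that lists the script paths named above, and it assumes the commands are run from `video-generation/` after step 2.

```bash
# Hypothetical helper: list the reproduction scripts, one per line,
# for every dataset and model size named in step 3.
reproduce_scripts() {
  local dataset size
  for dataset in tiktok cartoon; do
    for size in 1.3b 14b; do
      echo "reproduce/inference_${dataset}${size}.py"
    done
  done
}
```

One way to use it is `reproduce_scripts | while read -r script; do python "$script" || break; done`, which stops at the first failing run instead of continuing with the rest of the matrix.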
+ <br>
+
+ ## Acknowledgments
+
+ Our project is based on [Open-Sora](https://github.com/hpcaitech/Open-Sora). Some code is adapted from [StableAnimator](https://github.com/Francis-Rings/StableAnimator) and [Wan-Animate](https://github.com/Wan-Video/Wan2.2). We thank the authors for their excellent work.
+
+ ## 📧 Contact
+ If you have any questions, please feel free to reach us at `ssj180123@gmail.com`.
+
+ ## 📝 Citation
+ If you find our work helpful or inspiring, please feel free to cite it.
+
+ ```bibtex
+ @article{shi2025onetoall,
+ title={One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer},
+ author={Shi, Shijun and Xu, Jing and Li, Zhihang and Peng, Chunli and Yang, Xiaoda and Lu, Lijing and Hu, Kai and Zhang, Jiangning},
+ journal={arXiv preprint arXiv:2511.22940},
+ year={2025}
+ }
+ ```