MochunniaN1 committed
Commit 4418d2e · verified · 1 Parent(s): 1cc430e

Update README.md

Files changed (1)
  1. README.md +0 -174
README.md CHANGED
@@ -72,180 +72,6 @@ Also support longer video & out-of-domain cases
72
 
73
  <br>
74
 
75
- ## 🔧 Dependencies and Installation
76
-
77
- 1. Clone Repo
78
- ```bash
79
- git clone https://github.com/ssj9596/One-to-All-Animation.git
80
- cd One-to-All-Animation
81
- ```
82
-
83
- 2. Create Conda Environment and Install Dependencies
84
- ```bash
85
- # create new conda env
86
- conda create -n one-to-all python=3.12
87
- conda activate one-to-all
88
-
89
- # install pytorch
90
- pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
91
- # or
92
- pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 -i https://mirrors.aliyun.com/pypi/simple/
93
-
94
- # install python dependencies
95
- pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
96
-
97
-
98
- # (Recommended) install flash attention 3 (or 2) from source:
99
- # https://github.com/Dao-AILab/flash-attention
100
- ```
101
-
102
- 3. Download Models
103
-
104
- - Download pretrained models
105
- ```bash
106
- cd ./pretrained_models
107
- bash download_pretrained_models.py
108
- ```
109
-
110
- - Download checkpoints
111
- ```bash
112
- cd ./checkpoints
113
- bash download_checkpoints.py
114
- ```
115
-
116
- > 💡 **Tip**: Edit the script and uncomment the specific models you want to download.
117
- > - **1.3B_1**: Best performance on video benchmark among 1.3B models (paper results).
118
- > - **1.3B_2**: Further trained on v1 with large camera movement data and increased image ratio. Better for dynamic video generation. Best on image benchmark (paper results).
119
- > - **14B**: Best overall performance among 14B models (paper results).
120
-
121
- <br>
122
-
123
- ## ☕️ Quick Inference
124
-
125
- We provide several examples in the [`examples`](https://github.com/ssj9596/One-to-All-Animation/tree/main/examples) folder.
126
- Run the following commands to try it out:
127
-
128
- ```bash
129
- # Step 1: Prepare model input
130
- cd video-generation
131
- python infer_preprocess.py
132
-
133
- # Step 2: Run inference with your preferred model
134
- python inference_1.3b.py # For 1.3B model
135
- # or
136
- python inference_14b.py # For 14B model
137
- ```
138
- You can enter the script to modify the input path.
139
-
140
- <br>
141
-
142
- ## 🎬 Training from scratch
143
-
144
- >💡 **Data Collection Required**: We find current open-source datasets are not sufficient for training from scratch. We strongly recommend collecting *at least 3,000 additional high-quality video samples* for better results.
145
-
146
- We divide the training process into several steps to help you reproduce our results from scratch (using 1.3B as an example).
147
-
148
- 1. Download Pretrained Models
149
-
150
- Download the base model from HuggingFace: [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
151
-
152
- 2. Download Training Datasets and Pose Pool
153
-
154
- ```bash
155
- cd datasets
156
- bash setup_datasets.sh
157
- ```
158
-
159
- This will download and prepare:
160
- - Training datasets (open-source + cartoon): `datasets/opensource_dataset/`
161
- - Pose pool for face enhancement: `datasets/opensource_pose_pool/`
162
-
163
- <details>
164
- <summary>Manual Download Links</summary>
165
-
166
- - [opensource_dataset](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_dataset)
167
- - [opensource_pose_pool](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_pose_pool)
168
-
169
- </details>
170
-
171
- 3. Training
172
-
173
- We provide three-stage training scripts:
174
- * Stage 1: Reference Extractor
175
-
176
- ```bash
177
- cd video-generation
178
- bash training_scripts/train1.3b_only_refextractor_2d.sh
179
- # Convert checkpoint to FP32
180
- cd outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx
181
- mkdir fp32_model_xxx
182
- python zero_to_fp32.py . fp32_model_xxx --safe_serialization
183
- # Run inference (update model path in inference_refextractor.py first)
184
- cd ../../../
185
- # Edit inference_refextractor.py and change ckpt_path to:
186
- # ./outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx/fp32_model_xxx
187
- python inference_refextractor.py
188
- ```
189
-
190
- * Stage 2: Pose Control
191
- ```bash
192
- bash training_scripts/train1.3b_posecontrol_prefix_2d.sh
193
- ```
194
- * Stage 3: Token Replace for Long video generation
195
- ```bash
196
- bash training_scripts/train1.3b_posecontrol_prefix_2d_tokenreplace.sh
197
- ```
198
- > 💡 **Training Notes**:
199
- > - **Each stage uses different training resolutions** - check the scripts for specific resolution settings
200
- > - **Fine-tuning from our checkpoints**: If you want to continue training from our pre-trained models, directly use the *Stage 3 script* and modify the checkpoint path
201
-
202
- <br>
203
-
204
- ## 📊 Reproduce Paper Results
205
-
206
- We provide scripts to reproduce the quantitative results reported in our paper.
207
-
208
- 1. Download Benchmark
209
- ```bash
210
- cd benchmark
211
- bash setup_datasets.sh
212
- ```
213
- 2. Prepare Model Input
214
- ```bash
215
- cd ../video-generation
216
- python reproduce/infer_preprocess.py
217
- ```
218
- 3. Run Inference
219
-
220
- We provide inference scripts for different model sizes and datasets:
221
- ```bash
222
- # TikTok dataset
223
- python reproduce/inference_tiktok1.3b.py # 1.3B model
224
- python reproduce/inference_tiktok14b.py # 14B model
225
-
226
- # Cartoon dataset
227
- python reproduce/inference_cartoon1.3b.py # 1.3B model
228
- python reproduce/inference_cartoon14b.py # 14B model
229
-
230
- 4. Prepare gt/pred pairs for Judge
231
- ```bash
232
- cd ../benchmark
233
- # TikTok dataset
234
- python prepare_eval_frames_tiktok.py
235
- # Cartoon dataset
236
- python prepare_eval_frames_cartoon.py
237
- ```
238
-
239
- 5. Run judge
240
- ```bash
241
- # prepare DisCo environment and lpips fvd ckpt for judge
242
- cd DisCo
243
- # TikTok dataset
244
- bash eval_tiktok.sh
245
- python summary.py
246
- ```
247
-
248
- <br>
249
 
250
  ## Acknowledgments
251
 
 