MochunniaN1/One-to-All-14b


MochunniaN1 committed on Dec 9, 2025

Commit 4418d2e (verified) · 1 Parent(s): 1cc430e

Update README.md

Files changed (1)

README.md +0 -174

@@ -72,180 +72,6 @@ Also support longer video & out-of-domain cases
 <br>
-## 🔧 Dependencies and Installation
-1. Clone Repo
-    ```bash
-    git clone https://github.com/ssj9596/One-to-All-Animation.git
-    cd One-to-All-Animation
-    ```
-2. Create Conda Environment and Install Dependencies
-    ```bash
-    # create new conda env
-    conda create -n one-to-all python=3.12
-    conda activate one-to-all
-    # install pytorch
-    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
-    # or
-    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 -i https://mirrors.aliyun.com/pypi/simple/
-    # install python dependencies
-    pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
-    # (Recommended) install flash attention 3 (or 2) from source:
-    # https://github.com/Dao-AILab/flash-attention
-    ```
-3. Download Models
-   - Download pretrained models
-   ```bash
-    cd ./pretrained_models
-    python download_pretrained_models.py
-    ```
-   - Download checkpoints
-    ```bash
-    cd ./checkpoints
-    python download_checkpoints.py
-    ```
-    > 💡 **Tip**: Edit the script and uncomment the specific models you want to download.
-    > - **1.3B_1**: Best performance on video benchmark among 1.3B models (paper results).
-    > - **1.3B_2**: Further trained on v1 with large camera movement data and increased image ratio. Better for dynamic video generation. Best on image benchmark (paper results).
-    > - **14B**: Best overall performance among 14B models (paper results).
-<br>
-## ☕️ Quick Inference
-We provide several examples in the [`examples`](https://github.com/ssj9596/One-to-All-Animation/tree/main/examples) folder.
-Run the following commands to try it out:
-```bash
-# Step 1: Prepare model input
-cd video-generation
-python infer_preprocess.py
-# Step 2: Run inference with your preferred model
-python inference_1.3b.py  # For 1.3B model
-# or
-python inference_14b.py   # For 14B model
-```
-To use your own inputs, edit the input path inside the script.
-<br>
-## 🎬 Training from scratch
->💡 **Data Collection Required**: We find current open-source datasets are not sufficient for training from scratch. We strongly recommend collecting *at least 3,000 additional high-quality video samples* for better results.
-We divide the training process into several steps to help you reproduce our results from scratch (using 1.3B as an example).
-1. Download Pretrained Models
-    Download the base model from HuggingFace: [Wan-AI/Wan2.1-T2V-1.3B-Diffusers](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
-2. Download Training Datasets and Pose Pool
-    ```bash
-    cd datasets
-    bash setup_datasets.sh
-    ```
-    This will download and prepare:
-    - Training datasets (open-source + cartoon): `datasets/opensource_dataset/`
-    - Pose pool for face enhancement: `datasets/opensource_pose_pool/`
-    <details>
-    <summary>Manual Download Links</summary>
-    - [opensource_dataset](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_dataset)
-    - [opensource_pose_pool](https://huggingface.co/datasets/MochunniaN1/One-to-All-sub/tree/main/opensource_pose_pool)
-    </details>
-3. Training
-    We provide three-stage training scripts:
-    * Stage 1: Reference Extractor
-    ```bash
-    cd video-generation
-    bash training_scripts/train1.3b_only_refextractor_2d.sh
-    # Convert checkpoint to FP32
-    cd outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx
-    mkdir fp32_model_xxx
-    python zero_to_fp32.py . fp32_model_xxx --safe_serialization
-    # Run inference (update model path in inference_refextractor.py first)
-    cd ../../../
-    # Edit inference_refextractor.py and change ckpt_path to:
-    # ./outputs_wanx1.3b/train1.3b_only_refextractor_2d/checkpoint-xxx/fp32_model_xxx
-    python inference_refextractor.py
-    ```
-    * Stage 2: Pose Control
-    ```bash
-    bash training_scripts/train1.3b_posecontrol_prefix_2d.sh
-    ```
-    * Stage 3: Token Replace for Long video generation
-    ```bash
-    bash training_scripts/train1.3b_posecontrol_prefix_2d_tokenreplace.sh
-    ```
-    > 💡 **Training Notes**:
-    > - **Each stage uses different training resolutions** - check the scripts for specific resolution settings
-    > - **Fine-tuning from our checkpoints**: If you want to continue training from our pre-trained models, directly use the *Stage 3 script* and modify the checkpoint path
-<br>
-## 📊 Reproduce Paper Results
-We provide scripts to reproduce the quantitative results reported in our paper.
-1. Download Benchmark
-    ```bash
-    cd benchmark
-    bash setup_datasets.sh
-    ```
-2. Prepare Model Input
-    ```bash
-    cd ../video-generation
-    python reproduce/infer_preprocess.py
-    ```
-3. Run Inference
-    We provide inference scripts for different model sizes and datasets:
-    ```bash
-    # TikTok dataset
-    python reproduce/inference_tiktok1.3b.py   # 1.3B model
-    python reproduce/inference_tiktok14b.py    # 14B model
-    # Cartoon dataset
-    python reproduce/inference_cartoon1.3b.py  # 1.3B model
-    python reproduce/inference_cartoon14b.py   # 14B model
-    ```
-4. Prepare gt/pred pairs for Judge
-   ```bash
-   cd ../benchmark
-   # TikTok dataset
-   python prepare_eval_frames_tiktok.py
-   # Cartoon dataset
-   python prepare_eval_frames_cartoon.py
-   ```
-5. Run judge
-   ```bash
-   # prepare DisCo environment and lpips fvd ckpt for judge
-   cd DisCo
-   # TikTok dataset
-   bash eval_tiktok.sh
-   python summary.py
-   ```
-<br>
 ## Acknowledgments
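
The hunk header and the file summary can be cross-checked with a little arithmetic: `@@ -72,180 +72,6 @@` says the hunk spans 180 lines of the old README.md and 6 lines of the new one, both starting at line 72, so the net change is 174 removed lines, which matches `README.md +0 -174`. Below is a minimal sketch of that check, assuming the standard unified-diff header layout. The helper name `hunk_line_delta` is hypothetical and is not part of the repository or of any Hugging Face tooling.

```python
import re


def hunk_line_delta(header: str) -> int:
    """Return new_count - old_count for a unified-diff hunk header.

    Hypothetical helper for illustration only. A missing count
    (e.g. "@@ -5 +5 @@") defaults to 1, as in the unified-diff format.
    """
    match = re.match(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", header)
    if match is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_count = int(match.group(2)) if match.group(2) is not None else 1
    new_count = int(match.group(4)) if match.group(4) is not None else 1
    return new_count - old_count


# Header copied from the diff above; trailing section text is ignored.
delta = hunk_line_delta(
    "@@ -72,180 +72,6 @@ Also support longer video & out-of-domain cases"
)
print(delta)  # negative values mean net removed lines
```

A negative result is the net number of removed lines. Because this commit adds nothing (`+0`), the 6 remaining lines are all unchanged context.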

