Liu-Junhua/TwiFF-7B

@@ -1,13 +1,19 @@
 ---
-license: apache-2.0
 datasets:
 - Liu-Junhua/TwiFF-2.7M
 language:
 - en
-base_model:
-- ByteDance-Seed/BAGEL-7B-MoT
 pipeline_tag: any-to-any
 ---
 # TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
 <p align="center">
   <a href="https://arxiv.org/abs/2602.10675">
@@ -31,15 +37,66 @@ pipeline_tag: any-to-any
   <a href="https://github.com/LiuJunhua02/TwiFF">
     <img
         src="https://img.shields.io/badge/TwiFF-Codebase-536af5?color=536af5&logo=github"
-        alt="TwiFF-Bench Dataset"
     />
   </a>
 </p>
 ## 🧠 Method
 <p align="center"><img src="https://github.com/LiuJunhua02/TwiFF/raw/main/assets/data_show.png" width="95%"></p>
-We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving instructional, predictive, and camera, TwiFF iteratively generates future event frames alongside textual reasoning, thereby producing temporally coherent visual reasoning trajectories. Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.
 ## ✍️ Citation
@@ -51,3 +108,7 @@ We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Ch
          year={2026},
 }
 ```

 ---
+base_model:
+- ByteDance-Seed/BAGEL-7B-MoT
 datasets:
 - Liu-Junhua/TwiFF-2.7M
+- Liu-Junhua/TwiFF-Bench
 language:
 - en
+license: apache-2.0
 pipeline_tag: any-to-any
+tags:
+- visual-chain-of-thought
+- VCoT
+- dynamic-visual-reasoning
 ---
 # TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
 <p align="center">
   <a href="https://arxiv.org/abs/2602.10675">
   <a href="https://github.com/LiuJunhua02/TwiFF">
     <img
         src="https://img.shields.io/badge/TwiFF-Codebase-536af5?color=536af5&logo=github"
+        alt="TwiFF Codebase"
     />
   </a>
 </p>
+TwiFF is a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks covering instructional, predictive, and camera-motion scenarios, TwiFF iteratively generates future event frames alongside textual reasoning, producing temporally coherent visual reasoning trajectories.
 ## 🧠 Method
 <p align="center"><img src="https://github.com/LiuJunhua02/TwiFF/raw/main/assets/data_show.png" width="95%"></p>
+Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.
+## 🚀 Quick Start
+To use TwiFF, follow the steps below, which are adapted from the [official repository](https://github.com/LiuJunhua02/TwiFF).
+### 1. Set up environment
+```bash
+git clone https://github.com/LiuJunhua02/TwiFF.git
+cd TwiFF
+conda create -n TwiFF python=3.10 -y
+conda activate TwiFF
+pip install -r requirements.txt
+pip install flash_attn==2.5.8 --no-build-isolation
+```
+### 2. Download checkpoint
+```python
+from huggingface_hub import snapshot_download
+save_dir = "models/TwiFF-7B"
+repo_id = "Liu-Junhua/TwiFF-7B"
+cache_dir = save_dir + "/cache"
+snapshot_download(cache_dir=cache_dir,
+  local_dir=save_dir,
+  repo_id=repo_id,
+  local_dir_use_symlinks=False,
+  resume_download=True,
+  allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
+)
+```
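As a quick sanity check after the download, the sketch below confirms that the checkpoint file named by `--checkpoint_file` in the inference step (`model.safetensors`) is present under `models/TwiFF-7B`. The helper name `missing_checkpoint_files` is hypothetical and not part of the TwiFF codebase, and the list of required files is an assumption: only `model.safetensors` is confirmed by the command in this README.

```python
from pathlib import Path


def missing_checkpoint_files(model_dir, required=("model.safetensors",)):
    """Return the names in `required` that are not regular files in `model_dir`.

    Hypothetical helper: the default list only contains the file that the
    inference command references; other files may also be needed.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("models/TwiFF-7B")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Checkpoint directory contains the expected files.")
```

If the check reports a missing file, re-running the download snippet is usually the simplest fix.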
+### 3. Start inference
+Store your test cases in `output/demo.jsonl` (see the GitHub README for the specific JSON format) and run:
+```bash
+python \
+  scripts/inference.py \
+  --max_round 8 \
+  --model_dir models/TwiFF-7B \
+  --checkpoint_file model.safetensors \
+  --checkpoint_dir models/TwiFF-7B \
+  --QA_file output/demo.jsonl \
+  --seed 42
+```
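Before launching the command above, it can help to confirm that `output/demo.jsonl` is well-formed JSON Lines. The sketch below only checks that every non-blank line parses as JSON; it makes no assumption about the fields TwiFF expects, which are documented in the GitHub README. The function name `invalid_jsonl_lines` is hypothetical.

```python
import json
from pathlib import Path


def invalid_jsonl_lines(path):
    """Return (line_number, error_message) for each non-blank line that is not valid JSON.

    Hypothetical helper: it validates JSON syntax only, not the TwiFF schema.
    """
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # blank lines are ignored here; the real loader may be stricter
            try:
                json.loads(line)
            except json.JSONDecodeError as error:
                problems.append((number, error.msg))
    return problems


if __name__ == "__main__":
    qa_file = Path("output/demo.jsonl")
    if not qa_file.exists():
        print(f"{qa_file} not found")
    else:
        for number, message in invalid_jsonl_lines(qa_file):
            print(f"line {number}: {message}")
```

An empty result means each line is syntactically valid JSON; it does not guarantee that the records match the format `scripts/inference.py` requires.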
 ## ✍️ Citation
          year={2026},
 }
 ```
+## 📜 License
+TwiFF is licensed under the Apache 2.0 license.