Add Quick Start and improve model documentation
Hi! I'm Niels from the Hugging Face community team.
I've updated your model card to include the "Quick Start" section from your GitHub repository, which provides clear instructions on how to set up the environment and run inference. I've also added the `TwiFF-Bench` dataset to the metadata to make it easier for users to find the associated evaluation benchmark.
README.md
CHANGED
|
@@ -1,13 +1,19 @@
|
|
| 1 |
---
|
| 2 |
-
|
|
|
|
| 3 |
datasets:
|
| 4 |
- Liu-Junhua/TwiFF-2.7M
|
|
|
|
| 5 |
language:
|
| 6 |
- en
|
| 7 |
-
|
| 8 |
-
- ByteDance-Seed/BAGEL-7B-MoT
|
| 9 |
pipeline_tag: any-to-any
|
| 10 |
---
|
|
|
|
| 11 |
# TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
|
| 12 |
<p align="center">
|
| 13 |
<a href="https://arxiv.org/abs/2602.10675">
|
|
@@ -31,15 +37,66 @@ pipeline_tag: any-to-any
|
|
| 31 |
<a href="https://github.com/LiuJunhua02/TwiFF">
|
| 32 |
<img
|
| 33 |
src="https://img.shields.io/badge/TwiFF-Codebase-536af5?color=536af5&logo=github"
|
| 34 |
-
alt="TwiFF
|
| 35 |
/>
|
| 36 |
</a>
|
| 37 |
</p>
|
| 38 |
|
| 39 |
## 🧠 Method
|
| 40 |
|
| 41 |
<p align="center"><img src="https://github.com/LiuJunhua02/TwiFF/raw/main/assets/data_show.png" width="95%"></p>
|
| 42 |
-
|
| 43 |
|
| 44 |
## ✍️ Citation
|
| 45 |
|
|
@@ -51,3 +108,7 @@ We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Ch
|
|
| 51 |
year={2026},
|
| 52 |
}
|
| 53 |
```
|
| 1 |
---
|
| 2 |
+
base_model:
|
| 3 |
+
- ByteDance-Seed/BAGEL-7B-MoT
|
| 4 |
datasets:
|
| 5 |
- Liu-Junhua/TwiFF-2.7M
|
| 6 |
+
- Liu-Junhua/TwiFF-Bench
|
| 7 |
language:
|
| 8 |
- en
|
| 9 |
+
license: apache-2.0
|
|
|
|
| 10 |
pipeline_tag: any-to-any
|
| 11 |
+
tags:
|
| 12 |
+
- visual-chain-of-thought
|
| 13 |
+
- VCoT
|
| 14 |
+
- dynamic-visual-reasoning
|
| 15 |
---
|
| 16 |
+
|
| 17 |
# TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
|
| 18 |
<p align="center">
|
| 19 |
<a href="https://arxiv.org/abs/2602.10675">
|
|
|
|
| 37 |
<a href="https://github.com/LiuJunhua02/TwiFF">
|
| 38 |
<img
|
| 39 |
src="https://img.shields.io/badge/TwiFF-Codebase-536af5?color=536af5&logo=github"
|
| 40 |
+
alt="TwiFF Codebase"
|
| 41 |
/>
|
| 42 |
</a>
|
| 43 |
</p>
|
| 44 |
|
| 45 |
+
TwiFF is a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving instructional, predictive, and camera motion, TwiFF iteratively generates future event frames alongside textual reasoning, thereby producing temporally coherent visual reasoning trajectories.
|
| 46 |
+
|
| 47 |
## 🧠 Method
|
| 48 |
|
| 49 |
<p align="center"><img src="https://github.com/LiuJunhua02/TwiFF/raw/main/assets/data_show.png" width="95%"></p>
|
| 50 |
+
|
| 51 |
+
Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.
|
| 52 |
+
|
| 53 |
+
## 🚀 Quick Start
|
| 54 |
+
|
| 55 |
+
To use TwiFF, follow the instructions below, which are adapted from the [official repository](https://github.com/LiuJunhua02/TwiFF).
|
| 56 |
+
|
| 57 |
+
### 1. Set up environment
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
git clone https://github.com/LiuJunhua02/TwiFF.git
|
| 61 |
+
cd TwiFF
|
| 62 |
+
conda create -n TwiFF python=3.10 -y
|
| 63 |
+
conda activate TwiFF
|
| 64 |
+
pip install -r requirements.txt
|
| 65 |
+
pip install flash_attn==2.5.8 --no-build-isolation
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
### 2. Download checkpoint
|
| 69 |
+
|
| 70 |
+
```python
|
| 71 |
+
from huggingface_hub import snapshot_download
|
| 72 |
+
|
| 73 |
+
save_dir = "models/TwiFF-7B"
|
| 74 |
+
repo_id = "Liu-Junhua/TwiFF-7B"
|
| 75 |
+
cache_dir = save_dir + "/cache"
|
| 76 |
+
|
| 77 |
+
snapshot_download(cache_dir=cache_dir,
|
| 78 |
+
local_dir=save_dir,
|
| 79 |
+
repo_id=repo_id,
|
| 80 |
+
local_dir_use_symlinks=False,
|
| 81 |
+
resume_download=True,
|
| 82 |
+
allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
|
| 83 |
+
)
|
| 84 |
+
```
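Before moving on, it may help to confirm that the download produced the file the inference step expects. The sketch below is not part of the TwiFF repository: `missing_files` is a hypothetical helper, and the expected file name `model.safetensors` is an assumption taken from the `--checkpoint_file` argument in the inference command.

```python
from pathlib import Path


def missing_files(model_dir, expected=("model.safetensors",)):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper for illustration; the default file name is assumed
    from the --checkpoint_file argument of the inference command.
    """
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # "models/TwiFF-7B" mirrors save_dir in the download snippet.
    missing = missing_files("models/TwiFF-7B")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Checkpoint files found.")
```

An empty list means every expected file is present; extend `expected` if your setup needs additional files.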
|
| 85 |
+
|
| 86 |
+
### 3. Start inference
|
| 87 |
+
|
| 88 |
+
Store your test cases in `output/demo.jsonl` (see the GitHub README for the specific JSON format) and run:
|
| 89 |
+
|
| 90 |
+
```bash
|
| 91 |
+
python \
|
| 92 |
+
scripts/inference.py \
|
| 93 |
+
--max_round 8 \
|
| 94 |
+
--model_dir models/TwiFF-7B \
|
| 95 |
+
--checkpoint_file model.safetensors \
|
| 96 |
+
--checkpoint_dir models/TwiFF-7B \
|
| 97 |
+
--QA_file output/demo.jsonl \
|
| 98 |
+
--seed 42
|
| 99 |
+
```
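Since the inference script reads its test cases from a JSON Lines file, a quick well-formedness check can catch malformed input early. The sketch below is a generic check and not part of the TwiFF codebase: `check_jsonl` is a hypothetical helper that only verifies each non-empty line parses as JSON. It does not validate the TwiFF-specific fields, which are described in the GitHub README.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Parse every non-empty line as JSON and return the number of records.

    Raises ValueError naming the first line that fails to parse.
    Hypothetical helper; it does not check any TwiFF-specific fields.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON: {exc}"
                ) from exc
            count += 1
    return count


if __name__ == "__main__":
    demo = Path("output/demo.jsonl")  # path used by --QA_file in the command
    if demo.exists():
        print(check_jsonl(demo), "records parsed")
```

Running the check before `scripts/inference.py` keeps a malformed line from surfacing only after the model has been loaded.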
|
| 100 |
|
| 101 |
## ✍️ Citation
|
| 102 |
|
|
|
|
| 108 |
year={2026},
|
| 109 |
}
|
| 110 |
```
|
| 111 |
+
|
| 112 |
+
## 📜 License
|
| 113 |
+
|
| 114 |
+
TwiFF is licensed under the Apache 2.0 license.
|