Commit 30e9574 (verified) by nielsr (HF Staff) · 1 Parent(s): b0aba2b

Add Quick Start and improve model documentation

Hi! I'm Niels from the Hugging Face community team.

I've updated your model card to include the "Quick Start" section from your GitHub repository, which provides clear instructions on how to set up the environment and run inference. I've also added the `TwiFF-Bench` dataset to the metadata to make it easier for users to find the associated evaluation benchmark.
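The metadata change lives in the YAML front matter at the top of README.md. The following is a minimal sketch that reads that header and lists the linked datasets. It is an illustrative, stdlib-only parser that assumes the header uses only flat `key: value` lines and `- item` lists, as this one does. The names `UPDATED_HEADER` and `parse_front_matter` are hypothetical. Real tooling, such as `huggingface_hub`'s `ModelCard` class, uses a full YAML parser.

```python
# Minimal sketch: parse the flat YAML front matter of the updated model card.
# Assumption: only "key: value" lines and "key:" followed by "- item" lines occur.

UPDATED_HEADER = """---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- Liu-Junhua/TwiFF-2.7M
- Liu-Junhua/TwiFF-Bench
language:
- en
license: apache-2.0
pipeline_tag: any-to-any
tags:
- visual-chain-of-thought
- VCoT
- dynamic-visual-reasoning
---
"""


def parse_front_matter(text: str) -> dict:
    """Return the metadata between the two '---' fences as a dict."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("model card does not start with a '---' fence")
    end = lines.index("---", 1)  # closing fence; raises ValueError if missing
    meta = {}
    current_key = None
    for line in lines[1:end]:
        if line.startswith("- "):
            # List item belonging to the most recent "key:" line.
            if current_key is None:
                raise ValueError(f"list item without a key: {line!r}")
            meta[current_key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value  # scalar entry, e.g. license
                current_key = None
            else:
                meta[key] = []  # start of a list, e.g. datasets
                current_key = key
    return meta


meta = parse_front_matter(UPDATED_HEADER)
print(meta["datasets"])  # ['Liu-Junhua/TwiFF-2.7M', 'Liu-Junhua/TwiFF-Bench']
```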

Files changed (1)
  1. README.md +66 -5
README.md CHANGED
@@ -1,13 +1,19 @@
  ---
- license: apache-2.0
+ base_model:
+ - ByteDance-Seed/BAGEL-7B-MoT
  datasets:
  - Liu-Junhua/TwiFF-2.7M
+ - Liu-Junhua/TwiFF-Bench
  language:
  - en
- base_model:
- - ByteDance-Seed/BAGEL-7B-MoT
+ license: apache-2.0
  pipeline_tag: any-to-any
+ tags:
+ - visual-chain-of-thought
+ - VCoT
+ - dynamic-visual-reasoning
  ---
+
  # TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
  <p align="center">
  <a href="https://arxiv.org/abs/2602.10675">
@@ -31,15 +37,66 @@ pipeline_tag: any-to-any
  <a href="https://github.com/LiuJunhua02/TwiFF">
  <img
  src="https://img.shields.io/badge/TwiFF-Codebase-536af5?color=536af5&logo=github"
- alt="TwiFF-Bench Dataset"
+ alt="TwiFF Codebase"
  />
  </a>
  </p>

+ TwiFF is a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving instructional, predictive, and camera motion, TwiFF iteratively generates future event frames alongside textual reasoning, thereby producing temporally coherent visual reasoning trajectories.
+
  ## 🧠 Method

  <p align="center"><img src="https://github.com/LiuJunhua02/TwiFF/raw/main/assets/data_show.png" width="95%"></p>
- We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving instructional, predictive, and camera, TwiFF iteratively generates future event frames alongside textual reasoning, thereby producing temporally coherent visual reasoning trajectories. Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.
+
+ Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.
+
+ ## 🚀 Quick Start
+
+ To use TwiFF, follow the instructions below derived from the [official repository](https://github.com/LiuJunhua02/TwiFF).
+
+ ### 1. Set up environment
+
+ ```bash
+ git clone https://github.com/LiuJunhua02/TwiFF.git
+ cd TwiFF
+ conda create -n TwiFF python=3.10 -y
+ conda activate TwiFF
+ pip install -r requirements.txt
+ pip install flash_attn==2.5.8 --no-build-isolation
+ ```
+
+ ### 2. Download checkpoint
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ save_dir = "models/TwiFF-7B"
+ repo_id = "Liu-Junhua/TwiFF-7B"
+ cache_dir = save_dir + "/cache"
+
+ snapshot_download(cache_dir=cache_dir,
+ local_dir=save_dir,
+ repo_id=repo_id,
+ local_dir_use_symlinks=False,
+ resume_download=True,
+ allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
+ )
+ ```
+
+ ### 3. Start Inference
+
+ Store your test cases in `output/demo.jsonl` (see the GitHub README for the specific JSON format) and run:
+
+ ```bash
+ python \
+ scripts/inference.py \
+ --max_round 8 \
+ --model_dir models/TwiFF-7B \
+ --checkpoint_file model.safetensors \
+ --checkpoint_dir models/TwiFF-7B \
+ --QA_file output/demo.jsonl \
+ --seed 42
+ ```

  ## ✍️ Citation

@@ -51,3 +108,7 @@ We present TwiFF, a unified model fine-tuned on a high-quality dynamic visual Ch
  year={2026},
  }
  ```
+
+ ## 📜 License
+
+ TwiFF is licensed under the Apache 2.0.
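Quick Start steps 2 and 3 can also be driven from one Python file. The following is a hedged sketch of that, not part of the TwiFF codebase. It makes three assumptions: it runs from the root of the cloned TwiFF repository, it runs inside the TwiFF conda environment, and `download_checkpoint`, `inference_command` and `run_inference` are hypothetical helper names. Paths and flags are copied from the Quick Start. The sketch omits `local_dir_use_symlinks` and `resume_download`, which recent `huggingface_hub` releases deprecate.

```python
import shlex
import subprocess
import sys

# Paths and flags copied from the Quick Start; adjust as needed.
MODEL_DIR = "models/TwiFF-7B"
REPO_ID = "Liu-Junhua/TwiFF-7B"
QA_FILE = "output/demo.jsonl"


def download_checkpoint(model_dir: str = MODEL_DIR) -> str:
    """Fetch the checkpoint files into model_dir and return that path."""
    # Imported here so the command builder below works without huggingface_hub.
    from huggingface_hub import snapshot_download

    # local_dir_use_symlinks and resume_download are omitted because recent
    # huggingface_hub releases deprecate both arguments.
    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=model_dir,
        cache_dir=f"{model_dir}/cache",
        allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
    )


def inference_command(qa_file: str = QA_FILE, model_dir: str = MODEL_DIR,
                      max_round: int = 8, seed: int = 42) -> list:
    """Build the argv for scripts/inference.py with the Quick Start flags."""
    return [
        sys.executable,  # interpreter of the currently active environment
        "scripts/inference.py",
        "--max_round", str(max_round),
        "--model_dir", model_dir,
        "--checkpoint_file", "model.safetensors",
        "--checkpoint_dir", model_dir,
        "--QA_file", qa_file,
        "--seed", str(seed),
    ]


def run_inference(**kwargs) -> None:
    """Run inference from the TwiFF repository root; raises on a non-zero exit."""
    subprocess.run(inference_command(**kwargs), check=True)


# Show the shell-quoted command that run_inference() would execute.
print(shlex.join(inference_command()))
```

In use, call `download_checkpoint()` once, then call `run_inference()` after writing the test cases to `output/demo.jsonl`. Passing the arguments as a list avoids the line-continuation backslashes of the shell version.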