---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- Liu-Junhua/TwiFF-2.7M
- Liu-Junhua/TwiFF-Bench
language:
- en
license: apache-2.0
pipeline_tag: any-to-any
tags:
- visual-chain-of-thought
- VCoT
- dynamic-visual-reasoning
---
# TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning
TwiFF is a unified multimodal model, fine-tuned from BAGEL-7B-MoT on TwiFF-2.7M, a high-quality dynamic visual Chain-of-Thought (VCoT) dataset of 2.7 million samples. On dynamic multimodal question-answering tasks covering instructional, predictive, and camera-motion scenarios, TwiFF iteratively generates future event frames alongside its textual reasoning. The result is a temporally coherent visual reasoning trajectory.
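The interleaved text-and-frame reasoning described above can be pictured as a simple control loop. The sketch below is illustrative only. `run_dynamic_vcot`, the step classes, and the `think` / `imagine` / `answer_if_ready` callables are hypothetical stand-ins and are not TwiFF's API; the actual model produces both the text and the frames itself. The `max_round` parameter borrows its name from the `--max_round` flag of the inference script, which is assumed here to cap the number of reasoning rounds.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union


@dataclass
class TextStep:
    text: str  # one chunk of textual reasoning


@dataclass
class FrameStep:
    # Stand-in for a generated future frame; a real model would carry image data here.
    description: str


@dataclass
class Trajectory:
    question: str
    steps: List[Union[TextStep, FrameStep]] = field(default_factory=list)
    answer: Optional[str] = None


def run_dynamic_vcot(
    question: str,
    think: Callable[[Trajectory], str],
    imagine: Callable[[Trajectory], str],
    answer_if_ready: Callable[[Trajectory], Optional[str]],
    max_round: int = 8,
) -> Trajectory:
    """Hypothetical loop: each round adds a text step, stops if an answer is
    ready, and otherwise adds a predicted future frame before the next round."""
    traj = Trajectory(question=question)
    for _ in range(max_round):
        traj.steps.append(TextStep(think(traj)))
        traj.answer = answer_if_ready(traj)
        if traj.answer is not None:
            break
        traj.steps.append(FrameStep(imagine(traj)))
    return traj


# Usage with stub callables standing in for the model.
demo = run_dynamic_vcot(
    "What happens after the cup tips over?",
    think=lambda t: f"reasoning step {len(t.steps)}",
    imagine=lambda t: f"predicted frame {len(t.steps)}",
    answer_if_ready=lambda t: "The water spills." if len(t.steps) >= 4 else None,
)
print([type(s).__name__ for s in demo.steps], demo.answer)
# → ['TextStep', 'FrameStep', 'TextStep', 'FrameStep', 'TextStep'] The water spills.
```

Passing the reasoning and frame generators as callables keeps the loop independent of any particular model, so the stubs can be swapped for real generation calls without changing the control flow.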
## 🧠 Method

Experimental results show that, on dynamic-scenario reasoning benchmarks, this dynamic VCoT approach outperforms both static VCoT methods built on tool-calling paradigms and text-only chain-of-thought baselines.
## 🚀 Quick Start
To use TwiFF, follow the steps below, adapted from the [official repository](https://github.com/LiuJunhua02/TwiFF).
### 1. Set up environment
```bash
git clone https://github.com/LiuJunhua02/TwiFF.git
cd TwiFF
conda create -n TwiFF python=3.10 -y
conda activate TwiFF
pip install -r requirements.txt
pip install flash_attn==2.5.8 --no-build-isolation
```
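Before downloading weights, it can help to confirm that the environment is usable. This is a minimal sketch, not part of the TwiFF repository: `check_environment` is a hypothetical helper, and checking `torch` assumes it is among the packages pulled in by `requirements.txt` (the `flash_attn` build requires it). It only tests that each package can be found, not that the CUDA build of `flash_attn` actually works.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def check_environment(packages: Iterable[str] = ("torch", "flash_attn")) -> Dict[str, bool]:
    """Map each package name to whether it can be found on the current interpreter,
    plus whether the interpreter is Python 3.10 as in the conda setup above."""
    # find_spec locates a top-level module without importing it.
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    report["python==3.10"] = sys.version_info[:2] == (3, 10)
    return report


for check, ok in check_environment().items():
    print(f"{check}: {'yes' if ok else 'no'}")
```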
### 2. Download checkpoint
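```python
from huggingface_hub import snapshot_download

save_dir = "models/TwiFF-7B"
repo_id = "Liu-Junhua/TwiFF-7B"
cache_dir = save_dir + "/cache"

# Download config, weights, and auxiliary files into save_dir as regular files.
# local_dir_use_symlinks and resume_download are deprecated in recent
# huggingface_hub releases: local_dir downloads no longer use symlinks,
# and interrupted downloads resume automatically.
snapshot_download(
    repo_id=repo_id,
    local_dir=save_dir,
    cache_dir=cache_dir,
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)
```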
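A quick post-download check can catch an incomplete snapshot before inference fails. In this sketch, `missing_checkpoint_files` is a hypothetical helper. It assumes that `model.safetensors`, the file the inference command in step 3 loads via `--checkpoint_file`, sits at the top level of `models/TwiFF-7B`; add further names to `required` if your copy of the repository expects them.

```python
from pathlib import Path
from typing import List, Sequence


def missing_checkpoint_files(
    save_dir: str = "models/TwiFF-7B",
    required: Sequence[str] = ("model.safetensors",),
) -> List[str]:
    """Return the required top-level files absent from save_dir (empty list means OK)."""
    root = Path(save_dir)
    present = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    return [name for name in required if name not in present]


missing = missing_checkpoint_files()
print("checkpoint OK" if not missing else f"missing: {missing}")
```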
### 3. Run inference
Place your test cases in `output/demo.jsonl` (the expected JSON format is described in the GitHub README), then run:
```bash
python \
scripts/inference.py \
--max_round 8 \
--model_dir models/TwiFF-7B \
--checkpoint_file model.safetensors \
--checkpoint_dir models/TwiFF-7B \
--QA_file output/demo.jsonl \
--seed 42
```
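If you prefer to drive inference from Python, the sketch below serializes test cases to JSONL and assembles the same command line as the shell example. `to_jsonl` and `build_inference_cmd` are hypothetical helpers, not part of the TwiFF repository. `to_jsonl` deliberately assumes no field names, because the record schema is defined in the GitHub README; `build_inference_cmd` uses only the flags and values shown above.

```python
import json
import sys
from typing import Iterable, List, Mapping, Optional


def to_jsonl(records: Iterable[Mapping]) -> str:
    """Serialize one JSON object per line; field names must follow the TwiFF README."""
    return "".join(json.dumps(dict(r), ensure_ascii=False) + "\n" for r in records)


def build_inference_cmd(
    model_dir: str = "models/TwiFF-7B",
    qa_file: str = "output/demo.jsonl",
    max_round: int = 8,
    seed: int = 42,
    checkpoint_file: str = "model.safetensors",
    checkpoint_dir: Optional[str] = None,
) -> List[str]:
    """Mirror the shell command above as an argv list (same flags, same defaults)."""
    return [
        sys.executable, "scripts/inference.py",
        "--max_round", str(max_round),
        "--model_dir", model_dir,
        "--checkpoint_file", checkpoint_file,
        "--checkpoint_dir", checkpoint_dir or model_dir,
        "--QA_file", qa_file,
        "--seed", str(seed),
    ]
```

From the repository root, write your cases with `Path("output/demo.jsonl").write_text(to_jsonl(records), encoding="utf-8")` and launch the run with `subprocess.run(build_inference_cmd(), check=True)`. Using `sys.executable` instead of a bare `python` keeps the call on the interpreter of the active `TwiFF` conda environment.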
## ✍️ Citation
```bibtex
@article{liu2026twiff,
title={TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning},
author={Liu, Junhua and Wang, Zhangcheng and Han, Zhike and Wang, Ningli and Liang, Guotao and Kuang, Kun},
journal={arXiv preprint arXiv:2602.10675},
year={2026},
}
```
## 📜 License
TwiFF is licensed under the Apache License 2.0.