---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
datasets:
- Liu-Junhua/TwiFF-2.7M
- Liu-Junhua/TwiFF-Bench
language:
- en
license: apache-2.0
pipeline_tag: any-to-any
tags:
- visual-chain-of-thought
- VCoT
- dynamic-visual-reasoning
---

# TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning

TwiFF is a unified model fine-tuned on a high-quality dynamic visual Chain-of-Thought (VCoT) dataset comprising 2.7 million samples. In dynamic multimodal question-answering tasks involving instructional, predictive, and camera-motion scenarios, TwiFF iteratively generates future event frames alongside textual reasoning, producing temporally coherent visual reasoning trajectories.

## 🧠 Method

Experimental results demonstrate that, on dynamic scenario reasoning benchmarks, our dynamic VCoT approach outperforms both static VCoT methods based on tool-calling paradigms and purely textual chain-of-thought baselines.

## 🚀 Quick Start

To use TwiFF, follow the instructions below, which are adapted from the [official repository](https://github.com/LiuJunhua02/TwiFF).

### 1. Set up environment

```bash
git clone https://github.com/LiuJunhua02/TwiFF.git
cd TwiFF
conda create -n TwiFF python=3.10 -y
conda activate TwiFF
pip install -r requirements.txt
pip install flash_attn==2.5.8 --no-build-isolation
```

### 2. Download checkpoint

```python
from huggingface_hub import snapshot_download

save_dir = "models/TwiFF-7B"
repo_id = "Liu-Junhua/TwiFF-7B"
cache_dir = save_dir + "/cache"

# Download only the files matching allow_patterns into save_dir.
snapshot_download(
    cache_dir=cache_dir,
    local_dir=save_dir,
    repo_id=repo_id,
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)
```

### 3. Start Inference

Store your test cases in `output/demo.jsonl` (see the GitHub README for the specific JSON format) and run:

```bash
python scripts/inference.py \
    --max_round 8 \
    --model_dir models/TwiFF-7B \
    --checkpoint_file model.safetensors \
    --checkpoint_dir models/TwiFF-7B \
    --QA_file output/demo.jsonl \
    --seed 42
```

## ✍️ Citation

```bibtex
@article{liu2026twiff,
  title={TwiFF (Think With Future Frames): A Large-Scale Dataset for Dynamic Visual Reasoning},
  author={Liu, Junhua and Wang, Zhangcheng and Han, Zhike and Wang, Ningli and Liang, Guotao and Kuang, Kun},
  journal={arXiv preprint arXiv:2602.10675},
  year={2026},
}
```

## 📜 License

TwiFF is licensed under the Apache 2.0 license.