OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning Paper • 2606.08572 • Published 4 days ago • 12
CoVEBench: Can Video Editing Models Handle Complex Instructions? Paper • 2606.08415 • Published 4 days ago • 46
Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills Paper • 2606.07412 • Published 6 days ago • 12
MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 9 days ago • 14
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 10 days ago • 53
TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation Paper • 2606.02320 • Published 10 days ago • 14
Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 14 days ago • 143
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion Paper • 2605.30351 • Published 14 days ago • 26
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published 17 days ago • 38
HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published May 7 • 52
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published Apr 20 • 22
DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published Apr 16 • 36
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 110