Where Did It Go Wrong? Process-Level Evaluation of Web Agents with Semantic State Tracking Paper • 2606.15673 • Published Apr 8 • 10
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 5 days ago • 95
Rethinking State Tracking in Recurrent Models Through Error Control Dynamics Paper • 2605.07755 • Published May 8 • 24
Speaking Beyond Language: A Large-Scale Multimodal Dataset for Learning Nonverbal Cues from Video-Grounded Dialogues Paper • 2506.00958 • Published Jun 1, 2025 • 20
Multimodal Knowledge Alignment with Reinforcement Learning Paper • 2205.12630 • Published May 25, 2022
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3, 2024 • 50
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding Paper • 2406.18925 • Published Jun 27, 2024 • 1
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! Paper • 2410.01023 • Published Oct 1, 2024 • 2
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics Paper • 2406.14703 • Published Jun 20, 2024 • 2
Teaching Metric Distance to Autoregressive Multimodal Foundational Models Paper • 2503.02379 • Published Mar 4, 2025 • 4
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18, 2025 • 19
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation Paper • 2504.03197 • Published Apr 4, 2025 • 1
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24, 2025 • 36