Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.05491

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

computer vision

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 98
Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53
BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 201
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 113

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 18
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 129
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15, 2025 • 60
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15, 2025 • 12

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53
Adam's Law: Textual Frequency Law on Large Language Models

Paper • 2604.02176 • Published Apr 2 • 506
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 110

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities.

rayruiyang/VST-3B-RL

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 198 • 3
rayruiyang/VST-3B-SFT

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 86
rayruiyang/VST-7B-SFT

Image-Text-to-Text • 8B • Updated Nov 11, 2025 • 71
rayruiyang/VST-7B-RL

Image-Text-to-Text • 8B • Updated Nov 11, 2025 • 10

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

Paper • 2510.22706 • Published Oct 26, 2025 • 42
Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 139
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 181
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 128

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021 • 1
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Paper • 2405.15223 • Published May 24, 2024 • 17
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24, 2024 • 55
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 91
Matryoshka Multimodal Models

Paper • 2405.17430 • Published May 27, 2024 • 35

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53
Adam's Law: Textual Frequency Law on Large Language Models

Paper • 2604.02176 • Published Apr 2 • 506
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Paper • 2604.13016 • Published Apr 14 • 110

computer vision

Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53

A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities.

rayruiyang/VST-3B-RL

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 198 • 3
rayruiyang/VST-3B-SFT

Image-Text-to-Text • 4B • Updated Nov 11, 2025 • 86
rayruiyang/VST-7B-SFT

Image-Text-to-Text • 8B • Updated Nov 11, 2025 • 71
rayruiyang/VST-7B-RL

Image-Text-to-Text • 8B • Updated Nov 11, 2025 • 10

V-Thinker: Interactive Thinking with Images

Paper • 2511.04460 • Published Nov 6, 2025 • 98
Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53
BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 201
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text

Paper • 2601.22975 • Published Jan 30 • 113

IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction

Paper • 2510.22706 • Published Oct 26, 2025 • 42
Visual Spatial Tuning

Paper • 2511.05491 • Published Nov 7, 2025 • 53

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 18
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 129
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15, 2025 • 60
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15, 2025 • 12

A Survey of Reinforcement Learning for Large Reasoning Models

Paper • 2509.08827 • Published Sep 10, 2025 • 193
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published Aug 4, 2025 • 139
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Paper • 2510.23607 • Published Oct 27, 2025 • 181
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9, 2025 • 128

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13, 2025 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6, 2025 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19, 2025 • 48

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021 • 1
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs