Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.18361

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 62
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Paper • 2103.14030 • Published Mar 25, 2021 • 5

transformer-techniques

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 108
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Paper • 2403.20041 • Published Mar 29, 2024 • 34
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Papers - Encoders - Synthetic Noise

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 63
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 78

vision foundation modesl

vision foundation models

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10, 2024 • 20
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 29
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 134

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 71

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Papers - Image - Generator - Large Resolution

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 63
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 78

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 62
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Paper • 2103.14030 • Published Mar 25, 2021 • 5

vision foundation modesl

vision foundation models

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
BRAVE: Broadening the visual encoding of vision-language models

Paper • 2404.07204 • Published Apr 10, 2024 • 20
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

Paper • 2404.15653 • Published Apr 24, 2024 • 29
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 134

transformer-techniques

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Paper • 2404.02258 • Published Apr 2, 2024 • 108
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs

Paper • 2403.20041 • Published Mar 29, 2024 • 34
ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 71

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Papers - Encoders - Synthetic Noise

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Papers - Image - Generator - Large Resolution

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 55

Previous
1
2
3
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs