multimodal
Image-Text-to-Text
• 2B • Updated • 1.45M
• 1.42k
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text
• 8B • Updated • 4.93M
• 1.58k
google/gemma-3-27b-it-qat-q4_0-gguf
Image-Text-to-Text
• 27B • Updated • 247
• 401
google/paligemma2-3b-mix-224
Image-Text-to-Text
• 3B • Updated • 14.1k
• 54
HuggingFaceTB/SmolVLM2-256M-Video-Instruct
Image-Text-to-Text
• 0.3B • Updated • 90.6k
• 104
unsloth/Qwen2.5-VL-3B-Instruct-GGUF
Image-Text-to-Text
• 3B • Updated • 9.83k
• 24
Image-Text-to-Text
• 0.9B • Updated • 128k
• 85
• 14B • Updated • 328
• 103
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
• 2412.13303
• Published • 77
Feature Extraction
• 0.9B • Updated • 39.1k
• 333
Qwen/Qwen3-VL-4B-Instruct
Image-Text-to-Text
• 4B • Updated • 2.68M
• 395
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text
• 1.0B • Updated • 4.83k
• 1.62k
Image-Text-to-Text
• 3B • Updated • 867
• 116
• Updated • 1.19k • 345
• 49
nvidia/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text
• 0.9B • Updated • 165k
• 44