multimodal - a fnauman Collection

updated Mar 10

vikhyatk/moondream2

Image-Text-to-Text • 2B • Updated Sep 23, 2025 • 1.45M downloads • 1.42k likes
Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 4.93M downloads • 1.58k likes
google/gemma-3-27b-it-qat-q4_0-gguf

Image-Text-to-Text • 27B • Updated Apr 11, 2025 • 247 downloads • 401 likes
google/paligemma2-3b-mix-224

Image-Text-to-Text • 3B • Updated Feb 7, 2025 • 14.1k downloads • 54 likes
HuggingFaceTB/SmolVLM2-256M-Video-Instruct

Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 90.6k downloads • 104 likes
unsloth/Qwen2.5-VL-3B-Instruct-GGUF

Image-Text-to-Text • 3B • Updated May 12, 2025 • 9.83k downloads • 24 likes
OpenGVLab/InternVL3-1B

Image-Text-to-Text • 0.9B • Updated Sep 11, 2025 • 128k downloads • 85 likes
BLIP3o/BLIP3o-Model-8B

14B • Updated Jun 4, 2025 • 328 downloads • 103 likes
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 77 upvotes
jinaai/jina-clip-v2

Feature Extraction • 0.9B • Updated Apr 8 • 39.1k downloads • 333 likes
Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15, 2025 • 2.68M downloads • 395 likes
PaddlePaddle/PaddleOCR-VL

Image-Text-to-Text • 1.0B • Updated 17 days ago • 4.83k downloads • 1.62k likes
PerceptronAI/Isaac-0.1

Image-Text-to-Text • 3B • Updated Mar 20 • 867 downloads • 116 likes
moondream/refcoco-m

Viewer • Updated Nov 17, 2025 • 1.19k rows • 345 downloads • 49 likes
nvidia/NVIDIA-Nemotron-Parse-v1.2

Image-Text-to-Text • 0.9B • Updated May 5 • 165k downloads • 44 likes