Clarify local load steps and MMEB overall scores
README.md
CHANGED
@@ -16,10 +16,6 @@ base_model:
 - Qwen/Qwen3-VL-Embedding-2B
 ---
 
-<p align="center">
-<img src="logo.png" alt="Eddy — Urock-AI" width="100%" />
-</p>
-
 # Eddy-VL Embedding 1.9B
 
 [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
@@ -67,11 +63,14 @@ Rather than train a new model from scratch, we started from one of the best open
 ## Installation
 
 ```bash
+git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
+cd Eddy-vl_embedding_1.9B_v1
+
 pip install "transformers>=5.0" safetensors torch pillow torchvision
 pip install decord # video input (or `av`)
 ```
 
-
+Inference code ships with the repo (not a pip package):
 
 | File | Role |
 |------|------|
@@ -81,15 +80,17 @@ Clone this repo (or add it to `PYTHONPATH`) — inference code ships with the mo
 
 ## How to use it
 
+Clone the repo first, then run from inside the repo folder:
+
 ```python
 import torch
 from PIL import Image
 from vl_embedding_v1 import VLEmbedder
 
-model_id = "Urock-AI/Eddy-vl_embedding_1.9B_v1"
 instruction = "Represent this input for retrieval."
 
-
+# load from local repo checkout (weights in ./model.safetensors)
+embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
 
 # text / image / video → 2048-d vectors (L2-normalized)
 text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
@@ -100,7 +101,9 @@ video_vec = embedder.process([{"video": "clip.mp4"}])[0]
 score = (text_vec @ image_vec.T).item()
 ```
 
-
+You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
+
+> `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
 
 ---
 
@@ -150,7 +153,15 @@ Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MM
 | ViDoRe · InfoVQA | 82.4% |
 | VisRAG · ChartQA | 81.0% |
 
-**
+**MMEB-V2 overall (mean across all datasets, in-house eval):**
+
+| Model | Overall |
+|:------|--------:|
+| Qwen3-VL-Embedding-2B (public leaderboard) | 73.0 |
+| Qwen3-VL-Embedding-2B (our environment) | 68.9 |
+| **Eddy-VL 1.9B** | **63.2** |
+
+The public leaderboard and our in-house pipeline differ in setup, so we compare Eddy against the teacher re-run in the same environment (68.9 → 63.2). That is roughly an **8% relative gap** while being smaller and ~1.1× faster.
 
 ---
 
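The "roughly 8% relative gap" in the last hunk can be checked from the three scores in the added table. The sketch below is only an arithmetic check, not part of the commit. It assumes the gap is measured relative to the teacher's score in the same environment (68.9), which is the reading that matches the quoted figure. The variable names and the `relative_gap` helper are illustrative and do not come from the repo.

```python
# Scores copied from the "MMEB-V2 overall" table added in this commit.
teacher_public = 73.0  # Qwen3-VL-Embedding-2B, public leaderboard
teacher_local = 68.9   # Qwen3-VL-Embedding-2B, re-run in the authors' environment
eddy = 63.2            # Eddy-VL 1.9B


def relative_gap(baseline: float, score: float) -> float:
    """Fraction of the baseline score that is lost: (baseline - score) / baseline.

    Assumption: the gap is taken relative to the baseline (teacher) score.
    """
    return (baseline - score) / baseline


absolute_gap = teacher_local - eddy
same_env_gap = relative_gap(teacher_local, eddy)

print(f"absolute gap: {absolute_gap:.1f} points")  # absolute gap: 5.7 points
print(f"relative gap: {same_env_gap:.1%}")         # relative gap: 8.3%
```

Measured against the public leaderboard number instead, `relative_gap(teacher_public, eddy)` gives a larger figure. That comparison mixes two evaluation setups, which is why the README compares against the same-environment re-run.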