Feature Extraction
Transformers
Safetensors
English
Korean
multilingual
qwen3_vl
vision-language
embedding
multimodal-embedding
mmeb
digital-forensics
custom_code
Reorder README: logo on top, MMEB before usage
README.md
CHANGED
|
@@ -16,6 +16,10 @@ base_model:
|
|
| 16 |
- Qwen/Qwen3-VL-Embedding-2B
|
| 17 |
---
|
| 18 |
|
| 19 |
# Eddy-VL Embedding 1.9B
|
| 20 |
|
| 21 |
[Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
|
|
@@ -60,53 +64,6 @@ Rather than train a new model from scratch, we started from one of the best open
|
|
| 60 |
|
| 61 |
---
|
| 62 |
|
| 63 |
-
## Installation
|
| 64 |
-
|
| 65 |
-
```bash
|
| 66 |
-
git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
|
| 67 |
-
cd Eddy-vl_embedding_1.9B_v1
|
| 68 |
-
|
| 69 |
-
pip install "transformers>=5.0" safetensors torch pillow torchvision
|
| 70 |
-
pip install decord # video input (or `av`)
|
| 71 |
-
```
|
| 72 |
-
|
| 73 |
-
Inference code ships with the repo (not a pip package):
|
| 74 |
-
|
| 75 |
-
| File | Role |
|
| 76 |
-
|------|------|
|
| 77 |
-
| `vl_embedding_v1.py` | `VLEmbedder` — load weights, encode text / image / video |
|
| 78 |
-
| `processing_vl.py` | `VLProcessor` — multimodal tokenization & preprocessing |
|
| 79 |
-
| `vl_utils/` | Image & video loading / resizing (bundled) |
|
| 80 |
-
|
| 81 |
-
## How to use it
|
| 82 |
-
|
| 83 |
-
Clone the repo first, then run from inside the repo folder:
|
| 84 |
-
|
| 85 |
-
```python
|
| 86 |
-
import torch
|
| 87 |
-
from PIL import Image
|
| 88 |
-
from vl_embedding_v1 import VLEmbedder
|
| 89 |
-
|
| 90 |
-
instruction = "Represent this input for retrieval."
|
| 91 |
-
|
| 92 |
-
# load from local repo checkout (weights in ./model.safetensors)
|
| 93 |
-
embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
|
| 94 |
-
|
| 95 |
-
# text / image / video → 2048-d vectors (L2-normalized)
|
| 96 |
-
text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
|
| 97 |
-
image_vec = embedder.process([{"image": Image.open("photo.jpg")}])[0]
|
| 98 |
-
video_vec = embedder.process([{"video": "clip.mp4"}])[0]
|
| 99 |
-
|
| 100 |
-
# cosine similarity = dot product
|
| 101 |
-
score = (text_vec @ image_vec.T).item()
|
| 102 |
-
```
|
| 103 |
-
|
| 104 |
-
You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
|
| 105 |
-
|
| 106 |
-
> `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
|
| 107 |
-
|
| 108 |
-
---
|
| 109 |
-
|
| 110 |
## How well it does
|
| 111 |
|
| 112 |
Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2)** across image, video, and document retrieval. Selected per-task results:
|
|
@@ -165,6 +122,53 @@ The public leaderboard and our in-house pipeline differ in setup, so we compare
|
|
| 165 |
|
| 166 |
---
|
| 167 |
|
| 168 |
## Good to know before you rely on it
|
| 169 |
|
| 170 |
- **It finds, it doesn't decide.** Eddy-VL surfaces candidates for a human to review; it shouldn't be the sole basis for any high-stakes decision.
|
|
|
|
| 16 |
- Qwen/Qwen3-VL-Embedding-2B
|
| 17 |
---
|
| 18 |
|
| 19 |
+
<p align="center">
|
| 20 |
+
<img src="logo.png" alt="Eddy-VL" width="480">
|
| 21 |
+
</p>
|
| 22 |
+
|
| 23 |
# Eddy-VL Embedding 1.9B
|
| 24 |
|
| 25 |
[Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
|
|
|
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
## How well it does
|
| 68 |
|
| 69 |
Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2)** across image, video, and document retrieval. Selected per-task results:
|
|
|
|
| 122 |
|
| 123 |
---
|
| 124 |
|
| 125 |
+
## Installation
|
| 126 |
+
|
| 127 |
+
```bash
|
| 128 |
+
git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
|
| 129 |
+
cd Eddy-vl_embedding_1.9B_v1
|
| 130 |
+
|
| 131 |
+
pip install "transformers>=5.0" safetensors torch pillow torchvision
|
| 132 |
+
pip install decord # video input (or `av`)
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
Inference code ships with the repo (not a pip package):
|
| 136 |
+
|
| 137 |
+
| File | Role |
|
| 138 |
+
|------|------|
|
| 139 |
+
| `vl_embedding_v1.py` | `VLEmbedder` — load weights, encode text / image / video |
|
| 140 |
+
| `processing_vl.py` | `VLProcessor` — multimodal tokenization & preprocessing |
|
| 141 |
+
| `vl_utils/` | Image & video loading / resizing (bundled) |
|
| 142 |
+
|
| 143 |
+
## How to use it
|
| 144 |
+
|
| 145 |
+
Clone the repo first, then run from inside the repo folder:
|
| 146 |
+
|
| 147 |
+
```python
|
| 148 |
+
import torch
|
| 149 |
+
from PIL import Image
|
| 150 |
+
from vl_embedding_v1 import VLEmbedder
|
| 151 |
+
|
| 152 |
+
instruction = "Represent this input for retrieval."
|
| 153 |
+
|
| 154 |
+
# load from local repo checkout (weights in ./model.safetensors)
|
| 155 |
+
embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
|
| 156 |
+
|
| 157 |
+
# text / image / video → 2048-d vectors (L2-normalized)
|
| 158 |
+
text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
|
| 159 |
+
image_vec = embedder.process([{"image": Image.open("photo.jpg")}])[0]
|
| 160 |
+
video_vec = embedder.process([{"video": "clip.mp4"}])[0]
|
| 161 |
+
|
| 162 |
+
# cosine similarity = dot product
|
| 163 |
+
score = (text_vec @ image_vec.T).item()
|
| 164 |
+
```
|
| 165 |
+
|
| 166 |
+
You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
|
| 167 |
+
|
| 168 |
+
> `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
|
| 169 |
+
|
| 170 |
+
---
|
| 171 |
+
|
| 172 |
## Good to know before you rely on it
|
| 173 |
|
| 174 |
- **It finds, it doesn't decide.** Eddy-VL surfaces candidates for a human to review; it shouldn't be the sole basis for any high-stakes decision.
|
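The usage snippet moved in this commit notes that the embeddings are L2-normalized and that "cosine similarity = dot product". The sketch below illustrates that identity in plain Python and uses it to rank candidates. It does not call `VLEmbedder`: the 3-d vectors, the file names, and the helper functions `l2_normalize`, `dot`, and `cosine` are made up for illustration and stand in for the 2048-d vectors the model is described as returning.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not part of the repo)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed the long way, for comparison."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy stand-ins for embeddings; the values are invented for this example.
query = [1.0, 2.0, 2.0]
candidates = {
    "cat.jpg": [2.0, 4.0, 4.0],   # same direction as the query
    "car.jpg": [2.0, -1.0, 0.0],  # orthogonal to the query
}

# Once every vector has unit length, the dot product is the cosine similarity.
q = l2_normalize(query)
scores = {name: dot(q, l2_normalize(vec)) for name, vec in candidates.items()}

# Rank candidates from most to least similar.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because normalization happens once per vector, scoring a query against many stored embeddings reduces to dot products, which is why the snippet in the README can use `@` directly on the model's outputs.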