Urock-AI/Eddy-vl_embedding_1.9B_v1

@@ -16,10 +16,6 @@ base_model:
 - Qwen/Qwen3-VL-Embedding-2B
 ---
-<p align="center">
-  <img src="logo.png" alt="Eddy — Urock-AI" width="100%" />
-</p>
 # Eddy-VL Embedding 1.9B
 [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
@@ -67,11 +63,14 @@ Rather than train a new model from scratch, we started from one of the best open
 ## Installation
 ```bash
 pip install "transformers>=5.0" safetensors torch pillow torchvision
 pip install decord   # video input (or `av`)
 ```
-Clone this repo (or add it to `PYTHONPATH`) — inference code ships with the model:
 | File | Role |
 |------|------|
@@ -81,15 +80,17 @@ Clone this repo (or add it to `PYTHONPATH`) — inference code ships with the mo
 ## How to use it
 ```python
 import torch
 from PIL import Image
 from vl_embedding_v1 import VLEmbedder
-model_id = "Urock-AI/Eddy-vl_embedding_1.9B_v1"
 instruction = "Represent this input for retrieval."
-embedder = VLEmbedder(model_id, torch_dtype=torch.bfloat16, default_instruction=instruction)
 # text / image / video → 2048-d vectors (L2-normalized)
 text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
@@ -100,7 +101,9 @@ video_vec = embedder.process([{"video": "clip.mp4"}])[0]
 score = (text_vec @ image_vec.T).item()
 ```
-> `trust_remote_code=True` is required for `processing_vl.py` (`VLProcessor`) in this repo.
 ---
@@ -150,7 +153,15 @@ Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MM
 | ViDoRe · InfoVQA | 82.4% |
 | VisRAG · ChartQA | 81.0% |
-**On reading the numbers:** the public MMEB leaderboard lists the base model around 73.0, but re-running it ourselves in the same environment gives 68.9. Measured that way — apples to apples — Eddy-VL lands within about 10% of the base model overall, while being smaller and faster. We report the in-house baseline (68.9) so the comparison is fair rather than flattering.
 ---

 - Qwen/Qwen3-VL-Embedding-2B
 ---
 # Eddy-VL Embedding 1.9B
 [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
 ## Installation
 ```bash
+git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
+cd Eddy-vl_embedding_1.9B_v1
 pip install "transformers>=5.0" safetensors torch pillow torchvision
 pip install decord   # video input (or `av`)
 ```
+Inference code ships with the repo (not a pip package):
 | File | Role |
 |------|------|
 ## How to use it
+Clone the repo first, then run from inside the repo folder:
 ```python
 import torch
 from PIL import Image
 from vl_embedding_v1 import VLEmbedder
 instruction = "Represent this input for retrieval."
+# load from local repo checkout (weights in ./model.safetensors)
+embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
 # text / image / video → 2048-d vectors (L2-normalized)
 text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
 score = (text_vec @ image_vec.T).item()
 ```
+You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
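
The `PYTHONPATH` requirement can also be met from inside Python. The following is a minimal sketch, assuming the repo was cloned to a local folder; the helper name `add_repo_to_path` is hypothetical and is not part of this repo.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str) -> str:
    """Prepend a local checkout to sys.path so its modules can be imported.

    Hypothetical helper for illustration; returns the resolved directory.
    """
    resolved = str(Path(repo_dir).expanduser().resolve())
    # Avoid adding the same directory twice on repeated calls.
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed usage (the folder name is wherever you cloned the repo):
# add_repo_to_path("Eddy-vl_embedding_1.9B_v1")
# from vl_embedding_v1 import VLEmbedder
```

Setting the `PYTHONPATH` environment variable before launching Python should have the same effect; the in-process variant is convenient in notebooks.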
+> `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
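
The `score` line in the usage snippet is a plain dot product. Because the embeddings are L2-normalized, that dot product equals the cosine similarity. The sketch below checks this on toy 2-d vectors in pure Python; the helper names are illustrative and do not come from the repo.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# After normalization, the dot product alone gives the cosine similarity.
assert math.isclose(dot(l2_normalize(a), l2_normalize(b)), cosine(a, b))
```

This is why no extra normalization step is needed before comparing the vectors returned by `embedder.process`.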
 ---
 | ViDoRe · InfoVQA | 82.4% |
 | VisRAG · ChartQA | 81.0% |
+**MMEB-V2 overall (mean across all datasets, in-house eval):**
+| Model | Overall |
+|:------|--------:|
+| Qwen3-VL-Embedding-2B (public leaderboard) | 73.0 |
+| Qwen3-VL-Embedding-2B (our environment) | 68.9 |
+| **Eddy-VL 1.9B** | **63.2** |
+The public leaderboard and our in-house pipeline differ in setup, so we compare Eddy against the teacher re-run in the same environment (68.9 → 63.2). That is roughly an **8% relative gap**, with Eddy being smaller and ~1.1× faster.
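
The relative gap can be reproduced from the two in-house scores in the table. A small sketch of the arithmetic, assuming "relative gap" means the score difference divided by the teacher's score (the function name is illustrative):

```python
def relative_gap(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` falls short of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - candidate) / baseline


teacher_in_house = 68.9  # Qwen3-VL-Embedding-2B, re-run in the same environment
eddy = 63.2              # Eddy-VL 1.9B

gap = relative_gap(teacher_in_house, eddy)
print(f"{gap:.1%}")  # prints 8.3%
```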
 ---