Urock-chy committed
Commit 5b3a3b7 · verified · 1 Parent(s): 667de08

Clarify local load steps and MMEB overall scores

Files changed (1)
  1. README.md +20 -9
README.md CHANGED
@@ -16,10 +16,6 @@ base_model:
  - Qwen/Qwen3-VL-Embedding-2B
  ---

- <p align="center">
- <img src="logo.png" alt="Eddy — Urock-AI" width="100%" />
- </p>
-
  # Eddy-VL Embedding 1.9B

  [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
@@ -67,11 +63,14 @@ Rather than train a new model from scratch, we started from one of the best open
  ## Installation

  ```bash
  pip install "transformers>=5.0" safetensors torch pillow torchvision
  pip install decord # video input (or `av`)
  ```

- Clone this repo (or add it to `PYTHONPATH`) — inference code ships with the model:

  | File | Role |
  |------|------|
@@ -81,15 +80,17 @@ Clone this repo (or add it to `PYTHONPATH`) — inference code ships with the mo

  ## How to use it

  ```python
  import torch
  from PIL import Image
  from vl_embedding_v1 import VLEmbedder

- model_id = "Urock-AI/Eddy-vl_embedding_1.9B_v1"
  instruction = "Represent this input for retrieval."

- embedder = VLEmbedder(model_id, torch_dtype=torch.bfloat16, default_instruction=instruction)

  # text / image / video → 2048-d vectors (L2-normalized)
  text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
@@ -100,7 +101,9 @@ video_vec = embedder.process([{"video": "clip.mp4"}])[0]
  score = (text_vec @ image_vec.T).item()
  ```

- > `trust_remote_code=True` is required for `processing_vl.py` (`VLProcessor`) in this repo.

  ---

@@ -150,7 +153,15 @@ Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MM
  | ViDoRe · InfoVQA | 82.4% |
  | VisRAG · ChartQA | 81.0% |

- **On reading the numbers:** the public MMEB leaderboard lists the base model around 73.0, but re-running it ourselves in the same environment gives 68.9. Measured that way — apples to apples — Eddy-VL lands within about 10% of the base model overall, while being smaller and faster. We report the in-house baseline (68.9) so the comparison is fair rather than flattering.

  ---

  - Qwen/Qwen3-VL-Embedding-2B
  ---

  # Eddy-VL Embedding 1.9B

  [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
 
  ## Installation

  ```bash
+ git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
+ cd Eddy-vl_embedding_1.9B_v1
+
  pip install "transformers>=5.0" safetensors torch pillow torchvision
  pip install decord # video input (or `av`)
  ```

+ Inference code ships with the repo (not a pip package):

  | File | Role |
  |------|------|
 

  ## How to use it

+ Clone the repo first, then run from inside the repo folder:
+
  ```python
  import torch
  from PIL import Image
  from vl_embedding_v1 import VLEmbedder

  instruction = "Represent this input for retrieval."

+ # load from local repo checkout (weights in ./model.safetensors)
+ embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)

  # text / image / video → 2048-d vectors (L2-normalized)
  text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
 
  score = (text_vec @ image_vec.T).item()
  ```
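
Since the README describes the output vectors as L2-normalized, the dot product used for `score` is the cosine similarity. A minimal sketch in plain Python with toy 3-d vectors (real embeddings are 2048-d; the helper names and the values are illustrative assumptions, not part of the repo):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy stand-ins for text_vec and image_vec (hypothetical values).
text_raw = [1.0, 2.0, 2.0]
image_raw = [2.0, 1.0, 2.0]

# Dot product of the unit-length versions, as in the README's `score` line.
score = dot(l2_normalize(text_raw), l2_normalize(image_raw))
```

Scores therefore fall in [-1, 1], and higher means more similar.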

+ You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
+
+ > `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
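
One way to satisfy the `PYTHONPATH` requirement from inside a script is to put the clone on `sys.path` before importing. The sketch below assumes the clone lives at a path you supply; `add_repo_to_path` and `load_embedder` are hypothetical helpers, and the `VLEmbedder` arguments are copied from the README snippet rather than from any documented API.

```python
import sys
from pathlib import Path

def add_repo_to_path(repo_dir):
    """Put the cloned repo at the front of sys.path so its modules import."""
    resolved = str(Path(repo_dir).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved

def load_embedder(repo_dir, model_ref="Urock-AI/Eddy-vl_embedding_1.9B_v1"):
    """Hypothetical wrapper: import VLEmbedder from the clone, load by Hub id."""
    import torch  # imported lazily so the path helper works without torch
    add_repo_to_path(repo_dir)
    from vl_embedding_v1 import VLEmbedder  # module shipped in the repo
    return VLEmbedder(
        model_ref,
        torch_dtype=torch.bfloat16,
        default_instruction="Represent this input for retrieval.",
    )
```

Calling `load_embedder("~/Eddy-vl_embedding_1.9B_v1")` (the path is an assumption) would then behave like the README example, without needing to run from inside the repo folder.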

  ---

 
  | ViDoRe · InfoVQA | 82.4% |
  | VisRAG · ChartQA | 81.0% |

+ **MMEB-V2 overall (mean across all datasets, in-house eval):**
+
+ | Model | Overall |
+ |:------|--------:|
+ | Qwen3-VL-Embedding-2B (public leaderboard) | 73.0 |
+ | Qwen3-VL-Embedding-2B (our environment) | 68.9 |
+ | **Eddy-VL 1.9B** | **63.2** |
+
+ The public leaderboard and our in-house pipeline differ in setup, so we compare Eddy against the teacher re-run in the same environment (68.9 → 63.2). That is roughly an **8% relative gap**, while Eddy is smaller and ~1.1× faster.
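
The relative-gap figure can be checked with a few lines of arithmetic. This sketch uses only the three overall scores from the table above; `relative_gap` is an illustrative helper, not part of the repo.

```python
def relative_gap(baseline, score):
    """Fractional drop of `score` below `baseline`."""
    return (baseline - score) / baseline

# Overall scores from the table above.
leaderboard_teacher = 73.0
inhouse_teacher = 68.9
eddy = 63.2

gap_inhouse = relative_gap(inhouse_teacher, eddy)          # about 0.083
gap_leaderboard = relative_gap(leaderboard_teacher, eddy)  # about 0.134
print(f"in-house: {gap_inhouse:.1%}, leaderboard: {gap_leaderboard:.1%}")
```

Against the in-house baseline the gap is about 8.3%. Against the public leaderboard number it would be about 13.4%, which is why the choice of baseline matters when reading the claim.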

  ---