Urock-chy committed on
Commit a7b3927 · verified · 1 Parent(s): 5b3a3b7

Reorder README: logo on top, MMEB before usage

Files changed (1)
  1. README.md +51 -47
README.md CHANGED
@@ -16,6 +16,10 @@ base_model:
  - Qwen/Qwen3-VL-Embedding-2B
  ---

  # Eddy-VL Embedding 1.9B

  [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
@@ -60,53 +64,6 @@ Rather than train a new model from scratch, we started from one of the best open

  ---

- ## Installation
-
- ```bash
- git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
- cd Eddy-vl_embedding_1.9B_v1
-
- pip install "transformers>=5.0" safetensors torch pillow torchvision
- pip install decord # video input (or `av`)
- ```
-
- Inference code ships with the repo (not a pip package):
-
- | File | Role |
- |------|------|
- | `vl_embedding_v1.py` | `VLEmbedder` — load weights, encode text / image / video |
- | `processing_vl.py` | `VLProcessor` — multimodal tokenization & preprocessing |
- | `vl_utils/` | Image & video loading / resizing (bundled) |
-
- ## How to use it
-
- Clone the repo first, then run from inside the repo folder:
-
- ```python
- import torch
- from PIL import Image
- from vl_embedding_v1 import VLEmbedder
-
- instruction = "Represent this input for retrieval."
-
- # load from local repo checkout (weights in ./model.safetensors)
- embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
-
- # text / image / video → 2048-d vectors (L2-normalized)
- text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
- image_vec = embedder.process([{"image": Image.open("photo.jpg")}])[0]
- video_vec = embedder.process([{"video": "clip.mp4"}])[0]
-
- # cosine similarity = dot product
- score = (text_vec @ image_vec.T).item()
- ```
-
- You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
-
- > `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
-
- ---
-
  ## How well it does

  Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2)** across image, video, and document retrieval. Selected per-task results:
@@ -165,6 +122,53 @@ The public leaderboard and our in-house pipeline differ in setup, so we compare

  ---

  ## Good to know before you rely on it

  - **It finds, it doesn't decide.** Eddy-VL surfaces candidates for a human to review; it shouldn't be the sole basis for any high-stakes decision.
 
  - Qwen/Qwen3-VL-Embedding-2B
  ---

+ <p align="center">
+ <img src="logo.png" alt="Eddy-VL" width="480">
+ </p>
+
  # Eddy-VL Embedding 1.9B

  [Urock-AI](https://huggingface.co/Urock-AI) · [urock.kr](https://urock.kr/) · License: Apache 2.0
 

  ---

  ## How well it does

  Eddy-VL is validated on **[MMEB-V2](https://huggingface.co/datasets/TIGER-Lab/MMEB-V2)** across image, video, and document retrieval. Selected per-task results:
 

  ---

+ ## Installation
+
+ ```bash
+ git clone https://huggingface.co/Urock-AI/Eddy-vl_embedding_1.9B_v1
+ cd Eddy-vl_embedding_1.9B_v1
+
+ pip install "transformers>=5.0" safetensors torch pillow torchvision
+ pip install decord # video input (or `av`)
+ ```
+
+ Inference code ships with the repo (not a pip package):
+
+ | File | Role |
+ |------|------|
+ | `vl_embedding_v1.py` | `VLEmbedder` — load weights, encode text / image / video |
+ | `processing_vl.py` | `VLProcessor` — multimodal tokenization & preprocessing |
+ | `vl_utils/` | Image & video loading / resizing (bundled) |
+
+ ## How to use it
+
+ Clone the repo first, then run from inside the repo folder:
+
+ ```python
+ import torch
+ from PIL import Image
+ from vl_embedding_v1 import VLEmbedder
+
+ instruction = "Represent this input for retrieval."
+
+ # load from local repo checkout (weights in ./model.safetensors)
+ embedder = VLEmbedder(".", torch_dtype=torch.bfloat16, default_instruction=instruction)
+
+ # text / image / video → 2048-d vectors (L2-normalized)
+ text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
+ image_vec = embedder.process([{"image": Image.open("photo.jpg")}])[0]
+ video_vec = embedder.process([{"video": "clip.mp4"}])[0]
+
+ # cosine similarity = dot product
+ score = (text_vec @ image_vec.T).item()
+ ```
+
+ You can also pass the Hub repo id (`"Urock-AI/Eddy-vl_embedding_1.9B_v1"`) to download weights automatically, but you still need the cloned Python files on `PYTHONPATH`.
+
+ > `trust_remote_code=True` is used internally for `processing_vl.py` (`VLProcessor`).
+
+ ---
+
  ## Good to know before you rely on it

  - **It finds, it doesn't decide.** Eddy-VL surfaces candidates for a human to review; it shouldn't be the sole basis for any high-stakes decision.
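
Note on the relocated usage snippet: its comment `cosine similarity = dot product` holds only because the README describes the embeddings as L2-normalized. The sketch below illustrates that identity in plain Python. The helpers `l2_normalize`, `dot`, and `cosine` and the 3-d toy vectors are assumptions made for illustration; they are not part of the repo and are not real Eddy-VL outputs.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Textbook cosine similarity, valid for any non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 3-d vectors standing in for the 2048-d embeddings.
raw_text = [3.0, 4.0, 0.0]
raw_image = [4.0, 3.0, 0.0]

text_vec = l2_normalize(raw_text)
image_vec = l2_normalize(raw_image)

# For unit-length vectors the dot product already is the cosine similarity.
score = dot(text_vec, image_vec)
print(score, cosine(raw_text, raw_image))
```

If vectors come from a source that does not normalize them, the bare dot product also scales with vector length, so the full `cosine` form would be the safer choice.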