Urock-AI/Eddy-vl_embedding_1.9B_v1

Feature Extraction · vision-language · multimodal-embedding · digital-forensics


Urock-chy committed 1 day ago

Commit 627ae98 (verified) · 1 parent: d5bb69f

Simplify README usage section

Files changed (1)

README.md +9 -27

README.md CHANGED

@@ -83,38 +83,20 @@ from PIL import Image
 from vl_embedding_v1 import VLEmbedder
 model_id = "Urock-AI/Eddy-vl_embedding_1.9B_v1"
-embedder = VLEmbedder(
-    model_id,
-    torch_dtype=torch.bfloat16,
-    default_instruction="Represent this input for retrieval.",
-)
-# weights default to ``{model_id}/model.safetensors``
-# embedder = VLEmbedder(model_id, weights_path="/path/to/model.safetensors", ...)
-```
-> `trust_remote_code=True` is required for `processing_vl.py` (`VLProcessor`) in this repo. Model weights load from `model.safetensors`.
-Embeddings are L2-normalized and compared by cosine similarity (dot product). A simple "find the closest image to this text" looks like:
-```python
-INSTRUCTION = "Represent this input for retrieval."
-query = embedder.process(
-    [{"text": "a tan toilet and sink in a small room", "instruction": INSTRUCTION}],
-)[0]
-candidates = [
-    embedder.process([{"image": Image.open(p), "instruction": INSTRUCTION}])[0]
-    for p in ["a.jpg", "b.jpg", "c.jpg"]
-]
-scores = [(query @ c.T).item() for c in candidates]
-ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
-print(ranking, scores)
 ```
-Need cheaper storage or faster search? Truncate the 2048-d vector to a shorter prefix before normalizing — you trade a little accuracy for a lot of speed.
 ---

 from vl_embedding_v1 import VLEmbedder
 model_id = "Urock-AI/Eddy-vl_embedding_1.9B_v1"
+instruction = "Represent this input for retrieval."
+embedder = VLEmbedder(model_id, torch_dtype=torch.bfloat16, default_instruction=instruction)
+# text / image / video → 2048-d vectors (L2-normalized)
+text_vec = embedder.process([{"text": "a photo of a cat"}])[0]
+image_vec = embedder.process([{"image": Image.open("photo.jpg")}])[0]
+video_vec = embedder.process([{"video": "clip.mp4"}])[0]
+# cosine similarity = dot product
+score = (text_vec @ image_vec.T).item()
 ```
+> `trust_remote_code=True` is required for `processing_vl.py` (`VLProcessor`) in this repo.
 ---
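The README makes two retrieval claims: the returned vectors are L2-normalized, so cosine similarity reduces to a dot product, and (in the tip this commit removes) a shorter prefix of the 2048-d vector can stand in for the full one. Both can be checked without loading the model. The sketch below is plain Python. Random unit vectors serve as hypothetical stand-ins for `embedder.process(...)[0]` outputs. The helpers `l2_normalize`, `rank`, and `truncate`, the 0.05 noise level, and the 256-d prefix length are illustrative assumptions and are not part of `vl_embedding_v1`.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity for arbitrary (not necessarily normalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank(query, candidates):
    """Hypothetical helper: candidate indices by descending score, plus the raw scores."""
    scores = [dot(query, c) for c in candidates]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


def truncate(vec, dim):
    """Hypothetical helper: keep the first `dim` components, then re-normalize."""
    return l2_normalize(vec[:dim])


# Hypothetical stand-ins for embedder.process(...)[0]: random 2048-d unit vectors.
rng = random.Random(0)
DIM = 2048
query = l2_normalize([rng.gauss(0, 1) for _ in range(DIM)])
candidates = [l2_normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(3)]
# Make candidate 1 a noisy copy of the query so it should come out on top.
candidates[1] = l2_normalize([q + 0.05 * rng.gauss(0, 1) for q in query])

order, scores = rank(query, candidates)
print(order, [round(s, 3) for s in scores])

# For unit vectors the dot product and cosine similarity agree (up to float error).
assert all(abs(s - cosine(query, c)) < 1e-9 for s, c in zip(scores, candidates))

# Cheaper search: rank on 256-d prefixes instead of the full 2048-d vectors.
short_query = truncate(query, 256)
short_candidates = [truncate(c, 256) for c in candidates]
short_order, short_scores = rank(short_query, short_candidates)
print(short_order, [round(s, 3) for s in short_scores])
```

Random vectors only exercise the arithmetic. They show that ranking by dot product is the same as ranking by cosine once vectors have unit length. They also show that a truncated prefix must be re-normalized before that shortcut applies again. How much retrieval accuracy a given prefix length costs for this model can only be measured on real `VLEmbedder` outputs.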