Feature Extraction
sentence-transformers
Safetensors
Transformers
qwen3
text-generation
sentence-similarity
text-embeddings-inference
Instructions to use Qwen/Qwen3-Embedding-0.6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Qwen/Qwen3-Embedding-0.6B with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
- Transformers
How to use Qwen/Qwen3-Embedding-0.6B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="Qwen/Qwen3-Embedding-0.6B")
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
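The sentence-transformers snippet above prints only the shape of a 3×3 similarity matrix. As a rough sketch of what such a matrix contains, assuming the scores are cosine similarities (the page does not state the scoring function), a dependency-free version might look like the following. The helper name `cosine_similarity_matrix` and the three-dimensional vectors are invented placeholders, not model outputs.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarities of a list of equal-length vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Placeholder "embeddings" (assumption: real ones would come from model.encode)
toy_embeddings = [
    [0.9, 0.1, 0.0],  # stands in for "The weather is lovely today."
    [0.8, 0.3, 0.1],  # stands in for "It's so sunny outside!"
    [0.0, 0.2, 0.9],  # stands in for "He drove to the stadium."
]

similarities = cosine_similarity_matrix(toy_embeddings)
print(len(similarities), len(similarities[0]))
```

Each row compares one sentence with all three, so the diagonal is 1 (every vector compared with itself) and the matrix is symmetric.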
update README
#4
by zyznull - opened
README.md
CHANGED
@@ -7,7 +7,6 @@ tags:
 - sentence-transformers
 - sentence-similarity
 - feature-extraction
-- text-embeddings-inference
 ---
 # Qwen3-Embedding-0.6B
 
@@ -24,14 +23,13 @@ The Qwen3 Embedding model series is the latest proprietary model of the Qwen fam
 **Comprehensive Flexibility**: The Qwen3 Embedding series offers a full spectrum of sizes (from 0.6B to 8B) for both embedding and reranking models, catering to diverse use cases that prioritize efficiency and effectiveness. Developers can seamlessly combine these two modules. Additionally, the embedding model allows for flexible vector definitions across all dimensions, and both embedding and reranking models support user-defined instructions to enhance performance for specific tasks, languages, or scenarios.
 
 **Multilingual Capability**: The Qwen3 Embedding series offers support for over 100 languages, thanks to the multilingual capabilities of Qwen3 models. This includes various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.
-
 ## Model Overview
 
 **Qwen3-Embedding-0.6B** has the following features:
 
 - Model Type: Text Embedding
 - Supported Languages: 100+ Languages
-- Number of
+- Number of Parameters: 0.6B
 - Context Length: 32k
 - Embedding Dimension: Up to 1024, supports user-defined output dimensions ranging from 32 to 1024
 
@@ -64,7 +62,6 @@ KeyError: 'qwen3'
 
 ```python
 # Requires transformers>=4.51.0
-# Requires sentence-transformers>=2.7.0
 
 from sentence_transformers import SentenceTransformer
 
@@ -168,66 +165,8 @@ scores = (embeddings[:2] @ embeddings[2:].T)
 print(scores.tolist())
 # [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
 ```
-
-### vLLM Usage
-
-```python
-# Requires vllm>=0.8.5
-import torch
-import vllm
-from vllm import LLM
-
-def get_detailed_instruct(task_description: str, query: str) -> str:
-    return f'Instruct: {task_description}\nQuery:{query}'
-
-# Each query must come with a one-sentence instruction that describes the task
-task = 'Given a web search query, retrieve relevant passages that answer the query'
-
-queries = [
-    get_detailed_instruct(task, 'What is the capital of China?'),
-    get_detailed_instruct(task, 'Explain gravity')
-]
-# No need to add instruction for retrieval documents
-documents = [
-    "The capital of China is Beijing.",
-    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
-]
-input_texts = queries + documents
-
-model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")
-
-outputs = model.embed(input_texts)
-embeddings = torch.tensor([o.outputs.embedding for o in outputs])
-scores = (embeddings[:2] @ embeddings[2:].T)
-print(scores.tolist())
-# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]
-```
-
 📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance by approximately 1% to 5%.
 
-### Text Embeddings Inference (TEI) Usage
-
-You can either run / deploy TEI on NVIDIA GPUs as:
-
-```bash
-docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B --dtype float16
-```
-
-Or on CPU devices as:
-
-```bash
-docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B
-```
-
-And then, generate the embeddings by sending an HTTP POST request as:
-
-```bash
-curl http://localhost:8080/embed \
-    -X POST \
-    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
-    -H "Content-Type: application/json"
-```
-
 ## Evaluation
 
 ### MTEB (Multilingual)
@@ -283,10 +222,11 @@ curl http://localhost:8080/embed \
 If you find our work helpful, feel free to cite us.
 
 ```
-@
-
-
-
-
+@misc{qwen3-embedding,
+    title = {Qwen3-Embedding},
+    url = {https://qwenlm.github.io/blog/qwen3/},
+    author = {Qwen Team},
+    month = {May},
+    year = {2025}
 }
 ```
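The tip about query-side instructions and the `embeddings[:2] @ embeddings[2:].T` scoring step in the diff can be illustrated without loading a model. The sketch below reuses the README's `get_detailed_instruct` template, then ranks documents per query by dot product. The three-dimensional vectors are invented placeholders standing in for real embeddings, and `l2_normalize` and `dot_scores` are hypothetical helpers written for this illustration, not part of any library.

```python
import math


def get_detailed_instruct(task_description: str, query: str) -> str:
    # Same template as in the README: an instruction line followed by the query.
    return f'Instruct: {task_description}\nQuery:{query}'


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot_scores(query_vecs, doc_vecs):
    """Plain-Python stand-in for `query_matrix @ doc_matrix.T`."""
    return [[sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs] for qv in query_vecs]


task = 'Given a web search query, retrieve relevant passages that answer the query'

# Only the queries carry the instruction; documents are embedded as-is.
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity'),
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
]
input_texts = queries + documents

# Placeholder vectors (NOT model outputs), one per entry of input_texts.
toy_embeddings = [
    l2_normalize(v)
    for v in ([1.0, 0.1, 0.0], [0.0, 0.2, 1.0], [0.9, 0.2, 0.1], [0.1, 0.1, 0.9])
]

n_queries = len(queries)
scores = dot_scores(toy_embeddings[:n_queries], toy_embeddings[n_queries:])

# Index of the best-scoring document for each query.
best = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(best)
```

With real embeddings, the same slicing applies as long as the queries come first in `input_texts`, which is why the README concatenates `queries + documents` before encoding.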