---
library_name: mlx-embeddings
tags:
- mlx
- mlx-embeddings
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- 4bit
- qwen
- qwen3
- qwen3-embedding
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
language:
- en
- zh
- multilingual
---
# Qwen3-Embedding-4B MLX 4-bit
MLX 4-bit quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B), produced with [mlx-embeddings](https://github.com/Blaizzy/mlx-embeddings) on Apple Silicon.
## What is this?
Qwen3-Embedding is a decoder-only, LLM-based text embedding model from the Qwen3 family. It uses last-token pooling (the hidden state of the final token in each sequence) to produce dense vector representations. It scores near the top of the MMTEB multilingual benchmarks and is released under the Apache-2.0 license.
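To make last-token pooling concrete, here is a minimal pure-Python sketch on toy data. It is an illustration, not the model's actual implementation: the helper names `last_token_pool` and `l2_normalize` are hypothetical, and the batch is assumed to be right-padded (padding at the end of each row).

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Pick each sequence's hidden state at its last non-padding position.

    Assumes right-padded batches (an assumption of this sketch).
    hidden_states: [batch][seq_len][dim] nested lists of floats
    attention_mask: [batch][seq_len] nested lists of 0/1
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # index of the last real token
        pooled.append(states[last_index])
    return pooled


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy batch: 2 sequences, 3 positions each, hidden size 2.
hidden_states = [
    [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]],  # 3 real tokens
    [[0.0, 5.0], [6.0, 8.0], [9.0, 9.0]],  # 2 real tokens + 1 padding slot
]
attention_mask = [[1, 1, 1], [1, 1, 0]]

pooled = last_token_pool(hidden_states, attention_mask)
embeddings = [l2_normalize(v) for v in pooled]
print(embeddings)
```

The second sequence ignores its padding slot and pools from position 1, so only real tokens contribute to the embedding.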
## Quantization
- Method: MLX affine quantization (`mlx_embeddings.convert`), group_size=64
- Bits per weight: 4
- Output size: **2.1 GB** (vs. ~7.5 GB for the bf16 source)
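The sizes above can be roughly sanity-checked with back-of-the-envelope arithmetic. The sketch below makes several assumptions: a nominal 4.0e9 parameters, every parameter quantized, and one 16-bit scale plus one 16-bit bias stored per group of 64 weights. The function name `quantized_size_bytes` is hypothetical, and real checkpoints may differ slightly because some tensors can stay in higher precision.

```python
def quantized_size_bytes(num_params, bits=4, group_size=64,
                         scale_bits=16, bias_bits=16):
    """Approximate storage for affine group quantization.

    Each group of `group_size` weights shares one scale and one bias,
    assumed here to be stored as 16-bit floats.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return num_params * bits_per_weight / 8


NUM_PARAMS = 4.0e9  # nominal parameter count (an approximation)
GIB = 2 ** 30

q_bytes = quantized_size_bytes(NUM_PARAMS)
bf16_bytes = NUM_PARAMS * 2  # 2 bytes per bf16 weight

print(f"4-bit: {q_bytes / GIB:.2f} GiB")
print(f"bf16:  {bf16_bytes / GIB:.2f} GiB")
```

Under these assumptions the effective cost is 4.5 bits per weight, which gives roughly 2.1 GiB for the quantized model and roughly 7.45 GiB for bf16, in line with the figures listed.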
## Quickstart
```python
from mlx_embeddings import load
model, tokenizer = load("majentik/Qwen3-Embedding-4B-MLX-4bit")
inputs = tokenizer(
    ["What is the capital of France?", "Paris is the capital of France."],
    padding=True, truncation=True, return_tensors="mlx"
)
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # already L2-normalised, shape [batch, dim]
```
To compute sentence similarity, reuse `embeddings` from the snippet above:
```python
# Embeddings are L2-normalised, so the dot product equals cosine similarity.
e = embeddings
scores = (e[0] @ e[1:].T).tolist()  # first sentence vs. all the others
print(scores)
```
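The same dot-product idea extends to ranking several candidates against one query. The following is a small pure-Python sketch with made-up unit-length vectors standing in for real embeddings; `dot` and `rank_by_similarity` are hypothetical helper names, not part of mlx-embeddings.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query, documents):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, dot(query, doc)) for i, doc in enumerate(documents)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical unit-length vectors; real embeddings would have 2560 dimensions.
query = [1.0, 0.0]
documents = [
    [0.0, 1.0],  # orthogonal to the query
    [0.6, 0.8],  # partially aligned
    [1.0, 0.0],  # same direction as the query
]

ranking = rank_by_similarity(query, documents)
print(ranking)
```

Because the vectors are unit length, each score is a cosine similarity in the range [-1, 1], and the document pointing in the same direction as the query ranks first.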
## Model Specifications
| Property | Value |
|---|---|
| Base Model | [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) |
| Architecture | Decoder-only (Qwen3ForCausalLM) with last-token pooling |
| Parameters | 4B (pre-quantization) |
| Context Length | 32K |
| Embedding Dim | 2560 |
| BF16 Size | ~7.5 GB |
| License | apache-2.0 |
| Languages | 100+ (multilingual) |
## License
Apache 2.0, inherited from the upstream Qwen3-Embedding model. Free for research and commercial use.
## See also
- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Official GGUF: [Qwen/Qwen3-Embedding-4B-GGUF](https://huggingface.co/Qwen/Qwen3-Embedding-4B-GGUF) (if published by Qwen)
- mlx-embeddings package: https://github.com/Blaizzy/mlx-embeddings
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard