How to use majentik/Qwen3-Embedding-4B-MLX-8bit with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from globbing the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Qwen3-Embedding-4B-MLX-8bit majentik/Qwen3-Embedding-4B-MLX-8bit
Qwen3-Embedding-4B MLX 8-bit
MLX 8-bit quantization of Qwen/Qwen3-Embedding-4B, produced with mlx-embeddings on Apple Silicon.
What is this?
Qwen3-Embedding is a decoder-only LLM-style text embedding model from the Qwen3 family, using last-token pooling to produce dense vector representations. It scores near the top of MMTEB multilingual benchmarks while retaining Apache-2.0 licensing.
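Last-token pooling can be illustrated with a minimal pure-Python sketch. It is not the mlx-embeddings implementation: the helper names `last_token_pool` and `l2_normalize` are hypothetical, the hidden states are toy values, and right padding is assumed (with left padding the last position of every sequence would be used instead).

```python
import math

def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last non-padding token in each sequence.

    hidden_states: nested list shaped [batch][seq_len][dim]
    attention_mask: nested list shaped [batch][seq_len], 1 = real token, 0 = padding
    Assumes right padding (padding tokens sit at the end of each sequence).
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last_index = sum(mask) - 1  # index of the final real token
        pooled.append(states[last_index])
    return pooled

def l2_normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy batch: two sequences, three positions, two-dimensional hidden states.
hidden_states = [
    [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]],  # all three tokens are real
    [[0.0, 5.0], [6.0, 8.0], [9.0, 9.0]],  # last position is padding
]
attention_mask = [[1, 1, 1], [1, 1, 0]]

embeddings = [l2_normalize(v) for v in last_token_pool(hidden_states, attention_mask)]
print(embeddings)
```

The padded position in the second sequence is ignored, so its embedding comes from the second token rather than the third.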
Quantization
- Method: MLX affine quantization (mlx_embeddings.convert), group_size=64
- Bits per weight: 8
- Output size: 4.0 GB (vs ~7.5 GB for bf16 source)
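To make the settings above concrete, here is a rough pure-Python sketch of group-wise affine quantization with a group size of 64 and 8 bits per weight. It illustrates the general technique only and is not MLX's kernel or storage layout; the function names and the toy weights are assumptions made for this example.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes.

    Returns (codes, scale, bias) such that weight ≈ code * scale + bias.
    """
    levels = (1 << bits) - 1            # 255 representable steps for 8-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]

def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

# Toy weight row: 128 values, i.e. two groups of 64.
weights = [((i * 37) % 101 - 50) / 100.0 for i in range(128)]
groups = quantize(weights)
restored = [w for codes, scale, bias in groups
            for w in dequantize_group(codes, scale, bias)]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(len(groups), max_error)
```

Each group carries its own scale and bias, so the round-trip error per weight is bounded by half of that group's scale. If the scale and bias are stored as 16-bit values (an assumption here), they add roughly 0.5 bit per weight on top of the 8-bit codes.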
Quickstart
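from mlx_embeddings import load

# Load the quantized weights and tokenizer from the Hub (downloaded on first use)
model, tokenizer = load("majentik/Qwen3-Embedding-4B-MLX-8bit")

# Tokenize a batch of sentences, padded to a common length
inputs = tokenizer(
    ["What is the capital of France?", "Paris is the capital of France."],
    padding=True, truncation=True, return_tensors="mlx"
)

# Forward pass; the attention mask tells the model which positions are padding
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # already L2-normalised, shape [batch, dim]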
For sentence similarity:
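# Reuses `embeddings` from the Quickstart above.
# The vectors are already L2-normalised, so a dot product is the cosine similarity.
e = embeddings
# Compare the first sentence against every other sentence in the batch
scores = (e[0] @ e[1:].T).tolist()
print(scores)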
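The same dot-product idea extends to ranking several documents against one query. The sketch below uses toy two-dimensional unit vectors in place of real model outputs, and `rank_by_similarity` is a hypothetical helper written for this example, not part of mlx-embeddings.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for real 2560-dimensional embeddings.
query = [1.0, 0.0]
documents = [
    [0.0, 1.0],  # orthogonal to the query
    [0.6, 0.8],  # partially aligned
    [1.0, 0.0],  # identical direction
]
ranking = rank_by_similarity(query, documents)
print(ranking)
```

Because the vectors are unit length, the scores are cosine similarities, and the document pointing in the same direction as the query ranks first.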
Model Specifications
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-Embedding-4B |
| Architecture | Decoder-only (Qwen3ForCausalLM) with last-token pooling |
| Parameters | 4.0B (pre-quantization) |
| Context Length | 32K |
| Embedding Dim | 2560 |
| BF16 Size | ~7.5 GB |
| License | apache-2.0 |
| Languages | 100+ (multilingual) |
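The embedding dimension in the table determines how much storage a vector index needs. A back-of-the-envelope sketch, assuming raw dense storage with no index overhead (`index_size_bytes` is a hypothetical helper for this example):

```python
EMBEDDING_DIM = 2560  # from the specification table

def index_size_bytes(num_vectors, dim=EMBEDDING_DIM, bytes_per_value=4):
    """Raw storage for num_vectors dense vectors, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value

one_vector = index_size_bytes(1)                               # float32
million_fp32 = index_size_bytes(1_000_000)                     # float32
million_fp16 = index_size_bytes(1_000_000, bytes_per_value=2)  # float16
print(one_vector, million_fp32 / 1e9, million_fp16 / 1e9)
```

Under these assumptions one float32 vector takes 10,240 bytes, so a million vectors need about 10.24 GB in float32 or 5.12 GB in float16.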
License
Apache 2.0, inherited from the upstream Qwen3-Embedding model. Free for research and commercial use.
See also
- Base: Qwen/Qwen3-Embedding-4B
- Official GGUF: Qwen/Qwen3-Embedding-4B-GGUF (if published by Qwen)
- mlx-embeddings package: https://github.com/Blaizzy/mlx-embeddings
- Garden hub: majentik/garden
- MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard