---
library_name: mlx-embeddings
tags:
- mlx
- mlx-embeddings
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- 4bit
- qwen
- qwen3
- qwen3-embedding
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
language:
- en
- zh
- multilingual
---

# Qwen3-Embedding-4B MLX 4-bit

MLX 4-bit quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B), produced with [mlx-embeddings](https://github.com/Blaizzy/mlx-embeddings) on Apple Silicon.

## What is this?

Qwen3-Embedding is a decoder-only, LLM-style text embedding model from the Qwen3 family. It uses last-token pooling to produce dense vector representations: the hidden state of the final non-padding token is taken as the embedding of the whole input. It scores near the top of the multilingual MTEB (MMTEB) benchmarks while retaining Apache-2.0 licensing.

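
To make last-token pooling concrete, here is a minimal pure-Python sketch on toy data. The function name `last_token_pool` and the tiny 2-dimensional hidden states are illustrative assumptions, not part of mlx-embeddings; the real model does this internally on 2560-dimensional states.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token, L2-normalised.

    hidden_states: list of per-token vectors for ONE sequence.
    attention_mask: list of 0/1 flags, 1 marking real (non-padding) tokens.
    """
    # Index of the last position whose mask is 1 (works for left or right padding).
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    vec = hidden_states[last]
    # L2-normalise so that dot products between embeddings are cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Toy sequence: two real tokens followed by one padding token.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = last_token_pool(hidden, mask)
print(pooled)  # [0.6, 0.8]
```

The padding position is ignored even though it holds the largest values, which is why the attention mask has to be passed alongside the token IDs.
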
## Quantization

- Method: MLX affine quantization (`mlx_embeddings.convert`), group_size=64
- Bits per weight: 4
- Output size: **2.1 GB** (vs. ~7.5 GB for the bf16 source)

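
The settings above can be illustrated with a rough pure-Python sketch of group-wise affine quantization and the resulting size estimate. This is not MLX's actual kernel: the helper names are hypothetical, the rounding scheme is a simplification, and the sketch assumes each group of 64 weights stores one 16-bit scale and one 16-bit bias, and that the model has 4.0B parameters as listed in this card.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1                 # 15 distinct steps for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0        # constant groups fall back to scale 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo                  # `lo` plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def effective_bits(bits=4, group_size=64, meta_bits=32):
    """Bits per weight once the per-group scale and bias are amortised (assumed 16 + 16 bits)."""
    return bits + meta_bits / group_size


def size_gib(n_params, bits_per_weight):
    """Storage in GiB for n_params weights at the given bits per weight."""
    return n_params * bits_per_weight / 8 / 2**30


group = [i / 63 - 0.5 for i in range(64)]    # one toy group of 64 weights
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print(effective_bits())                              # 4.5
print(f"{size_gib(4.0e9, 16):.1f} GiB at bf16")      # 7.5 GiB at bf16
print(f"{size_gib(4.0e9, effective_bits()):.1f} GiB at 4-bit")  # 2.1 GiB at 4-bit
```

Under these assumptions the estimate lands on the sizes quoted above, and the reconstruction error of each weight stays within half a quantization step (`scale / 2`).
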
## Quickstart

```python
from mlx_embeddings import load

# Download (if needed) and load the quantized model and its tokenizer
model, tokenizer = load("majentik/Qwen3-Embedding-4B-MLX-4bit")

# Tokenize a batch of texts, padding them to a common length
inputs = tokenizer(
    ["What is the capital of France?", "Paris is the capital of France."],
    padding=True, truncation=True, return_tensors="mlx"
)

outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # already L2-normalised, shape [batch, dim]
```

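For sentence similarity, reuse `embeddings` from the Quickstart. Because the rows are already L2-normalised, a dot product gives the cosine similarity:

```python
e = embeddings
# Similarity of the first text to each of the remaining texts
scores = (e[0] @ e[1:].T).tolist()
print(scores)
```
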
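
The same idea extends to ranking several documents against one query. The sketch below uses plain Python lists and made-up 2-dimensional unit vectors so it runs anywhere; `rank_by_similarity` is a hypothetical helper, and with real embeddings you would first convert them with `embeddings.tolist()`.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar.

    Assumes all vectors are L2-normalised, so the dot product is the cosine similarity.
    """
    scores = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy unit vectors standing in for real embeddings.
query = [0.6, 0.8]
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

ranking = rank_by_similarity(query, docs)
for idx, score in ranking:
    print(f"doc {idx}: {score:.2f}")
```

Document 1 is identical to the query and ranks first, followed by document 2 and then document 0.
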
## Model Specifications

| Property | Value |
|---|---|
| Base Model | [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) |
| Architecture | Decoder-only (Qwen3ForCausalLM) with last-token pooling |
| Parameters | 4.0B (pre-quantization) |
| Context Length | 32K tokens |
| Embedding Dim | 2560 |
| BF16 Size | ~7.5 GB |
| License | apache-2.0 |
| Languages | 100+ (multilingual) |

## License

Apache 2.0, inherited from the upstream Qwen3-Embedding model. Free for research and commercial use.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Official GGUF: [Qwen/Qwen3-Embedding-4B-GGUF](https://huggingface.co/Qwen/Qwen3-Embedding-4B-GGUF) (if published by Qwen)
- mlx-embeddings package: https://github.com/Blaizzy/mlx-embeddings
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard