---
library_name: mlx-embeddings
tags:
- mlx
- mlx-embeddings
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- 4bit
- qwen
- qwen3
- qwen3-embedding
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
language:
- en
- zh
- multilingual
---

# Qwen3-Embedding-4B MLX 4-bit

MLX 4-bit quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B), produced with [mlx-embeddings](https://github.com/Blaizzy/mlx-embeddings) on Apple Silicon.

## What is this?

Qwen3-Embedding is a decoder-only, LLM-style text embedding model from the Qwen3 family. It uses last-token pooling to produce dense vector representations. It scores near the top of the MMTEB multilingual benchmarks and is released under the Apache-2.0 license.

## Quantization

- Method: MLX affine quantization (`mlx_embeddings.convert`), group_size=64
- Bits per weight: 4
- Output size: **2.1 GB** (vs ~7.5 GB for the bf16 source)

## Quickstart

```python
from mlx_embeddings import load

model, tokenizer = load("majentik/Qwen3-Embedding-4B-MLX-4bit")

inputs = tokenizer(
    ["What is the capital of France?", "Paris is the capital of France."],
    padding=True,
    truncation=True,
    return_tensors="mlx",
)
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
embeddings = outputs.text_embeds  # already L2-normalised, shape [batch, dim]
```

For sentence similarity:

```python
import mlx.core as mx

e = embeddings
# Dot product of the first text against the rest; because the embeddings
# are already L2-normalised, this equals cosine similarity.
scores = (e[0] @ e[1:].T).tolist()
print(scores)
```

## Model Specifications

| Property | Value |
|---|---|
| Base Model | [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) |
| Architecture | Decoder-only (Qwen3ForCausalLM) with last-token pooling |
| Parameters | 4B (pre-quantization) |
| Context Length | 32K |
| Embedding Dim | 2560 |
| BF16 Size | ~7.5 GB |
| License | apache-2.0 |
| Languages | 100+ (multilingual) |

## License

Apache 2.0, inherited from the upstream Qwen3-Embedding model. Free for research and commercial use.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Official GGUF: [Qwen/Qwen3-Embedding-4B-GGUF](https://huggingface.co/Qwen/Qwen3-Embedding-4B-GGUF) (if published by Qwen)
- mlx-embeddings package: https://github.com/Blaizzy/mlx-embeddings
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
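
## Appendix: sanity-checking the quantization figures

The Quantization section lists affine quantization with 4 bits per weight and group_size=64. The sketch below illustrates the generic affine scheme (`w ≈ scale * code + bias`, one scale and one bias per group) in plain Python and uses it to estimate storage cost. It is not MLX's actual kernel: the helper names are hypothetical, and the 16-bit width assumed for each scale and bias is an assumption rather than something stated in this card.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo  # lo plays the role of the per-group bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + bias for c in codes]


def effective_bits_per_weight(bits=4, group_size=64, meta_bits=16):
    """Code bits plus the amortised cost of one scale and one bias per group.

    meta_bits=16 is an assumption (half-precision scale and bias).
    """
    return bits + 2 * meta_bits / group_size


def estimated_size_bytes(n_params, bits=4, group_size=64):
    """Rough size if every parameter were quantized this way."""
    return n_params * effective_bits_per_weight(bits, group_size) / 8


# One illustrative group of 64 evenly spaced weights in [-0.5, 0.5].
weights = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("effective bits per weight:", effective_bits_per_weight())
print("max reconstruction error:", max_err, "(bounded by scale / 2 =", scale / 2, ")")
print("estimated bytes for 4e9 params:", estimated_size_bytes(4_000_000_000))
```

Under these assumptions the effective cost is 4.5 bits per weight, which for a nominal 4 billion parameters gives roughly 2.25e9 bytes. That is in the same ballpark as the 2.1 GB listed above; the exact figure depends on which layers are actually quantized and on how the size is measured.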