How to use nkamiy/Qwen3-VL-Embedding-8B-8bit-mlx with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from treating the brackets as a glob pattern)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Qwen3-VL-Embedding-8B-8bit-mlx nkamiy/Qwen3-VL-Embedding-8B-8bit-mlx
Qwen3-VL-Embedding-8B — 8-bit MLX
Qwen3-VL-Embedding-8B converted to 8-bit quantized MLX format for Apple Silicon. Original model: Qwen/Qwen3-VL-Embedding-8B
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-Embedding-8B |
| Quantization | 8-bit affine, group_size=64 |
| Model size | ~9.2 GB |
| Embedding dim | 4096 |
| License | Apache 2.0 |
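To make the "8-bit affine, group_size=64" row concrete, here is a minimal pure-Python sketch of affine quantization applied to one group of weights. It illustrates the general scheme (integer codes plus one scale and one bias per group) and is not MLX's actual kernel. The function names quantize_group and dequantize_group are hypothetical, and the weights are toy values.

```python
import math

def quantize_group(values, bits=8):
    """Map one group of floats to integer codes plus a per-group scale and bias."""
    levels = (1 << bits) - 1              # 255 steps for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0     # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo               # the group minimum acts as the bias

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats: w ~ code * scale + bias."""
    return [c * scale + bias for c in codes]

# One group of 64 toy weights (group_size=64), not real model weights.
group = [math.sin(i) for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
print(f"scale={scale:.6f} bias={bias:.6f} max_error={max_error:.6f}")
```

In this scheme the round-trip error per weight is bounded by half a quantization step (scale / 2), which is why smaller groups or more bits give a closer reconstruction.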
Usage
Install dependencies:
pip install mlx-embeddings mlx-vlm torch torchvision
Note: torch and torchvision are required only for image preprocessing (via transformers.AutoImageProcessor). The model itself runs entirely on MLX.
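Before loading the model, it can help to confirm that the dependencies are importable. The sketch below uses only the standard library. The import names in REQUIRED_MODULES are assumed to correspond to the pip packages listed above, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names assumed to correspond to the pip packages listed above.
REQUIRED_MODULES = ["mlx_embeddings", "mlx_vlm", "torch", "torchvision"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```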
Known issues with mlx-embeddings / mlx-vlm
Two patches are currently required:
- model.load_weights() raises "Missing 1 parameters: language_model.lm_head.weight" because embedding models do not have an lm_head. Fix: use strict=False.
- transformers >= 4.50 requires processor.image_ids / audio_ids / video_ids attributes that mlx-embeddings does not set. Fix: set them manually after loading.
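The strict=False fix is a monkeypatch, so one option is to apply it only while the model loads and restore the original method afterwards. The sketch below shows that pattern with a generic context manager. override_default and DummyModel are hypothetical names used for illustration, and the dummy class only imitates the strict-loading error described above.

```python
from contextlib import contextmanager

@contextmanager
def override_default(cls, method_name, **new_defaults):
    """Temporarily supply keyword defaults to cls.method_name, then restore it.

    Only keyword arguments the caller did not pass are filled in; a caller that
    passes the same argument positionally would trigger a TypeError.
    """
    original = getattr(cls, method_name)

    def wrapper(self, *args, **kwargs):
        for key, value in new_defaults.items():
            kwargs.setdefault(key, value)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    try:
        yield
    finally:
        setattr(cls, method_name, original)

class DummyModel:
    """Stand-in for a model whose strict loader rejects a missing lm_head."""
    def load_weights(self, weights, strict=True):
        if strict and "lm_head.weight" not in weights:
            raise ValueError("Missing 1 parameters: lm_head.weight")
        return sorted(weights)

with override_default(DummyModel, "load_weights", strict=False):
    loaded = DummyModel().load_weights({"embed.weight": 0})
print(loaded)
```

Under the same assumptions, the real call would look like `with override_default(nn.Module, "load_weights", strict=False): model, processor = load(...)`, which leaves nn.Module.load_weights strict again for any later code.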
Full working example
import mlx.core as mx
import mlx.nn as nn
import numpy as np
from mlx_embeddings import load
# Patch 1: allow missing lm_head weight
_orig_lw = nn.Module.load_weights
def _lw(self, w, strict=False):
    return _orig_lw(self, w, strict=strict)
nn.Module.load_weights = _lw
model, processor = load("nkamiy/Qwen3-VL-Embedding-8B-8bit-mlx")
# Patch 2: missing processor attributes (transformers >= 4.50)
inner = getattr(processor, "processor", processor)
if not hasattr(inner, "image_ids"):
    inner.image_ids = [getattr(inner, "image_token_id", None)]
if not hasattr(inner, "video_ids"):
    inner.video_ids = [getattr(inner, "video_token_id", None)]
if not hasattr(inner, "audio_ids"):
    inner.audio_ids = [None]
# Embed text, image, or both
inputs = [
    {"text": "a man arguing with a plant",
     "instruction": "Retrieve images or text relevant to the user's query."},
    {"text": "a comedic scene in a flower shop"},
    {"image": "/path/to/thumbnail.png", "text": "dialogue here"},
]
embeddings = model.process(inputs, processor=processor)
mx.eval(embeddings)
arr = np.array(embeddings.astype(mx.float32))
# arr.shape == (3, 4096), L2-normalized
similarity = arr @ arr.T
print(similarity)
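Because the embeddings are L2-normalized, a dot product between two rows is their cosine similarity, so the similarity matrix can be used directly for retrieval. The following sketch ranks documents against a query. top_k is a hypothetical helper, and the 4-dimensional vectors are toy stand-ins for real 4096-dimensional embeddings.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=3):
    """Return (index, score) pairs for the k rows of doc_matrix closest to query_vec."""
    # Re-normalize defensively so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]       # highest similarity first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors, not real model output.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([1.0, 0.1, 0.0, 0.0])
results = top_k(query, docs, k=2)
print(results)
```

With real data, `query` would be one row of `arr` and `docs` the remaining rows.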
Conversion
Converted from the original Hugging Face weights using mlx_vlm.convert with a strict=False patch:
python -m mlx_vlm convert \
    --hf-path Qwen/Qwen3-VL-Embedding-8B \
    --mlx-path ./Qwen3-VL-Embedding-8B-8bit-mlx \
    --quantize --q-bits 8 --q-group-size 64 --q-mode affine
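The --q-bits and --q-group-size flags together determine how much storage each quantized weight takes. The arithmetic below is a rough estimate and rests on two assumptions: each group stores one 16-bit scale and one 16-bit bias, and the weight count is a nominal 8e9 taken from the "8B" in the model name. Both function names are hypothetical.

```python
def bits_per_weight(q_bits, group_size, scale_bits=16, bias_bits=16):
    """Effective storage per quantized weight, including per-group scale and bias."""
    return q_bits + (scale_bits + bias_bits) / group_size

def estimated_gigabytes(num_weights, q_bits, group_size):
    """Approximate size in GB (10**9 bytes) of the quantized weights alone."""
    return num_weights * bits_per_weight(q_bits, group_size) / 8 / 1e9

effective_bits = bits_per_weight(q_bits=8, group_size=64)
# 8e9 is a nominal weight count (assumption), not a measured parameter count.
size_gb = estimated_gigabytes(8e9, q_bits=8, group_size=64)
print(effective_bits, size_gb)
```

This estimate will not match the ~9.2 GB in the table exactly, since it ignores any tensors stored without quantization and any file overhead.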
License
Apache 2.0 — same as the original Qwen/Qwen3-VL-Embedding-8B.