---
library_name: gguf
tags:
- gguf
- llama-cpp
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- IQ4_XS
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
---

# Qwen3-Embedding-4B GGUF IQ4_XS

A GGUF IQ4_XS quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B), produced with llama.cpp.

- Produced with: `llama-quantize` (upstream llama.cpp, April 2026 build)
- BF16 source: converted from the Hugging Face checkpoint with `convert_hf_to_gguf.py` from a fresh checkout of the llama.cpp tree
- Quant type: **IQ4_XS**
- File size: **2.1 GB**

## Quickstart

```bash
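# --pooling last: the base model uses last-token pooling; set it explicitly in case the GGUF metadata does not
llama-embedding -m qwen3-emb-4b-IQ4_XS.gguf \
  -p "What is the capital of France?" \
  --pooling last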
```

Or via llama-cpp-python:

```python
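from llama_cpp import Llama

# embedding=True loads the model in embedding mode rather than text generation
llm = Llama(model_path="qwen3-emb-4b-IQ4_XS.gguf", embedding=True)
# embed() returns the embedding vector(s) for the given input
vec = llm.embed("What is the capital of France?")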
```
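
A typical next step is ranking documents against a query by cosine similarity. The sketch below is illustrative, not a reference implementation. It assumes the instruction-prefixed query format described on the base model card carries over unchanged to this quant, and that `embed()` returns one pooled vector per input string. The helper names `format_query` and `cosine_similarity`, the task description, and the sample documents are hypothetical. The model is loaded only if the GGUF file is present in the working directory.

```python
import math
import os


def format_query(task: str, query: str) -> str:
    """Build an instruction-prefixed query (format assumed from the base model card)."""
    return f"Instruct: {task}\nQuery:{query}"


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


MODEL_PATH = "qwen3-emb-4b-IQ4_XS.gguf"  # file name used in the Quickstart

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama

    llm = Llama(model_path=MODEL_PATH, embedding=True, verbose=False)

    # Example task description and documents (illustrative only).
    task = "Given a web search query, retrieve relevant passages that answer the query"
    docs = ["Paris is the capital of France.", "Bananas are rich in potassium."]

    # Only the query gets the instruction prefix; documents are embedded as-is.
    inputs = [format_query(task, "What is the capital of France?")] + docs
    # Assumption: one pooled vector per input string is returned.
    query_vec, *doc_vecs = llm.embed(inputs)

    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    for score, doc in sorted(scored, reverse=True):
        print(f"{score:.3f}  {doc}")
```

If `embed()` returns per-token vectors instead of a single vector per input, the pooling type is probably not set in the GGUF metadata and needs to be configured when constructing `Llama`.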

## License

Apache 2.0, inherited from the upstream base model.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- llama.cpp: https://github.com/ggml-org/llama.cpp