---
library_name: gguf
tags:
- gguf
- llama-cpp
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- Q8_0
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
---

# Qwen3-Embedding-4B GGUF Q8_0

llama.cpp GGUF Q8_0 quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B).

- Converted to a BF16 GGUF source with `convert_hf_to_gguf.py` from an up-to-date llama.cpp checkout
- Quantized with `llama-quantize` (upstream llama.cpp, April 2026 build)
- Quant type: **Q8_0**
- File size: **4.0 GB**

## Quickstart

```bash
# Qwen3-Embedding uses last-token pooling, so request it explicitly
llama-embedding -m qwen3-emb-4b-Q8_0.gguf \
  --pooling last \
  -p "What is the capital of France?"
```

Or via llama-cpp-python:

```python
from llama_cpp import Llama

# embedding=True loads the model for embedding extraction instead of text generation
llm = Llama(model_path="qwen3-emb-4b-Q8_0.gguf", embedding=True)

vec = llm.embed("What is the capital of France?")
```
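For retrieval, the upstream Qwen3-Embedding card recommends prefixing queries (not documents) with a one-line task instruction and comparing vectors by cosine similarity. The helpers below sketch that in plain Python so they work on whatever lists of floats `llm.embed(...)` returns. `format_query`, `cosine` and `rank` are hypothetical names, and the template string is copied from the upstream card rather than from anything llama.cpp enforces.

```python
import math

def format_query(task, query):
    # Instruction template from the upstream Qwen3-Embedding card; documents are embedded as-is
    return f"Instruct: {task}\nQuery:{query}"

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda p: p[1], reverse=True)
```

With the `llm` object from the snippet above, one would embed `format_query("Given a web search query, retrieve relevant passages that answer the query", "What is the capital of France?")` as the query, embed each candidate passage unchanged, and pass the vectors to `rank`. This assumes `embed()` returns one pooled vector per input string.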

## License

Apache 2.0 — inherited from the upstream base model.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- llama.cpp: [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)