---
library_name: gguf
tags:
- gguf
- llama-cpp
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- Q8_0
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
---

# Qwen3-Embedding-4B GGUF Q8_0

llama.cpp GGUF Q8_0 quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B).

- Produced with: `llama-quantize` (upstream llama.cpp, April 2026 build)
- BF16 source converted via `convert_hf_to_gguf.py` from the fresh llama.cpp tree
- Quant type: **Q8_0**
- File size: **4.0 GB**

## Quickstart

```bash
llama-embedding -m qwen3-emb-4b-Q8_0.gguf \
  -p "What is the capital of France?"
```

Or via llama-cpp-python:

```python
from llama_cpp import Llama

llm = Llama(model_path="qwen3-emb-4b-Q8_0.gguf", embedding=True)
vec = llm.embed("What is the capital of France?")
```

## License

Apache 2.0, inherited from the upstream base model.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- llama.cpp: https://github.com/ggml-org/llama.cpp
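
## Example: ranking by similarity

The Quickstart stops at a single vector from `llm.embed`. As a minimal sketch of a typical next step, the snippet below scores a few documents against a query using cosine similarity. The helper names `cosine_similarity` and `rank_documents` are hypothetical and not part of llama-cpp-python. The sketch also assumes that `llm.embed` returns one pooled vector (a flat list of floats) per input string, which depends on the pooling setting stored in the GGUF file.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(
    model_path: str, query: str, documents: Sequence[str]
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, highest similarity first.

    Hypothetical helper. Assumes llm.embed yields one pooled vector
    per string rather than per-token embeddings.
    """
    # Imported lazily so cosine_similarity works without llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    query_vec = llm.embed(query)
    scored = [(cosine_similarity(query_vec, llm.embed(doc)), doc) for doc in documents]
    return sorted(scored, reverse=True)
```

Calling `rank_documents("qwen3-emb-4b-Q8_0.gguf", "What is the capital of France?", docs)` with your own list `docs` should place the most relevant text first. If the vectors come back nested (one list per token), the pooling assumption does not hold for your build and the result needs pooling before comparison.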