---
library_name: gguf
tags:
- gguf
- llama-cpp
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- IQ4_XS
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
---

# Qwen3-Embedding-4B GGUF IQ4_XS

llama.cpp GGUF IQ4_XS quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B).

- Produced with: `llama-quantize` (upstream llama.cpp, April 2026 build)
- BF16 source converted via `convert_hf_to_gguf.py` from the fresh llama.cpp tree
- Quant type: **IQ4_XS**
- File size: **2.1 GB**

## Quickstart

```bash
llama-embedding -m qwen3-emb-4b-IQ4_XS.gguf \
  -p "What is the capital of France?"
```

Or via llama-cpp-python:

```python
from llama_cpp import Llama

llm = Llama(model_path="qwen3-emb-4b-IQ4_XS.gguf", embedding=True)
vec = llm.embed("What is the capital of France?")
```

## License

Apache 2.0, inherited from the upstream base model.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- llama.cpp: https://github.com/ggml-org/llama.cpp
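
## Example: ranking passages by similarity

Because this model is tagged for sentence similarity, the following is a minimal sketch of how vectors from `llm.embed` might be compared with cosine similarity to rank candidate passages against a query. The helper names (`cosine_similarity`, `rank_passages`, `make_llama_embedder`) are hypothetical illustrations, not part of llama-cpp-python. The sketch assumes that the GGUF's pooling setting makes `embed()` return a single vector per input string, as the Quickstart above implies.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: an "embedder" maps one string to one embedding vector.
Embedder = Callable[[str], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_passages(
    query: str, passages: List[str], embed: Embedder
) -> List[Tuple[float, str]]:
    """Return (score, passage) pairs sorted from most to least similar to the query."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def make_llama_embedder(model_path: str) -> Embedder:
    """Hypothetical wrapper exposing llama-cpp-python's Llama.embed as an Embedder.

    llama_cpp is imported lazily so the helpers above run without it installed.
    Assumes the GGUF's pooling setting makes embed() return one vector per string.
    """
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, embedding=True, verbose=False)
    return lambda text: llm.embed(text)
```

Usage would look like `rank_passages("What is the capital of France?", ["Paris is the capital of France.", "Berlin is in Germany."], make_llama_embedder("qwen3-emb-4b-IQ4_XS.gguf"))`, which returns the passages ordered by score. Actual scores depend on the model. The embedder is passed in as a callable so that the ranking logic can be checked with toy vectors without loading the 2.1 GB file.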