Commit f35cd2c by majentik · verified · 1 parent: 38296a6

docs: initial model card

Files changed (1)
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ library_name: gguf
+ tags:
+ - gguf
+ - llama-cpp
+ - embeddings
+ - sentence-similarity
+ - feature-extraction
+ - quantized
+ - IQ4_XS
+ base_model: Qwen/Qwen3-Embedding-4B
+ license: apache-2.0
+ pipeline_tag: feature-extraction
+ ---
+
+ # Qwen3-Embedding-4B GGUF IQ4_XS
+
+ llama.cpp GGUF IQ4_XS quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B).
+
+ - Produced with: `llama-quantize` (upstream llama.cpp, April 2026 build)
+ - BF16 source converted via `convert_hf_to_gguf.py` from the fresh llama.cpp tree
+ - Quant type: **IQ4_XS**
+ - File size: **2.1 GB**
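
As a rough sanity check on the file size listed above, the sketch below multiplies a parameter count by a bits-per-weight figure. Both inputs are assumptions that this card does not state: a nominal 4.0e9 parameters (read off the "4B" in the model name) and roughly 4.25 bits per weight, the figure llama.cpp lists for IQ4_XS. The helper name `estimate_gguf_size_gb` is hypothetical. Real files typically come out somewhat different because some tensors are stored at other precisions and the file carries metadata.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in decimal gigabytes.

    Ignores GGUF metadata and any tensors kept at a different precision,
    so treat the result as a ballpark figure only.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumed inputs, not taken from the model card:
NOMINAL_PARAMS = 4.0e9   # nominal count implied by "4B" in the model name
IQ4_XS_BPW = 4.25        # approximate bits per weight for IQ4_XS

size_gb = estimate_gguf_size_gb(NOMINAL_PARAMS, IQ4_XS_BPW)
print(f"estimated size: {size_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the listed file size, which suggests the quantization type and size reported above are mutually consistent.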
+
+ ## Quickstart
+
+ ```bash
+ llama-embedding -m qwen3-emb-4b-IQ4_XS.gguf \
+   -p "What is the capital of France?"
+ ```
+
+ Or via llama-cpp-python:
+
+ ```python
+ from llama_cpp import Llama
+ llm = Llama(model_path="qwen3-emb-4b-IQ4_XS.gguf", embedding=True)
+ vec = llm.embed("What is the capital of France?")
+ ```
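
Embedding vectors like `vec` are usually compared with cosine similarity, which matches the `sentence-similarity` tag in the front matter. The following is a minimal sketch in plain Python. The helper name `cosine_similarity` and the toy vectors are illustrative and not part of this model card; real embeddings would have many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 1.0, 0.0]
score = cosine_similarity(query_vec, doc_vec)
print(f"similarity: {score:.3f}")
```

With llama-cpp-python, the two arguments would be the results of two `llm.embed(...)` calls, assuming each call returns one flat list of floats per input string (that is, a pooled sequence-level embedding rather than per-token vectors).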
+
+ ## License
+
+ Apache 2.0, inherited from the upstream base model.
+
+ ## See also
+
+ - Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
+ - Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
+ - llama.cpp: https://github.com/ggml-org/llama.cpp