---
library_name: gguf
tags:
- gguf
- llama-cpp
- embeddings
- sentence-similarity
- feature-extraction
- quantized
- Q8_0
base_model: Qwen/Qwen3-Embedding-4B
license: apache-2.0
pipeline_tag: feature-extraction
---

# Qwen3-Embedding-4B GGUF Q8_0

llama.cpp GGUF Q8_0 quantization of [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B).

- Source: BF16 weights converted to GGUF with `convert_hf_to_gguf.py` from a freshly checked-out llama.cpp tree
- Quantized with: `llama-quantize` (upstream llama.cpp, April 2026 build)
- Quant type: **Q8_0**
- File size: **4.0 GB**


## Quickstart

```bash
llama-embedding -m qwen3-emb-4b-Q8_0.gguf \
  -p "What is the capital of France?"
```
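
Or via llama-cpp-python:

```python
from llama_cpp import Llama

# embedding=True loads the model for embedding extraction instead of text generation
llm = Llama(model_path="qwen3-emb-4b-Q8_0.gguf", embedding=True)

# For a single string with a pooled model, embed() returns one vector (a list of floats)
vec = llm.embed("What is the capital of France?")
```
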
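Embedding vectors are usually compared with cosine similarity. The following is a minimal pure-Python sketch, assuming the outputs of `llm.embed()` are plain lists of floats; the helper names `cosine_similarity` and `rank_documents` are illustrative and not part of llama-cpp-python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float], doc_vecs: list[Sequence[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-D vectors stand in for real embeddings here
print(rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]))
# most similar first: document 2, then 1, then 0
```

With the llama-cpp-python model loaded as above, `rank_documents(llm.embed(query), [llm.embed(d) for d in docs])` would order `docs` by similarity to `query`. The upstream Qwen3-Embedding model card also recommends adding a task instruction to queries (not to documents); check it for the exact prompt format.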

## License

Apache 2.0, inherited from the upstream base model.

## See also

- Base: [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
- Garden hub: [majentik/garden](https://huggingface.co/majentik/garden)
- llama.cpp: [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)