Commit 0d05512 by Thireus · 1 Parent(s): 19e3e1e

Update README

  1. README.md +9 -8
README.md CHANGED
@@ -5,12 +5,13 @@ license: mit
5
 
6
  ## 🤔 What is this [HuggingFace repository](https://huggingface.co/Thireus/Qwen3-235B-A22B-Instruct-2507-THIREUS-BF16-SPECIAL_SPLIT/) about?
7
 
8
- This repository provides **GGUF-quantized tensors** for the Qwen3-235B-A22B-Instruct-2507 model (official repo: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507). These GGUF shards are designed to be used with **Thireus’ GGUF Tool Suite** (https://gguf.thireus.com), a collection of tools that automatically finds the perplexity-optimal mix of quantizations for any given VRAM and RAM target. With the Tool Suite, you can generate and download custom quantization recipes effortlessly.
9
 
10
  - 📖 Read more: https://github.com/Thireus/GGUF-Tool-Suite
11
- - 🔍 Example quant mixes: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
12
- - 🛠Create your own recipe: https://colab.research.google.com/github/Thireus/GGUF-Tool-Suite/blob/main/quant_recipe_pipeline.ipynb
13
- - 📂 Browse available quant shards: https://huggingface.co/Thireus/collections
 
14
 
15
  *tl;dr: Expand the details section below*
16
  <details>
@@ -33,7 +34,7 @@ cd ..
33
  # Obtain Thireus' GGUF-Tool-Suite
34
  GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/Thireus/GGUF-Tool-Suite
35
 
36
- # Download model quant mix from recipe file:
37
  cd GGUF-Tool-Suite
38
  rm -f download.conf # Make sure to copy the relevant download.conf for the model before running quant_assign.py
39
  cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the chosen model
@@ -46,7 +47,7 @@ mkdir -p kitchen && cd kitchen
46
  ulimit -n 9999 # Lifts "too many open files" limitation on Linux
47
  ~/ik_llama.cpp/build/bin/llama-cli \
48
  -m DeepSeek-R1-0528-THIREUS-BF16-SPECIAL_TENSOR-00001-of-01148.gguf \
49
- -mla 3 -fa -amb 512 -fmoe -ctk f16 -c 4096 -ngl 99 \
50
  -ot "blk\.(3|4|5|6)\.ffn_.*=CUDA0" \
51
  -ot "blk\.(7|8|9|10)\.ffn_.*=CUDA1" \
52
  -ot exps=CPU -b 2048 -ub 1024 --warmup-batch --no-mmap --threads 36 \
@@ -86,7 +87,7 @@ Check out the [GGUF Tool Suite README](https://github.com/Thireus/GGUF-Tool-Suit
86
 
87
  1. ⚠️ **Requirements** – Which `ik_llama.cpp` (or `llama.cpp`) version to use and how to compile.
88
  - Windows binaries (no patching needed) at: https://github.com/Thireus/ik_llama.cpp/releases
89
- 2. 📥 **Download Model Shards** – Use `quant_downloader.sh` to fetch GGUF shards from any recipe.
90
  - Recipe examples: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
91
  3. 🧠 **Run a Downloaded Model** – Sample usage with `llama-cli`.
92
  4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your VRAM/RAM target usage for optimum perplexity.
@@ -103,7 +104,7 @@ Supported models are listed under `models/` in the [Tool Suite Github repo](http
103
 
104
  No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them, or rely on generic GGUF dynamic quants such as [unsloth](https://huggingface.co/unsloth)'s.
105
 
106
- Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script handles automatic fetching and verification of each shard. Note that recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh`.
107
 
108
  Users who don’t trust the GGUF shards on HuggingFace can also quantize their own by passing recipe lines to `llama-quantize --custom-q` ([see example](https://github.com/Thireus/GGUF-Tool-Suite/blob/main/models/DeepSeek-R1-0528/DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh#L482-L486)). Run `llama-quantize --help` to list compatible quants for `quant_assign.py`. This approach is especially useful if you prefer `llama.cpp` over `ik_llama.cpp`.
109
 
 
5
 
6
  ## 🤔 What is this [HuggingFace repository](https://huggingface.co/Thireus/Qwen3-235B-A22B-Instruct-2507-THIREUS-BF16-SPECIAL_SPLIT/) about?
7
 
8
+ This repository provides **GGUF-quantized tensors** for the Qwen3-235B-A22B-Instruct-2507 model (official repo: https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507). These GGUF shards are designed to be used with **Thireus’ GGUF Tool Suite** (https://gguf.thireus.com), a collection of tools that automatically finds the perplexity-optimal mix of quantizations for any given VRAM and RAM target. With this GGUF Tool Suite, you can produce your own Dynamic 3.0 Quants recipes and achieve optimum accuracy & SOTA quantization performance.
9
 
10
  - 📖 Read more: https://github.com/Thireus/GGUF-Tool-Suite
11
+ - 🔍 Example of GGUF recipes: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
12
+ - Download GGUF models from recipe files: https://gguf.thireus.com/quant_downloader.html
13
+ - 🛠️ Create your own recipes: https://colab.research.google.com/github/Thireus/GGUF-Tool-Suite/blob/main/quant_recipe_pipeline.ipynb
14
+ - 📂 Browse available models: https://gguf.thireus.com
15
 
16
  *tl;dr: Expand the details section below*
17
  <details>
 
34
  # Obtain Thireus' GGUF-Tool-Suite
35
  GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/Thireus/GGUF-Tool-Suite
36
 
37
+ # Download model quant mix from recipe file - you can also try the web version: https://gguf.thireus.com/quant_downloader.html
38
  cd GGUF-Tool-Suite
39
  rm -f download.conf # Make sure to copy the relevant download.conf for the model before running quant_assign.py
40
  cp -f models/DeepSeek-R1-0528/download.conf . # Use the download.conf of the chosen model
 
47
  ulimit -n 9999 # Lifts "too many open files" limitation on Linux
48
  ~/ik_llama.cpp/build/bin/llama-cli \
49
  -m DeepSeek-R1-0528-THIREUS-BF16-SPECIAL_TENSOR-00001-of-01148.gguf \
50
+ -mla 3 -fa auto -amb 512 -ctk f16 -c 4096 -ngl 99 \
51
  -ot "blk\.(3|4|5|6)\.ffn_.*=CUDA0" \
52
  -ot "blk\.(7|8|9|10)\.ffn_.*=CUDA1" \
53
  -ot exps=CPU -b 2048 -ub 1024 --warmup-batch --no-mmap --threads 36 \
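Each `-ot` argument in the command above pairs a regular expression over tensor names with a device. As a rough illustration of which tensors each pattern selects, the sketch below replays the three rules with `grep -E`. The tensor names are hypothetical, `assign_device` is a helper invented for this example, and it assumes rules are tried in the order given with the first match winning; check your `ik_llama.cpp` build if that ordering matters.

```shell
# Return the device of the first -ot rule whose regex matches the tensor name.
# Assumption: rules are evaluated in command-line order, first match wins.
assign_device() {
  name=$1
  for rule in \
    'blk\.(3|4|5|6)\.ffn_.*=CUDA0' \
    'blk\.(7|8|9|10)\.ffn_.*=CUDA1' \
    'exps=CPU'
  do
    pattern=${rule%=*}   # text before the last '='
    device=${rule##*=}   # text after the last '='
    if printf '%s\n' "$name" | grep -Eq -- "$pattern"; then
      printf '%s\n' "$device"
      return 0
    fi
  done
  # No override matched; "default" is a label made up for this sketch.
  printf '%s\n' "default"
}

# Hypothetical tensor names, for illustration only.
for t in blk.3.ffn_down_exps.weight blk.10.ffn_up_exps.weight \
         blk.11.ffn_down_exps.weight blk.3.attn_q.weight; do
  printf '%s -> %s\n' "$t" "$(assign_device "$t")"
done
```

Under these assumptions, the FFN tensors of layers 3 to 6 land on `CUDA0`, those of layers 7 to 10 on `CUDA1`, and any remaining expert tensors fall through to the `exps=CPU` rule.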
 
87
 
88
  1. ⚠️ **Requirements** – Which `ik_llama.cpp` (or `llama.cpp`) version to use and how to compile.
89
  - Windows binaries (no patching needed) at: https://github.com/Thireus/ik_llama.cpp/releases
90
+ 2. 📥 **Download Model Shards** – Use `quant_downloader.sh` or [quant_downloader.html](https://gguf.thireus.com/quant_downloader.html) to fetch GGUF shards from any recipe.
91
  - Recipe examples: https://github.com/Thireus/GGUF-Tool-Suite/tree/main/recipe_examples
92
  3. 🧠 **Run a Downloaded Model** – Sample usage with `llama-cli`.
93
  4. 🛠️ **Generate a Custom Recipe** – Produce recipes tailored to your VRAM/RAM target usage for optimum perplexity.
 
104
 
105
  No, because I believe in **tailored quantization** for each user’s hardware. If you prefer ready-made shards, you are welcome to merge them via `llama-gguf-split --merge`, or request someone to publish them, or rely on generic GGUF dynamic quants such as [unsloth](https://huggingface.co/unsloth)'s.
106
 
107
+ Instead, I prefer to share examples of recipes so users can see exactly how they were produced (command included inside these recipe files) and tweak them for their own rigs. The `quant_downloader.sh` script or [quant_downloader.html](https://gguf.thireus.com/quant_downloader.html) (web port of this script) handles automatic fetching and verification of each shard. Note that recipes provided by [Ubergarm](https://huggingface.co/ubergarm) on his model cards are also compatible with `quant_downloader.sh` and [quant_downloader.html](https://gguf.thireus.com/quant_downloader.html), provided that a "SPECIAL_SPLIT" version of these models exists (see https://gguf.thireus.com/).
108
 
109
  Users who don’t trust the GGUF shards on HuggingFace can also quantize their own by passing recipe lines to `llama-quantize --custom-q` ([see example](https://github.com/Thireus/GGUF-Tool-Suite/blob/main/models/DeepSeek-R1-0528/DeepSeek-R1-0528-THIREUS-ANY-SPECIAL.sh#L482-L486)). Run `llama-quantize --help` to list compatible quants for `quant_assign.py`. This approach is especially useful if you prefer `llama.cpp` over `ik_llama.cpp`.
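As a minimal sketch of what "passing recipe lines" can look like, the snippet below joins the non-comment lines of a recipe into one comma-separated string for `--custom-q`. The helper name `recipe_to_custom_q`, the sample recipe lines, and the assumption that each recipe line has the form `regex=quant_type` are illustrative only; the linked example script remains the authoritative reference.

```shell
# Join the non-comment, non-empty lines of a recipe file with commas.
recipe_to_custom_q() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" | paste -sd, -
}

# Hypothetical recipe content, written to a temporary file for the demo.
recipe=$(mktemp)
cat > "$recipe" <<'EOF'
# Sample lines for illustration; real recipes come from recipe_examples/
blk\.[0-2]\.ffn_down\.weight=q8_0

blk\.([3-9]|[1-9][0-9])\.ffn_down_exps\.weight=iq4_ks
EOF

custom=$(recipe_to_custom_q "$recipe")
# Print the argument that would be handed to llama-quantize.
printf -- '--custom-q "%s"\n' "$custom"
rm -f "$recipe"
```

Comment and blank lines are dropped before joining, so an annotated recipe file can be used as-is.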
110