DBMe/gemma-4-31B-it-heretic-exl3

EXL3 (ExLlamaV3) quantizations of coder3101/gemma-4-31B-it-heretic. All credit for the original model goes to the original authors.

📊 Available Quantizations & VRAM

The model weights are stored in separate branches. Please switch to a branch to download. Note: VRAM estimates include PyTorch context overhead (~0.8GB) and assume an unquantized FP16 KV cache.

Target BPW	Head BPW	Branch (Download Link)	WikiText-2 PPL (512 ctx)¹	2K ctx	4K ctx	8K ctx	16K ctx	32K ctx
3.5	h6	3.5bpw_h6	8842.6808	~19.15 GB	~20.87 GB	~24.3 GB	~31.18 GB	~44.93 GB
4.0	h6	4.0bpw_h6	6856.5833	~20.86 GB	~22.57 GB	~26.01 GB	~32.89 GB	~46.64 GB
5.0	h6	5.0bpw_h6	6504.4025	~24.27 GB	~25.98 GB	~29.42 GB	~36.3 GB	~50.05 GB
6.0	h6	6.0bpw_h6	5900.2612	~27.67 GB	~29.39 GB	~32.83 GB	~39.71 GB	~53.46 GB
8.0	h8	8.0bpw_h8	6355.6026	~34.82 GB	~36.54 GB	~39.98 GB	~46.85 GB	~60.6 GB

¹ Evaluated against WikiText-2 with ExLlamaV3 using a strided 512-token context window (-c 512) in llama.cpp parity mode (-g). Lower is better. (Higher BPW = higher quality, lower BPW = fits in less VRAM).

📥 How to Download

It's recommended to use the huggingface-cli to download specific branches. (Do not use git clone as it will download all branches!)

Ensure you have the CLI installed:

pip install -U "huggingface_hub[cli]"

Download a specific branch (e.g., 3.5bpw_h6):

# Example: Downloading the 3.5bpw_h6 branch
huggingface-cli download DBMe/gemma-4-31B-it-heretic-exl3 --revision 3.5bpw_h6 --local-dir gemma-4-31B-it-heretic-exl3-3.5bpw_h6

💻 Supported Engines

These models are highly optimized for modern GPUs and can be run using:

TabbyAPI: A fast, OpenAI-compatible API server. (Set model_name: "gemma-4-31B-it-heretic-exl3-<BranchName>" in your config)
Text-Generation-WebUI: A local web interface. (Select the exllamav3 loader)
ExLlamaV3 (Native): Python library for custom integration.

📈 Perplexity Degradation Curve

(Lower is better)

⚙️ Advanced: Quantization Environment & Settings

🔬 Quantization Settings

Codebook: mcg
Output Scales: always
Calibration Rows: 250
Calibration Cols: 2048
Calibration Dataset: ExLlamaV3 Default (Wiki/C4/Code)
High Quality (HQ) Mode: False
ExLlamaV3: 0.0.29 (Commit: cb1a436)
Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for DBMe/gemma-4-31B-it-heretic-exl3

Base model

google/gemma-4-31B

Finetuned

google/gemma-4-31B-it

Finetuned

coder3101/gemma-4-31B-it-heretic

Quantized

(9)

this model