| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| .gitattributes | 2.17 kB xet | c5270af4 | |
| README.md | 5.78 kB xet | 85e616af | |
| gemma-4-12b-it-uncensored-Q2_K.gguf | 4.83 GB xet | 68aca0e1 | |
| gemma-4-12b-it-uncensored-Q3_K_M.gguf | 6.09 GB xet | 4344dc05 | |
| gemma-4-12b-it-uncensored-Q4_K_M.gguf | 7.38 GB xet | 9307e3f0 | |
| gemma-4-12b-it-uncensored-Q4_K_S.gguf | 7.02 GB xet | be963fdf | |
| gemma-4-12b-it-uncensored-Q5_K_M.gguf | 8.55 GB xet | 247f3ea6 | |
| gemma-4-12b-it-uncensored-Q6_K.gguf | 9.79 GB xet | 4a1903af | |
| gemma-4-12b-it-uncensored-Q8_0.gguf | 12.7 GB xet | a548b909 | |
| gemma-4-12b-it-uncensored-f16.gguf | 23.8 GB xet | 8e2885c0 | |
| mmproj-gemma-4-12B-it-bf16.gguf | 175 MB xet | 6c836c83 |
gemma-4-12b-it-uncensored - GGUF
GGUF quantizations of zaakirio/gemma-4-12b-it-uncensored, a decensored (Heretic-abliterated) version of google/gemma-4-12B-it.
These files run with llama.cpp.
⚠️ Requires a current llama.cpp build. Gemma 4 (
gemma4_unified) is a brand-new architecture; only recent llama.cpp builds can load these files. Older builds may fail with anunknown architectureerror - build from source (or use a current release) if you hit that. Always pass--jinjaso the chat template is applied.
Files
Filenames follow gemma-4-12b-it-uncensored-<QUANT>.gguf.
| Quant | Size | Notes |
|---|---|---|
| Q2_K | 4.50 GB | Smallest; lowest quality. Very tight memory only. |
| Q3_K_M | 5.67 GB | Small; usable on low RAM. |
| Q4_K_S | 6.54 GB | Compact 4-bit. |
| Q4_K_M | 6.87 GB | Recommended - best size/quality balance. |
| Q5_K_M | 7.96 GB | Higher quality, slightly larger. |
| Q6_K | 9.11 GB | Near-lossless. |
| Q8_0 | 11.80 GB | Effectively lossless vs the BF16 source. |
| f16 | 22.20 GB | Full precision; reference / re-quantizing. |
Not sure which to pick? Start with Q4_K_M. Go up to Q5/Q6/Q8 if you have the memory and want maximum fidelity; drop to Q3/Q2 only if you're memory-constrained.
Multimodal projector (for image input - see Multimodal):
| File | Size | Notes |
|---|---|---|
mmproj-gemma-4-12B-it-bf16.gguf |
0.16 GB | Vision encoder - pair with any quant above. |
Usage
llama.cpp (auto-downloads the chosen quant from this repo):
# Interactive chat
llama-cli -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja
# OpenAI-compatible server with web UI
llama-server -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja -c 4096
Or with a file you've already downloaded:
llama-cli -m gemma-4-12b-it-uncensored-Q4_K_M.gguf --jinja -p "Hello, who are you?"
Download a single file:
pip install -U "huggingface_hub[cli]"
hf download zaakirio/gemma-4-12b-it-uncensored-GGUF \
--include "gemma-4-12b-it-uncensored-Q4_K_M.gguf" --local-dir ./
Prompt format & settings
The chat template is embedded in the GGUF, chat-aware tools apply it automatically (always pass --jinja with llama.cpp). For reference, Gemma 4's format is:
<|turn>user
{prompt}<turn|>
<|turn>model
Recommended sampling (Google defaults): --temp 1.0 --top-p 0.95 --top-k 64.
Thinking mode: Gemma 4 has a reasoning channel. To disable it, pass --chat-template-kwargs '{"enable_thinking":false}' to llama-server.
Multimodal (image input)
Gemma 4 is multimodal, but in llama.cpp the vision tower ships as a separate projector file. The language .gguf alone is text-only and will reject images. This repo includes mmproj-gemma-4-12B-it-bf16.gguf for that purpose.
When you load via -hf, llama.cpp auto-downloads the projector from this repo - images just work:
llama-server -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja
With local files, pass it explicitly with --mmproj:
llama-server -m gemma-4-12b-it-uncensored-Q4_K_M.gguf \
--mmproj mmproj-gemma-4-12B-it-bf16.gguf --jinja
# download both files
hf download zaakirio/gemma-4-12b-it-uncensored-GGUF \
--include "gemma-4-12b-it-uncensored-Q4_K_M.gguf" "mmproj-gemma-4-12B-it-bf16.gguf" --local-dir ./
The projector pairs with any quant in the table above. It's the unmodified Gemma 4 vision encoder. Abliteration only touches the language weights, so the vision tower is unchanged from the base model. Prefer bf16 here: the encoder is small, so there's no benefit to quantizing it.
About the base model
A decensored derivative produced with Heretic (automatic directional ablation). Compared with the original:
| Metric | Decensored | Original |
|---|---|---|
| Refusals (/100 harmful prompts) | 23 | 99 |
| KL divergence (harmless prompts) | 0.043 | 0 (by definition) |
The refusal count is Heretic's keyword heuristic, which is known to over-count (it flags disclaimer-wrapped compliance as a refusal; ~11% precision per arXiv:2512.13655). We report only the measured marker figure and did not run a classifier-based eval on this model, so real compliance is likely higher. See the source model card for parameters and details.
Intended use & disclaimer
This model has had its refusal behaviour substantially removed and will comply with requests the original would have declined. Provided for research and unrestricted local use. You are responsible for how you use it and for complying with applicable law and the base model's Gemma license, which carries over to this derivative. Not for all audiences.
Provenance
- Quantized from
zaakirio/gemma-4-12b-it-uncensored(BF16) using llama.cppconvert_hf_to_gguf.py+llama-quantize. - Base model:
google/gemma-4-12B-it - Decensoring tool: Heretic by p-e-w · technique: Arditi et al. (2024)
- Total size
- 80.3 GB
- Files
- 11
- Last updated
- Jun 11
- Pre-warmed CDN
- US EU US EU