Buckets:

Pradip24042020
/

gemma-4-12b-it-uncensored-GGUF-bucket

80.3 GB

11 files

Updated 15 days ago

Ctrl+K

Name	Size	Uploaded	Xet hash
.gitattributes	2.17 kB xet	15 days ago	c5270af4
README.md	5.78 kB xet	15 days ago	85e616af
gemma-4-12b-it-uncensored-Q2_K.gguf	4.83 GB xet	15 days ago	68aca0e1
gemma-4-12b-it-uncensored-Q3_K_M.gguf	6.09 GB xet	15 days ago	4344dc05
gemma-4-12b-it-uncensored-Q4_K_M.gguf	7.38 GB xet	15 days ago	9307e3f0
gemma-4-12b-it-uncensored-Q4_K_S.gguf	7.02 GB xet	15 days ago	be963fdf
gemma-4-12b-it-uncensored-Q5_K_M.gguf	8.55 GB xet	15 days ago	247f3ea6
gemma-4-12b-it-uncensored-Q6_K.gguf	9.79 GB xet	15 days ago	4a1903af
gemma-4-12b-it-uncensored-Q8_0.gguf	12.7 GB xet	15 days ago	a548b909
gemma-4-12b-it-uncensored-f16.gguf	23.8 GB xet	15 days ago	8e2885c0
mmproj-gemma-4-12B-it-bf16.gguf	175 MB xet	15 days ago	6c836c83

README.md

gemma-4-12b-it-uncensored - GGUF

GGUF quantizations of zaakirio/gemma-4-12b-it-uncensored, a decensored (Heretic-abliterated) version of google/gemma-4-12B-it.

These files run with llama.cpp.

⚠️ Requires a current llama.cpp build. Gemma 4 (gemma4_unified) is a brand-new architecture; only recent llama.cpp builds can load these files. Older builds may fail with an unknown architecture error - build from source (or use a current release) if you hit that. Always pass --jinja so the chat template is applied.

Files

Filenames follow gemma-4-12b-it-uncensored-<QUANT>.gguf.

Quant	Size	Notes
Q2_K	4.50 GB	Smallest; lowest quality. Very tight memory only.
Q3_K_M	5.67 GB	Small; usable on low RAM.
Q4_K_S	6.54 GB	Compact 4-bit.
Q4_K_M	6.87 GB	Recommended - best size/quality balance.
Q5_K_M	7.96 GB	Higher quality, slightly larger.
Q6_K	9.11 GB	Near-lossless.
Q8_0	11.80 GB	Effectively lossless vs the BF16 source.
f16	22.20 GB	Full precision; reference / re-quantizing.

Not sure which to pick? Start with Q4_K_M. Go up to Q5/Q6/Q8 if you have the memory and want maximum fidelity; drop to Q3/Q2 only if you're memory-constrained.

Multimodal projector (for image input - see Multimodal):

File	Size	Notes
`mmproj-gemma-4-12B-it-bf16.gguf`	0.16 GB	Vision encoder - pair with any quant above.

Usage

llama.cpp (auto-downloads the chosen quant from this repo):

# Interactive chat
llama-cli -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja

# OpenAI-compatible server with web UI
llama-server -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja -c 4096

Or with a file you've already downloaded:

llama-cli -m gemma-4-12b-it-uncensored-Q4_K_M.gguf --jinja -p "Hello, who are you?"

Download a single file:

pip install -U "huggingface_hub[cli]"
hf download zaakirio/gemma-4-12b-it-uncensored-GGUF \
  --include "gemma-4-12b-it-uncensored-Q4_K_M.gguf" --local-dir ./

Prompt format & settings

The chat template is embedded in the GGUF, chat-aware tools apply it automatically (always pass --jinja with llama.cpp). For reference, Gemma 4's format is:

<|turn>user
{prompt}<turn|>
<|turn>model

Recommended sampling (Google defaults): --temp 1.0 --top-p 0.95 --top-k 64.

Thinking mode: Gemma 4 has a reasoning channel. To disable it, pass --chat-template-kwargs '{"enable_thinking":false}' to llama-server.

Multimodal (image input)

Gemma 4 is multimodal, but in llama.cpp the vision tower ships as a separate projector file. The language .gguf alone is text-only and will reject images. This repo includes mmproj-gemma-4-12B-it-bf16.gguf for that purpose.

When you load via -hf, llama.cpp auto-downloads the projector from this repo - images just work:

llama-server -hf zaakirio/gemma-4-12b-it-uncensored-GGUF:Q4_K_M --jinja

With local files, pass it explicitly with --mmproj:

llama-server -m gemma-4-12b-it-uncensored-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-12B-it-bf16.gguf --jinja

# download both files
hf download zaakirio/gemma-4-12b-it-uncensored-GGUF \
  --include "gemma-4-12b-it-uncensored-Q4_K_M.gguf" "mmproj-gemma-4-12B-it-bf16.gguf" --local-dir ./

The projector pairs with any quant in the table above. It's the unmodified Gemma 4 vision encoder. Abliteration only touches the language weights, so the vision tower is unchanged from the base model. Prefer bf16 here: the encoder is small, so there's no benefit to quantizing it.

About the base model

A decensored derivative produced with Heretic (automatic directional ablation). Compared with the original:

Metric	Decensored	Original
Refusals (/100 harmful prompts)	23	99
KL divergence (harmless prompts)	0.043	0 (by definition)

The refusal count is Heretic's keyword heuristic, which is known to over-count (it flags disclaimer-wrapped compliance as a refusal; ~11% precision per arXiv:2512.13655). We report only the measured marker figure and did not run a classifier-based eval on this model, so real compliance is likely higher. See the source model card for parameters and details.

Intended use & disclaimer

This model has had its refusal behaviour substantially removed and will comply with requests the original would have declined. Provided for research and unrestricted local use. You are responsible for how you use it and for complying with applicable law and the base model's Gemma license, which carries over to this derivative. Not for all audiences.

Provenance

Quantized from zaakirio/gemma-4-12b-it-uncensored (BF16) using llama.cpp convert_hf_to_gguf.py + llama-quantize.
Base model: google/gemma-4-12B-it
Decensoring tool: Heretic by p-e-w · technique: Arditi et al. (2024)

Total size: 80.3 GB

Files: 11

Last updated: Jun 11

Pre-warmed CDN: US EU US EU