Buckets:

Elmermoreno
/

bucket

0 Bytes

5 files

Updated 12 days ago

Name	Size	Uploaded	Xet hash
Gemma		12 days ago	31 items
.gitattributes	1.67 kB xet	12 days ago	3ac8b8a8
README.md	1.48 kB xet	12 days ago	af2f5186
gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf	327 MB xet	12 days ago	7338b5ef
gemma-4-12B-it-MTP-ik_llama-Q8_0.gguf	465 MB xet	12 days ago	0e8bfe93

README.md

Gemma 4 12B IT MTP Assistants for ik_llama

These are converted GGUF assistant/draft models for using Gemma 4 12B IT with ik_llama MTP speculative decoding.

They are not standalone chat models. Pass one of these files as --model-draft alongside the matching Gemma 4 12B IT target GGUF.

Files

gemma-4-12B-it-MTP-ik_llama-Q8_0.gguf
gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf

Conversion Notes

Source assistant GGUF:

unsloth/gemma-4-12b-it-GGUF, MTP/gemma-4-12B-it-MTP-F16.gguf

The public assistant architecture string and tensor names were converted to ik_llama's gemma4_mtp schema. The unused public-assistant rope_freqs.weight tensor was omitted because ik_llama's Gemma 4 MTP assistant loader expects 48 tensors for this assistant.
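
One way to sanity-check a converted file against the 48-tensor expectation is to read the tensor count directly from the GGUF header. The sketch below assumes the GGUF v2/v3 header layout (4-byte magic "GGUF", a 32-bit version, then a 64-bit tensor count) and a little-endian host. The name gguf_tensor_count is a hypothetical helper, not part of ik_llama.

```shell
# Hypothetical helper: print the tensor count stored in a GGUF header.
# Assumes the GGUF v2/v3 layout: bytes 0-3 magic "GGUF", bytes 4-7 version
# (uint32), bytes 8-15 tensor count (uint64). od reads in host byte order,
# so this is only correct on a little-endian machine.
gguf_tensor_count() {
  file=$1
  magic=$(head -c 4 "$file")
  if [ "$magic" != "GGUF" ]; then
    echo "not a GGUF file: $file" >&2
    return 1
  fi
  # Skip 8 bytes, read 8 bytes as one unsigned 64-bit integer, strip padding.
  od -An -t u8 -j 8 -N 8 "$file" | tr -d ' \n'
  echo
}
```

Running gguf_tensor_count /path/to/gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf should print 48 if the tensor count described above holds for that file.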

Example

llama-server \
  -m /path/to/gemma-4-12b-it-IQ4_XS.gguf \
  --model-draft /path/to/gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf \
  --spec-type mtp:n_max=4,p_min=0.0

Older ik_llama builds may expect legacy speculative-decoding flags instead of --spec-type. Use a build that includes Gemma 4 12B MTP/CUDA support.
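
A quick way to see whether a given build recognizes the flag used in the example is to search its help text. This is a minimal sketch that assumes the build lists --spec-type in its --help output. The name help_mentions_flag is a hypothetical helper, and finding the flag does not by itself confirm Gemma 4 12B MTP/CUDA support.

```shell
# Hypothetical helper: succeed if the help text on stdin mentions the given flag.
# -F treats the flag as a fixed string; -- stops grep from parsing it as an option.
help_mentions_flag() {
  grep -qF -- "$1"
}

# Example use (requires llama-server on PATH):
#   llama-server --help 2>&1 | help_mentions_flag --spec-type && echo "flag present"
```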

Smoke Test

Local smoke test on an RTX 4070 with ik_llama build 4561 (6b9de3dba):

Target: Gemma 4 12B IT IQ4_XS
Draft: Q4_K_M
Raw completion token generation (TG): about 129 tok/s with MTP vs. about 60 tok/s without
Draft acceptance rate in this small smoke test: about 0.55
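
For reference, the throughput figures above work out to a little over 2x. The sketch below only divides the two rounded numbers quoted in this smoke test, so the result is a rough figure for this one setup, not a general benchmark. The function name speedup is illustrative.

```shell
# Ratio of MTP to plain token-generation throughput, using the rounded
# figures quoted above (about 129 tok/s vs. about 60 tok/s).
speedup() {
  awk -v mtp="$1" -v plain="$2" 'BEGIN { printf "%.2f\n", mtp / plain }'
}

speedup 129 60   # prints 2.15
```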

Last updated: Jun 14
