Buckets:
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| Gemma | 31 items | | |
| .gitattributes | 1.67 kB | | 3ac8b8a8 |
| README.md | 1.48 kB | | af2f5186 |
| gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf | 327 MB | | 7338b5ef |
| gemma-4-12B-it-MTP-ik_llama-Q8_0.gguf | 465 MB | | 0e8bfe93 |
Gemma 4 12B IT MTP Assistants for ik_llama
These are converted GGUF assistant/draft models for using Gemma 4 12B IT with
ik_llama MTP speculative decoding.
They are not standalone chat models. Use one of these files as --model-draft
next to the matching Gemma 4 12B IT target GGUF.
Files
- gemma-4-12B-it-MTP-ik_llama-Q8_0.gguf
- gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf
Conversion Notes
Source assistant GGUF:
unsloth/gemma-4-12b-it-GGUF, MTP/gemma-4-12B-it-MTP-F16.gguf
The public assistant architecture string and tensor names were converted to
ik_llama's gemma4_mtp schema. The unused public-assistant
rope_freqs.weight tensor was omitted because ik_llama's Gemma 4 MTP
assistant loader expects 48 tensors for this assistant.
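One way to sanity-check a converted file is to read the tensor count straight from the GGUF header. The sketch below assumes a little-endian host and a GGUF v2 or later file, where the header is the 4-byte magic `GGUF`, a 32-bit version, and then a 64-bit tensor count. `gguf_tensor_count` is a hypothetical helper name, not part of ik_llama, and `od -t u8` needs an `od` with 8-byte integer support (GNU coreutils has it).

```shell
#!/bin/sh
# Hypothetical helper: print the tensor count stored in a GGUF header.
# Assumed layout (GGUF v2+): bytes 0-3 magic "GGUF", bytes 4-7 version (uint32),
# bytes 8-15 tensor count (uint64). Values are read in host byte order,
# so a little-endian machine is assumed.
gguf_tensor_count() {
    file=$1
    # Check the 4-byte magic before trusting the rest of the header.
    magic=$(dd if="$file" bs=4 count=1 2>/dev/null)
    if [ "$magic" != "GGUF" ]; then
        echo "not a GGUF file: $file" >&2
        return 1
    fi
    # Skip magic + version (8 bytes), read one unsigned 64-bit integer,
    # and strip the padding that od prints around the number.
    od -A n -t u8 -j 8 -N 8 "$file" | tr -d ' \n'
    echo
}
```

Running `gguf_tensor_count /path/to/gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf` should then print the header's tensor count, which can be compared against the 48 tensors the loader is described as expecting.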
Example
llama-server \
  -m /path/to/gemma-4-12b-it-IQ4_XS.gguf \
  --model-draft /path/to/gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf \
  --spec-type mtp:n_max=4,p_min=0.0
Older ik_llama builds may use legacy speculative flags. Use a build that
includes Gemma 4 12B MTP/CUDA support.
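Because the draft file is not a standalone model, it can help to confirm that both GGUF files exist before launching. The following is a minimal sketch: `mtp_server_cmd` is a hypothetical helper, it uses only the flags shown in the example above, and it prints the command rather than starting the server. Paths containing spaces are not quoted in the printed output.

```shell
#!/bin/sh
# Hypothetical helper: verify that the target and draft GGUF files exist,
# then print the llama-server command from the example above.
# It does not start the server and adds no flags beyond those in this README.
mtp_server_cmd() {
    target=$1
    draft=$2
    for f in "$target" "$draft"; do
        if [ ! -f "$f" ]; then
            echo "missing GGUF file: $f" >&2
            return 1
        fi
    done
    printf 'llama-server -m %s --model-draft %s --spec-type mtp:n_max=4,p_min=0.0\n' \
        "$target" "$draft"
}
```

For example, `mtp_server_cmd /path/to/gemma-4-12b-it-IQ4_XS.gguf /path/to/gemma-4-12B-it-MTP-ik_llama-Q4_K_M.gguf` prints the full command if both files are present and returns a non-zero status otherwise.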
Smoke Test
Local smoke on an RTX 4070 with ik_llama build 4561 (6b9de3dba):
- Target: Gemma 4 12B IT IQ4_XS
- Draft: Q4_K_M
- Raw completion TG: about 129 tok/s with MTP vs about 60 tok/s plain
- Draft acceptance in the small smoke: about 0.55
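The throughput figures above imply a speedup of a little over 2x in this one smoke run. A small sketch of the arithmetic, where `speedup` is a hypothetical helper and both inputs are the approximate tok/s values quoted above:

```shell
#!/bin/sh
# Hypothetical helper: ratio of two throughput figures (tokens per second),
# printed to two decimal places.
speedup() {
    awk -v fast="$1" -v base="$2" 'BEGIN { printf "%.2f\n", fast / base }'
}

speedup 129 60   # prints 2.15
```

Since both inputs are approximate and come from a single small test, the ratio is a rough indication, not a benchmark.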
Last updated: Jun 14