Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-GGUF

GGUF quantized versions of Ayodele01/Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill.

Model Description

This is Google's Gemma-4 12B instruction-tuned model, fine-tuned with QLoRA via Unsloth on all 25,000 examples of the synthetic reasoning dataset WithinUsAI/gemini_3.5_flash_distilled_25k.

This repository contains GGUF quantizations of the merged model weights.

Available Files and Quantizations

| Filename | Quant Type | Size | Description |
|---|---|---|---|
| Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-bf16.gguf | BF16 | ~24.4 GB | Full precision, best quality |
| Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-Q8_0.gguf | Q8_0 | ~12.2 GB | High quality, minimal degradation |
| Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-Q5_K_M.gguf | Q5_K_M | ~8.3 GB | Balanced (recommended) |
| Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-Q4_K_M.gguf | Q4_K_M | ~7.2 GB | Good quality, smaller size |

Usage with llama.cpp

You can run these files using llama.cpp.

# Run with llama-cli
./llama-cli -m Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill-Q5_K_M.gguf \
  -p "<|turn>user\nWhat is the sum of all prime numbers between 1 and 50?<|turn>model\n" \
  -n 512

Prompt Template

Gemma-4 chat template format:

<|turn>user
{ prompt }<|turn>model
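
When calling the model from a script, the template above can be filled in with a small helper. This is a minimal sketch: the function name `build_prompt` is hypothetical, and the trailing newline after `<|turn>model` is an assumption taken from the llama-cli example in this card rather than from the template text itself.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the turn markers shown in this card.

    The trailing newline after "<|turn>model" mirrors the llama-cli example
    and is an assumption; the template listing itself does not show it.
    """
    return f"<|turn>user\n{user_message}<|turn>model\n"


if __name__ == "__main__":
    question = "What is the sum of all prime numbers between 1 and 50?"
    # repr() makes the newline characters visible for inspection.
    print(repr(build_prompt(question)))
```

The helper covers a single user turn only; multi-turn conversations should follow the chat template documented in the main repository.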

Training and Distillation Context

For details on evaluations, training hyperparameters, and qualitative findings, please refer to the main repository model card: Ayodele01/Gemma-4-12B-Gemini-3.5-flash-Reasoning-Distill.
