Spaces:
Running
Running
File size: 1,898 Bytes
fea3064 6e23cd8 fea3064 6e23cd8 fea3064 6e23cd8 fea3064 6e23cd8 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | ---
title: Gemma4 Coder GGUF Chat
emoji: "💬"
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
app_port: 7860
pinned: false
models:
- yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF
tags:
- llama.cpp
- gguf
- gemma4
- coding
- cpu
---
# Gemma4 12B Coder GGUF Chat
Hugging Face Spaces Docker chatbot for:
- Model: `yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF`
- Default quant: `gemma4-coding-Q4_K_M.gguf`
- Backend: prebuilt `llama.cpp` `llama-server`
- UI: native `llama.cpp` web UI
- Target: testing Gemma4 Coder on HF Spaces CPU
## Why Q4 by default?
`gemma4-coding-Q2_K.gguf` is smaller and faster, but it can produce broken fake-language responses on CPU. This Space uses `gemma4-coding-Q4_K_M.gguf` by default for better coherence. It is slower than Q2, but it is the safer option if the goal is a usable chatbot.
## Default settings
```text
MODEL_REPO=yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF
MODEL_FILE=gemma4-coding-Q4_K_M.gguf
LLAMA_VERSION=b9592
THREADS=4
CTX_SIZE=2048
BATCH_SIZE=default
UBATCH_SIZE=default
FLASH_ATTN=default
CACHE_TYPE_K=default
CACHE_TYPE_V=default
TEMPERATURE=0.2
TOP_P=0.95
TOP_K=64
REPEAT_PENALTY=1.08
```
The launcher downloads the GGUF into `/data`, fetches the model chat template from Hugging Face metadata, then hands the process over to `llama-server` on port `7860`.
`default` means the launcher does not pass that flag, so native `llama.cpp` picks its own optimized default. This is closer to the fast reference Space and avoids CPU overhead from experimental KV-cache quantization or tiny batch settings.
## If you want to compare Q2
Change this environment variable back:
```text
MODEL_FILE=gemma4-coding-Q2_K.gguf
```
Q2 starts and responds faster, but the output may be incoherent.
## Upload
Upload these files to the root of a Docker Space:
- `Dockerfile`
- `app.py`
- `README.md`
|