| --- |
| title: Gemma4 Coder GGUF Chat |
| emoji: "💬" |
| colorFrom: blue |
| colorTo: green |
| sdk: docker |
| app_file: app.py |
| app_port: 7860 |
| pinned: false |
| models: |
| - yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF |
| tags: |
| - llama.cpp |
| - gguf |
| - gemma4 |
| - coding |
| - cpu |
| --- |
| |
| # Gemma4 12B Coder GGUF Chat |
|
|
| Hugging Face Spaces Docker chatbot for: |
|
|
| - Model: `yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF` |
| - Default quant: `gemma4-coding-Q4_K_M.gguf` |
| - Backend: prebuilt `llama.cpp` `llama-server` |
| - UI: native `llama.cpp` web UI |
| - Target: testing Gemma4 Coder on HF Spaces CPU |
|
|
| ## Why Q4 by default? |
|
|
| `gemma4-coding-Q2_K.gguf` is smaller and faster, but it can produce broken fake-language responses on CPU. This Space uses `gemma4-coding-Q4_K_M.gguf` by default for better coherence. It is slower than Q2, but it is the safer option if the goal is a usable chatbot. |
|
|
| ## Default settings |
|
|
| ```text |
| MODEL_REPO=yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF |
| MODEL_FILE=gemma4-coding-Q4_K_M.gguf |
| LLAMA_VERSION=b9592 |
| THREADS=4 |
| CTX_SIZE=2048 |
| BATCH_SIZE=default |
| UBATCH_SIZE=default |
| FLASH_ATTN=default |
| CACHE_TYPE_K=default |
| CACHE_TYPE_V=default |
| TEMPERATURE=0.2 |
| TOP_P=0.95 |
| TOP_K=64 |
| REPEAT_PENALTY=1.08 |
| ``` |
|
|
| The launcher downloads the GGUF into `/data`, fetches the model chat template from Hugging Face metadata, then hands the process over to `llama-server` on port `7860`. |
|
|
| `default` means the launcher does not pass that flag, so native `llama.cpp` picks its own optimized default. This is closer to the fast reference Space and avoids CPU overhead from experimental KV-cache quantization or tiny batch settings. |
|
|
| ## If you want to compare Q2 |
|
|
| Change this environment variable back: |
|
|
| ```text |
| MODEL_FILE=gemma4-coding-Q2_K.gguf |
| ``` |
|
|
| Q2 starts and responds faster, but the output may be incoherent. |
|
|
| ## Upload |
|
|
| Upload these files to the root of a Docker Space: |
|
|
| - `Dockerfile` |
| - `app.py` |
| - `README.md` |
|
|