Spaces:
Running
Running
| title: Gemma4 Coder GGUF Chat | |
| emoji: "💬" | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_file: app.py | |
| app_port: 7860 | |
| pinned: false | |
| models: | |
| - yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF | |
| tags: | |
| - llama.cpp | |
| - gguf | |
| - gemma4 | |
| - coding | |
| - cpu | |
| # Gemma4 12B Coder GGUF Chat | |
| Hugging Face Spaces Docker chatbot for: | |
| - Model: `yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF` | |
| - Default quant: `gemma4-coding-Q4_K_M.gguf` | |
| - Backend: prebuilt `llama.cpp` `llama-server` | |
| - UI: native `llama.cpp` web UI | |
| - Target: testing Gemma4 Coder on HF Spaces CPU | |
| ## Why Q4 by default? | |
| `gemma4-coding-Q2_K.gguf` is smaller and faster, but it can produce broken fake-language responses on CPU. This Space uses `gemma4-coding-Q4_K_M.gguf` by default for better coherence. It is slower than Q2, but it is the safer option if the goal is a usable chatbot. | |
| ## Default settings | |
| ```text | |
| MODEL_REPO=yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF | |
| MODEL_FILE=gemma4-coding-Q4_K_M.gguf | |
| LLAMA_VERSION=b9592 | |
| THREADS=4 | |
| CTX_SIZE=2048 | |
| BATCH_SIZE=default | |
| UBATCH_SIZE=default | |
| FLASH_ATTN=default | |
| CACHE_TYPE_K=default | |
| CACHE_TYPE_V=default | |
| TEMPERATURE=0.2 | |
| TOP_P=0.95 | |
| TOP_K=64 | |
| REPEAT_PENALTY=1.08 | |
| ``` | |
| The launcher downloads the GGUF into `/data`, fetches the model chat template from Hugging Face metadata, then hands the process over to `llama-server` on port `7860`. | |
| `default` means the launcher does not pass that flag, so native `llama.cpp` picks its own optimized default. This is closer to the fast reference Space and avoids CPU overhead from experimental KV-cache quantization or tiny batch settings. | |
| ## If you want to compare Q2 | |
| Change this environment variable back: | |
| ```text | |
| MODEL_FILE=gemma4-coding-Q2_K.gguf | |
| ``` | |
| Q2 starts and responds faster, but the output may be incoherent. | |
| ## Upload | |
| Upload these files to the root of a Docker Space: | |
| - `Dockerfile` | |
| - `app.py` | |
| - `README.md` | |