---
title: Gemma4 Coder GGUF Chat
emoji: "💬"
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
app_port: 7860
pinned: false
models:
  - yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF
tags:
  - llama.cpp
  - gguf
  - gemma4
  - coding
  - cpu
---

# Gemma4 12B Coder GGUF Chat

Hugging Face Spaces Docker chatbot for:

- Model: `yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF`
- Default quant: `gemma4-coding-Q4_K_M.gguf`
- Backend: prebuilt `llama.cpp` `llama-server`
- UI: native `llama.cpp` web UI
- Target: testing Gemma4 Coder on HF Spaces CPU

## Why Q4 by default?

`gemma4-coding-Q2_K.gguf` is smaller and faster, but it can produce broken fake-language responses on CPU.

This Space uses `gemma4-coding-Q4_K_M.gguf` by default for better coherence. It is slower than Q2, but it is the safer option if the goal is a usable chatbot.

## Default settings

```text
MODEL_REPO=yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF
MODEL_FILE=gemma4-coding-Q4_K_M.gguf
LLAMA_VERSION=b9592
THREADS=4
CTX_SIZE=2048
BATCH_SIZE=default
UBATCH_SIZE=default
FLASH_ATTN=default
CACHE_TYPE_K=default
CACHE_TYPE_V=default
TEMPERATURE=0.2
TOP_P=0.95
TOP_K=64
REPEAT_PENALTY=1.08
```

The launcher downloads the GGUF into `/data`, fetches the model chat template from Hugging Face metadata, then hands the process over to `llama-server` on port `7860`.

`default` means the launcher does not pass that flag, so native `llama.cpp` picks its own optimized default. This is closer to the fast reference Space and avoids CPU overhead from experimental KV-cache quantization or tiny batch settings.

## If you want to compare Q2

Change this environment variable back:

```text
MODEL_FILE=gemma4-coding-Q2_K.gguf
```

Q2 starts and responds faster, but the output may be incoherent.

## Upload

Upload these files to the root of a Docker Space:

- `Dockerfile`
- `app.py`
- `README.md`
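
## Launcher sketch

The `default` convention described under "Default settings" can be illustrated with a minimal sketch. This is not the actual `app.py`: the function name `build_server_args`, the `ENV_TO_FLAG` table, and the bind address `0.0.0.0` are assumptions made for illustration. The flag names are ones `llama-server` accepts, but `FLASH_ATTN` is left out on purpose because the form of that flag differs between `llama.cpp` builds.

```python
import os

# Hypothetical mapping from this README's environment variables to
# llama-server command-line flags. FLASH_ATTN is intentionally omitted.
ENV_TO_FLAG = {
    "THREADS": "--threads",
    "CTX_SIZE": "--ctx-size",
    "BATCH_SIZE": "--batch-size",
    "UBATCH_SIZE": "--ubatch-size",
    "CACHE_TYPE_K": "--cache-type-k",
    "CACHE_TYPE_V": "--cache-type-v",
    "TEMPERATURE": "--temp",
    "TOP_P": "--top-p",
    "TOP_K": "--top-k",
    "REPEAT_PENALTY": "--repeat-penalty",
}


def build_server_args(env, model_path):
    """Build an argv list for llama-server from environment-style settings.

    A setting that is unset, empty, or equal to "default" is skipped, so
    llama-server falls back to its own built-in value for that option.
    """
    args = [
        "llama-server",
        "--model", model_path,
        "--host", "0.0.0.0",  # assumed bind address for a Docker Space
        "--port", "7860",
    ]
    for name, flag in ENV_TO_FLAG.items():
        value = env.get(name, "default").strip()
        if value == "" or value.lower() == "default":
            continue  # do not pass the flag at all
        args += [flag, value]
    return args


if __name__ == "__main__":
    model_file = os.environ.get("MODEL_FILE", "gemma4-coding-Q4_K_M.gguf")
    argv = build_server_args(os.environ, os.path.join("/data", model_file))
    print(" ".join(argv))
    # A real launcher could then replace the Python process with the server,
    # for example via os.execvp(argv[0], argv).
```

With the default settings listed earlier, `BATCH_SIZE`, `UBATCH_SIZE`, `CACHE_TYPE_K`, and `CACHE_TYPE_V` contribute no flags, while `THREADS`, `CTX_SIZE`, and the sampling parameters are passed through unchanged.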