---
license: apache-2.0
library_name: gguf
pipeline_tag: text-generation
base_model:
- DJLougen/Qwable-5-27B-Coder
base_model_relation: quantized
language:
- en
tags:
- gguf
- llama.cpp
- qwen
- qwen3_6
- qwen3_5
- coder
- coding-agent
- agentic-coding
- tool-use
- repository-work
- terminal-workflows
- long-context
- imatrix
- qlora
---


# Qwable-5-27B-Coder-GGUF

**Qwable-5-27B-Coder-GGUF** packages the Qwable coder-agent tune for llama.cpp, Ollama, and local workstation inference. It comes from a Qwen3.6-based model trained first on **Claude Fable 5 traces**, then continued on **Kimi 2.7 Coder traces**.

Use this repo when you want Qwable's coding-agent behavior in GGUF form: repository inspection, patch planning, terminal feedback, verifier recovery, and long-context coding prompts.

## Quant menu

| File | Quant | Approx. size | Best for |
| --- | ---: | ---: | --- |
| `Qwable-5-27B-Coder-Q8_0.gguf` | Q8_0 | 28.6 GB | quality checks, quant comparisons, high-memory local serving |
| `Qwable-5-27B-Coder-Q4_K_M.gguf` | Q4_K_M | 16.5 GB | default local starting point |
| `Qwable-5-27B-Coder-IQ1_S.gguf` | IQ1_S | 7.1 GB | tight memory budgets; expect quality tradeoffs |

`IQ1_S` uses an importance matrix computed on the training traces.

## Model facts

| Attribute | Details |
| --- | --- |
| GGUF repo | `DJLougen/Qwable-5-27B-Coder-GGUF` |
| Source checkpoint | [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) |
| Upstream base | [`unsloth/Qwen3.6-27B`](https://huggingface.co/unsloth/Qwen3.6-27B) |
| Runtime target | llama.cpp-compatible local inference |
| Architecture tag | `qwen3_5` |
| Scope | Text tower only; no vision sidecar in this repo |
| Training signal | Claude Fable 5 traces, then Kimi 2.7 Coder traces |
| License | Apache-2.0 |

```text
BF16 source checkpoint
  -> GGUF conversion
       -> Q8_0:   quality reference
       -> Q4_K_M: normal local use
       -> IQ1_S:  smallest imatrix build
```

Early maintainer runs show the source checkpoint outperforming the base model on a private coder benchmark. Public benchmark details are not posted yet, so treat that as early maintainer signal rather than a reproducible leaderboard claim.

## Quickstart

Requires a llama.cpp build with `qwen3_5` support.

Run a local OpenAI-compatible server:

```bash
llama-server -hf DJLougen/Qwable-5-27B-Coder-GGUF:Q4_K_M \
  --jinja -ngl 99 -fa -c 32768 \
  --temp 1.0 --top-p 0.95 --top-k 20
```

Run with Ollama:

```bash
ollama run hf.co/DJLougen/Qwable-5-27B-Coder-GGUF:Q4_K_M
```

Download one file:

```bash
hf download DJLougen/Qwable-5-27B-Coder-GGUF \
  Qwable-5-27B-Coder-Q4_K_M.gguf \
  --local-dir .
```

## Choosing a file

- Start with `Q4_K_M` unless you are explicitly testing quality ceilings or memory floors.
- Use `Q8_0` for comparisons against the source checkpoint or high-memory local serving.
- Use `IQ1_S` only when the model otherwise will not fit; verify quality on your own tasks.
- Keep prompts concrete: include repository context, exact errors, constraints, and verifier commands.

## Related releases

- Source BF16 Transformers checkpoint: [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder)
- NVFP4 ModelOpt checkpoint: [`DJLougen/Qwable-5-27B-Coder-NVFP4`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-NVFP4)

## Limitations

- Public benchmark tables are pending.
- Low-bit GGUF quantization can reduce instruction following, code precision, and tool-call reliability.
- This repo contains text GGUF files only; it is not the full multimodal Transformers checkpoint.
- Long-context behavior depends on llama.cpp build, hardware, KV cache settings, and prompt layout.
- Safety behavior is inherited from the base model and fine-tuning data; no separate safety alignment claim is made here.

## License

Released under Apache-2.0, following the upstream base model license metadata.
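
## Sketch: picking a file by memory budget

The "Choosing a file" guidance can be expressed as a small shell helper. This is a minimal sketch, not part of the release: the function name `pick_quant`, the `quality` mode argument, and the `HEADROOM_GB` variable are all hypothetical, and the 4 GB default headroom for KV cache and runtime buffers is an assumption rather than a measured figure. File names and sizes are taken from the quant menu.

```bash
#!/usr/bin/env bash
# Sketch: choose a GGUF file from this repo for a given memory budget (GB).
# Sizes come from the quant menu; the headroom default is an assumption.

pick_quant() {
  local budget_gb="$1"              # memory available for the model, in GB
  local mode="${2:-default}"        # hypothetical: "quality" opts in to Q8_0
  local headroom="${HEADROOM_GB:-4}" # assumed room for KV cache and buffers

  # awk does the decimal comparisons that bash integer arithmetic cannot.
  awk -v b="$budget_gb" -v m="$mode" -v h="$headroom" 'BEGIN {
    if (m == "quality" && b >= 28.6 + h) print "Qwable-5-27B-Coder-Q8_0.gguf"
    else if (b >= 16.5 + h)              print "Qwable-5-27B-Coder-Q4_K_M.gguf"
    else if (b >= 7.1 + h)               print "Qwable-5-27B-Coder-IQ1_S.gguf"
    else                                 exit 1  # nothing fits this budget
  }'
}

# Example: a 24 GB budget selects the Q4_K_M file.
pick_quant 24
```

The helper mirrors the list above: `Q4_K_M` is the default whenever it fits, `Q8_0` is returned only when explicitly requested, and `IQ1_S` is a fallback for budgets that cannot hold `Q4_K_M`. Its output could be passed to the download command, for example `hf download DJLougen/Qwable-5-27B-Coder-GGUF "$(pick_quant 24)" --local-dir .`. Long contexts need more headroom than the assumed default, so adjust `HEADROOM_GB` for your own settings.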