Text Generation
GGUF
English
code
agentic
distillation
demonstration
quantized
conversational
imatrix
Rewrite GGUF card: point to base debrief, honest quant notes
Replace hype card; reference base model card for recipe and rationale. Update (2026-06-22).
README.md
CHANGED
@@ -1,176 +1,60 @@
 ---
 license: apache-2.0
-
-pipeline_tag: image-text-to-text
-base_model:
-- DJLougen/Qwable-5-27B-Coder
-base_model_relation: quantized
-language:
-- en
+base_model: DJLougen/Qwable-5-27B-Coder
 tags:
+- code
+- agentic
+- distillation
+- demonstration
 - gguf
-- llama.cpp
 - quantized
-
--
-
-- qwen
-- qwen3_6
-- qwen3_5
-- coder
-- coding-agent
-- agentic-coding
-- tool-use
-- function-calling
-- repository-work
-- terminal-workflows
-- long-context
-- image-text-to-text
-- unsloth
-- trl
+language:
+- en
+pipeline_tag: text-generation
 ---
 
-
-<img src="./assets/banner.jpeg" alt="Qwable-5-27B-Coder banner" style="width:100%;max-width:1024px;border-radius:18px;" />
-</p>
-
-<p align="center">
-<img src="./assets/trace-board.svg" alt="Qwable trace board" style="width:100%;max-width:1100px;border-radius:22px;" />
-</p>
-
-# Qwable-5-27B-Coder (GGUF)
-
-**Qwable-5-27B-Coder** is a Qwen3.6-based coder-agent tune trained first on **Claude Fable 5 traces**, then continued on **Kimi 2.7 Coder traces**. It is built for the messy part of coding work: reading a repo, planning a patch, using terminal feedback, fixing the miss, and carrying constraints through long turns.
-
-> This repository hosts the **GGUF** builds for llama.cpp / Ollama / local workstation inference. The **MTP head is embedded** (`blk.64.nextn.*`, `nextn_predict_layers=1`) for speculative decoding, and a vision **mmproj** is included for multimodal use.
-
-> Early maintainer runs show Qwable outperforming the base model on a private coder benchmark. Public scores, harness settings, and task definitions will be added when the evaluation packet is ready.
-
-<a href="https://ko-fi.com/djlougen"><img alt="Support on Ko-fi" src="https://img.shields.io/badge/Support%20the%20compute-Ko--fi-ff5f5f?style=for-the-badge&logo=kofi&logoColor=white"></a>
-
-Training, quantization, and coder-agent evaluation are expensive. If Qwable helps your work, support continued releases at **[ko-fi.com/djlougen](https://ko-fi.com/djlougen)**.
-
-## Release channels
-
-| Repo | Format | Use it when |
-| --- | --- | --- |
-| [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) | BF16 Transformers safetensors | You want the source checkpoint, further training, conversion, or quality-ceiling evaluation. |
-| [`DJLougen/Qwable-5-27B-Coder-GGUF`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-GGUF) | GGUF | You want llama.cpp, Ollama, or local workstation inference. |
-| [`DJLougen/Qwable-5-27B-Coder-NVFP4`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-NVFP4) | ModelOpt NVFP4 safetensors | You want a compact NVIDIA-serving checkpoint for supported vLLM / TensorRT-LLM stacks. |
-
-## GGUF files
-
-All quants embed the MTP head and were converted from the BF16 checkpoint. `IQ1_S` is built with an importance matrix (the MTP block is kept at Q6_K so it stays usable).
-
-| File | Bits | Size | Notes |
-| --- | --- | --- | --- |
-| `Qwable-5-27B-Coder-Q8_0.gguf` | 8.50 bpw | ~29 GB | Highest fidelity GGUF. |
-| `Qwable-5-27B-Coder-Q6_K.gguf` | 6.56 bpw | ~22 GB | Near-lossless, smaller. |
-| `Qwable-5-27B-Coder-Q4_K_M.gguf` | 4.92 bpw | ~17 GB | Balanced default for most GPUs. |
-| `Qwable-5-27B-Coder-IQ1_S.gguf` | ~2.1 bpw | ~7 GB | Smallest; imatrix-quantized, lowest fidelity. |
-| `mmproj-Qwable-5-27B-Coder-f16.gguf` | f16 | ~0.9 GB | Vision projector (pair with any text quant). |
-| `chat_template.jinja` | - | - | Chat template. |
-
-Requires a llama.cpp build with **`qwen3_5`** architecture support (MTP/`nextn` aware).
-
-## Quickstart (llama.cpp)
-
-Download a single quant:
-
-```bash
-hf download DJLougen/Qwable-5-27B-Coder-GGUF \
-Qwable-5-27B-Coder-Q4_K_M.gguf --local-dir .
-```
-
-Plain text inference:
-
-```bash
-llama-server -m Qwable-5-27B-Coder-Q4_K_M.gguf -ngl 99 -c 8192 --jinja
-```
-
-**MTP speculative decoding** (uses the embedded MTP head as the draft; shared weights):
-
-```bash
-llama-server \
--m Qwable-5-27B-Coder-Q8_0.gguf \
--md Qwable-5-27B-Coder-Q8_0.gguf \
---spec-type draft-mtp \
--ngl 99 -ngld 99 -c 8192 --jinja
-```
-
-**Multimodal** (add the vision projector):
-
-```bash
-llama-mtmd-cli \
--m Qwable-5-27B-Coder-Q4_K_M.gguf \
---mmproj mmproj-Qwable-5-27B-Coder-f16.gguf \
--ngl 99
-```
-
-## Trace stack
-
-```text
-unsloth/Qwen3.6-27B
--> Claude Fable 5 coder-agent traces
--> Kimi 2.7 Coder traces
--> Qwable-5-27B-Coder
--> GGUF (this repo)
-```
-
-The release is aimed at agentic coding behavior, not benchmark-demo prose. The training signal is trace-shaped: inspect, decide, edit, verify, recover.
+# Qwable-5-27B-Coder-GGUF
 
-
-| --- | --- |
-| Base | [`unsloth/Qwen3.6-27B`](https://huggingface.co/unsloth/Qwen3.6-27B) |
-| Source checkpoint | [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) |
-| Architecture tag | `qwen3_5` |
-| Release format | GGUF (llama.cpp) |
-| MTP | Embedded (`nextn_predict_layers=1`), usable via `--spec-type draft-mtp` |
-| Vision | `mmproj-*-f16.gguf` included |
-| Pipeline | `image-text-to-text` |
-| Primary use | coding agents, repository work, terminal workflows, tool-use-style chat |
-| License | Apache-2.0 |
+GGUF quantizations of [DJLougen/Qwable-5-27B-Coder](https://huggingface.co/DJLougen/Qwable-5-27B-Coder).
 
-
+> **Update (2026-06-22):** Read the base model card before using these. The original release was deliberately under-documented as part of a point about hype versus evidence in local AI. The full recipe and rationale are now on the base card.
 
-
-- Translate failing command output into the next useful patch.
-- Keep constraints alive across multi-step coding tasks.
-- Produce tool-friendly, implementation-oriented answers.
-- Handle long engineering prompts with logs, diffs, stack traces, and partial failures.
-- Bias toward concrete edits, commands, and verification over generic advice.
+## What this actually is
 
-
+GGUF builds of a Qwen3.6-27B base that was post-trained on **10 traces total** (5 from a Fable 5 dataset, 5 generated by Kimi 2.7 Coder) in roughly **3 minutes** on a single DGX Spark. That is the entire recipe.
 
-
+It was released to demonstrate how little work it takes to make a model look credible through framing alone, and these quants exist so the demonstration reaches the people who run local in `llama.cpp` / Ollama / LM Studio.
 
-
+## Why this exists
 
-
+See the [base model card](https://huggingface.co/DJLougen/Qwable-5-27B-Coder). Short version: as local AI grows, the community has to reward measured evidence over hype, buzzword names, and impressive teacher names. This release is a worked example of the failure mode.
 
-
-You are Qwable, a precise coding agent. Inspect before editing. Prefer minimal, correct patches. Preserve existing conventions. Verify behavior with the narrowest meaningful test before finalizing.
-```
+## What you should actually do
 
-
+- Test it yourself rather than trusting the card or the teacher names.
+- Demand real evals: data volume and methodology, not just "distilled from {impressive model}."
+- Be skeptical of version-numbered names and benchmark-maxxing.
+- Prefer reproducible, hardware-specific open evals.
 
-##
+## Intended use
 
-
+Educational and illustrative. Not recommended for production coding. No methodology-backed benchmark numbers are provided, by design.
 
-##
+## Quantization notes
 
-
+> Fill in the exact quant types you shipped.
 
-
+| Quant | Approx size | Notes |
+|---|---|---|
+| Q4_K_M | TBD | |
+| Q5_K_M | TBD | |
+| Q6_K | TBD | |
+| Q8_0 | TBD | |
 
--
-- The model may inherit failure modes from the base model and from the trace sources.
-- Long-context behavior depends on runtime implementation, hardware, KV cache settings, and prompt structure.
-- Tool-use quality depends on prompt format and schema consistency.
-- Low-bit quants (especially `IQ1_S`) trade quality for size; prefer `Q4_K_M` or higher for real work.
-- The card does not claim safety alignment beyond the base model and fine-tuning data.
+Quantization further compounds the caveat on the base card: at n=10 the behavioral delta over base is already narrow and underdetermined, and low-bit quants will shift it further. Do not generalize any apparent strength.
 
-##
+## Attribution
 
-
+- Base model: Qwen3.6-27B (see its card for license and terms)
+- Fine-tune: DJLougen/Qwable-5-27B-Coder
+- Seed data: Fable 5 dataset, Kimi 2.7 Coder generations
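
The removed card listed a bits-per-weight figure next to each file size, while the new quantization table leaves every size as TBD. A rough cross-check is possible, as a minimal sketch: file size is approximately parameter count times bits per weight, divided by 8. The 27e9 parameter count is an assumption read off the "27B" in the model name, and `estimate_size_gb` is a hypothetical helper, not part of llama.cpp or any other tool.

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8 bytes.
# ASSUMPTION: 27e9 parameters, inferred from the "27B" in the model name.
# The real tensor count (plus file metadata) will differ somewhat.
PARAMS = 27e9

# Bits-per-weight figures as listed in the removed card's file table.
BPW_FROM_OLD_CARD = {
    "Q8_0": 8.50,
    "Q6_K": 6.56,
    "Q4_K_M": 4.92,
    "IQ1_S": 2.1,
}


def estimate_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Return an approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for quant, bpw in BPW_FROM_OLD_CARD.items():
        print(f"{quant:8s} {bpw:5.2f} bpw  ~{estimate_size_gb(bpw):.1f} GB")
```

Under that assumption the four estimates round to the sizes the old table quoted (~29, ~22, ~17, and ~7 GB). Q5_K_M does not appear in the old table, so its bits-per-weight would have to be read from the actual file rather than guessed.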
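
The new card's advice to test the model yourself can be made concrete with a paired comparison: run the base model and the fine-tune on the same tasks, then look only at the tasks where the two disagree. The following is a minimal sketch of an exact two-sided sign test in pure Python. The function name, the result lists, and the example data are hypothetical placeholders, not results from any real run.

```python
from math import comb


def paired_sign_test(base_pass: list[bool], tuned_pass: list[bool]) -> dict:
    """Exact two-sided sign test on the tasks where the two models disagree."""
    if len(base_pass) != len(tuned_pass):
        raise ValueError("both result lists must cover the same tasks")

    # Count discordant tasks: one model passes, the other fails.
    tuned_only = sum(t and not b for b, t in zip(base_pass, tuned_pass))
    base_only = sum(b and not t for b, t in zip(base_pass, tuned_pass))
    n = tuned_only + base_only

    if n == 0:
        p_value = 1.0  # no disagreements, so no evidence either way
    else:
        # Under the null hypothesis each disagreement is a fair coin flip.
        k = min(tuned_only, base_only)
        tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
        p_value = min(1.0, 2 * tail)

    return {"tuned_only": tuned_only, "base_only": base_only, "p_value": p_value}


if __name__ == "__main__":
    # HYPOTHETICAL pass/fail results on ten shared tasks (placeholder data).
    base = [True, True, False, False, True, False, True, False, False, True]
    tuned = [True, True, True, True, True, False, False, True, False, True]
    # 3 tuned-only wins vs 1 base-only win among 4 disagreements: p = 0.625
    print(paired_sign_test(base, tuned))
```

With only a handful of disagreeing tasks, even a 3-to-1 split is far from distinguishable from chance, which is one way to see why a small eval set cannot support a claim that a fine-tune beats its base.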