DJLougen committed on
Commit
0d8c134
·
verified ·
1 Parent(s): a46ba63

Rewrite GGUF card: point to base debrief, honest quant notes

Replace hype card; reference base model card for recipe and rationale. Update (2026-06-22).

Files changed (1)
  1. README.md +36 -152
README.md CHANGED
@@ -1,176 +1,60 @@
1
  ---
2
  license: apache-2.0
3
- library_name: gguf
4
- pipeline_tag: image-text-to-text
5
- base_model:
6
- - DJLougen/Qwable-5-27B-Coder
7
- base_model_relation: quantized
8
- language:
9
- - en
10
  tags:
 
 
 
 
11
  - gguf
12
- - llama.cpp
13
  - quantized
14
- - mtp
15
- - speculative-decoding
16
- - vision
17
- - qwen
18
- - qwen3_6
19
- - qwen3_5
20
- - coder
21
- - coding-agent
22
- - agentic-coding
23
- - tool-use
24
- - function-calling
25
- - repository-work
26
- - terminal-workflows
27
- - long-context
28
- - image-text-to-text
29
- - unsloth
30
- - trl
31
  ---
32
 
33
- <p align="center">
34
- <img src="./assets/banner.jpeg" alt="Qwable-5-27B-Coder banner" style="width:100%;max-width:1024px;border-radius:18px;" />
35
- </p>
36
-
37
- <p align="center">
38
- <img src="./assets/trace-board.svg" alt="Qwable trace board" style="width:100%;max-width:1100px;border-radius:22px;" />
39
- </p>
40
-
41
- # Qwable-5-27B-Coder (GGUF)
42
-
43
- **Qwable-5-27B-Coder** is a Qwen3.6-based coder-agent tune trained first on **Claude Fable 5 traces**, then continued on **Kimi 2.7 Coder traces**. It is built for the messy part of coding work: reading a repo, planning a patch, using terminal feedback, fixing the miss, and carrying constraints through long turns.
44
-
45
- > This repository hosts the **GGUF** builds for llama.cpp / Ollama / local workstation inference. The **MTP head is embedded** (`blk.64.nextn.*`, `nextn_predict_layers=1`) for speculative decoding, and a vision **mmproj** is included for multimodal use.
46
-
47
- > Early maintainer runs show Qwable outperforming the base model on a private coder benchmark. Public scores, harness settings, and task definitions will be added when the evaluation packet is ready.
48
-
49
- <a href="https://ko-fi.com/djlougen"><img alt="Support on Ko-fi" src="https://img.shields.io/badge/Support%20the%20compute-Ko--fi-ff5f5f?style=for-the-badge&logo=kofi&logoColor=white"></a>
50
-
51
- Training, quantization, and coder-agent evaluation are expensive. If Qwable helps your work, support continued releases at **[ko-fi.com/djlougen](https://ko-fi.com/djlougen)**.
52
-
53
- ## Release channels
54
-
55
- | Repo | Format | Use it when |
56
- | --- | --- | --- |
57
- | [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) | BF16 Transformers safetensors | You want the source checkpoint, further training, conversion, or quality-ceiling evaluation. |
58
- | [`DJLougen/Qwable-5-27B-Coder-GGUF`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-GGUF) | GGUF | You want llama.cpp, Ollama, or local workstation inference. |
59
- | [`DJLougen/Qwable-5-27B-Coder-NVFP4`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-NVFP4) | ModelOpt NVFP4 safetensors | You want a compact NVIDIA-serving checkpoint for supported vLLM / TensorRT-LLM stacks. |
60
-
61
- ## GGUF files
62
-
63
- All quants embed the MTP head and were converted from the BF16 checkpoint. `IQ1_S` is built with an importance matrix (the MTP block is kept at Q6_K so it stays usable).
64
-
65
- | File | Bits | Size | Notes |
66
- | --- | --- | --- | --- |
67
- | `Qwable-5-27B-Coder-Q8_0.gguf` | 8.50 bpw | ~29 GB | Highest fidelity GGUF. |
68
- | `Qwable-5-27B-Coder-Q6_K.gguf` | 6.56 bpw | ~22 GB | Near-lossless, smaller. |
69
- | `Qwable-5-27B-Coder-Q4_K_M.gguf` | 4.92 bpw | ~17 GB | Balanced default for most GPUs. |
70
- | `Qwable-5-27B-Coder-IQ1_S.gguf` | ~2.1 bpw | ~7 GB | Smallest; imatrix-quantized, lowest fidelity. |
71
- | `mmproj-Qwable-5-27B-Coder-f16.gguf` | f16 | ~0.9 GB | Vision projector (pair with any text quant). |
72
- | `chat_template.jinja` | - | - | Chat template. |
73
-
74
- Requires a llama.cpp build with **`qwen3_5`** architecture support (MTP/`nextn` aware).
75
-
76
- ## Quickstart (llama.cpp)
77
-
78
- Download a single quant:
79
-
80
- ```bash
81
- hf download DJLougen/Qwable-5-27B-Coder-GGUF \
82
- Qwable-5-27B-Coder-Q4_K_M.gguf --local-dir .
83
- ```
84
-
85
- Plain text inference:
86
-
87
- ```bash
88
- llama-server -m Qwable-5-27B-Coder-Q4_K_M.gguf -ngl 99 -c 8192 --jinja
89
- ```
90
-
91
- **MTP speculative decoding** (uses the embedded MTP head as the draft; shared weights):
92
-
93
- ```bash
94
- llama-server \
95
- -m Qwable-5-27B-Coder-Q8_0.gguf \
96
- -md Qwable-5-27B-Coder-Q8_0.gguf \
97
- --spec-type draft-mtp \
98
- -ngl 99 -ngld 99 -c 8192 --jinja
99
- ```
100
-
101
- **Multimodal** (add the vision projector):
102
-
103
- ```bash
104
- llama-mtmd-cli \
105
- -m Qwable-5-27B-Coder-Q4_K_M.gguf \
106
- --mmproj mmproj-Qwable-5-27B-Coder-f16.gguf \
107
- -ngl 99
108
- ```
109
-
110
- ## Trace stack
111
-
112
- ```text
113
- unsloth/Qwen3.6-27B
114
- -> Claude Fable 5 coder-agent traces
115
- -> Kimi 2.7 Coder traces
116
- -> Qwable-5-27B-Coder
117
- -> GGUF (this repo)
118
- ```
119
-
120
- The release is aimed at agentic coding behavior, not benchmark-demo prose. The training signal is trace-shaped: inspect, decide, edit, verify, recover.
121
 
122
- | Attribute | Details |
123
- | --- | --- |
124
- | Base | [`unsloth/Qwen3.6-27B`](https://huggingface.co/unsloth/Qwen3.6-27B) |
125
- | Source checkpoint | [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) |
126
- | Architecture tag | `qwen3_5` |
127
- | Release format | GGUF (llama.cpp) |
128
- | MTP | Embedded (`nextn_predict_layers=1`), usable via `--spec-type draft-mtp` |
129
- | Vision | `mmproj-*-f16.gguf` included |
130
- | Pipeline | `image-text-to-text` |
131
- | Primary use | coding agents, repository work, terminal workflows, tool-use-style chat |
132
- | License | Apache-2.0 |
133
 
134
- ## What Qwable is tuned to do
135
 
136
- - Navigate real repositories instead of isolated snippets.
137
- - Translate failing command output into the next useful patch.
138
- - Keep constraints alive across multi-step coding tasks.
139
- - Produce tool-friendly, implementation-oriented answers.
140
- - Handle long engineering prompts with logs, diffs, stack traces, and partial failures.
141
- - Bias toward concrete edits, commands, and verification over generic advice.
142
 
143
- ## Prompting profile
144
 
145
- Qwable works best when the prompt looks like an actual coding task, not a riddle.
146
 
147
- Good inputs include the relevant files, exact failing command output, hard constraints, expected output format, tool boundaries, and a verifier command or acceptance test when available.
148
 
149
- Suggested system prompt:
150
 
151
- ```text
152
- You are Qwable, a precise coding agent. Inspect before editing. Prefer minimal, correct patches. Preserve existing conventions. Verify behavior with the narrowest meaningful test before finalizing.
153
- ```
154
 
155
- The current `generation_config.json` uses `temperature=1.0`, `top_p=0.95`, and `top_k=20`.
 
 
 
156
 
157
- ## Evaluation status
158
 
159
- Current public status: early maintainer testing only. The maintainer has observed wins over the base model on a private coder benchmark, but reproducible claims require the full packet: benchmark name, split, prompt format, tool schema, harness commit, sampling settings, pass/fail rules, and raw results.
160
 
161
- ## Vision and multimodal note
162
 
163
- The repository is configured as `image-text-to-text`, and the base model family supports image/video tokens through the Qwen vision stack via the included `mmproj`. This fine-tune is marketed for coding behavior. Do not assume it improves vision understanding unless you evaluate that separately.
164
 
165
- ## Limitations
 
 
 
 
 
166
 
167
- - Public benchmark scores are not published yet.
168
- - The model may inherit failure modes from the base model and from the trace sources.
169
- - Long-context behavior depends on runtime implementation, hardware, KV cache settings, and prompt structure.
170
- - Tool-use quality depends on prompt format and schema consistency.
171
- - Low-bit quants (especially `IQ1_S`) trade quality for size; prefer `Q4_K_M` or higher for real work.
172
- - The card does not claim safety alignment beyond the base model and fine-tuning data.
173
 
174
- ## License
175
 
176
- Released under Apache-2.0, following the upstream base model license metadata.
 
 
 
1
  ---
2
  license: apache-2.0
3
+ base_model: DJLougen/Qwable-5-27B-Coder
 
 
 
 
 
 
4
  tags:
5
+ - code
6
+ - agentic
7
+ - distillation
8
+ - demonstration
9
  - gguf
 
10
  - quantized
11
+ language:
12
+ - en
13
+ pipeline_tag: text-generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
14
  ---
15
 
16
+ # Qwable-5-27B-Coder-GGUF
17
 
18
+ GGUF quantizations of [DJLougen/Qwable-5-27B-Coder](https://huggingface.co/DJLougen/Qwable-5-27B-Coder).
 
 
 
 
 
 
 
 
 
 
19
 
20
+ > **Update (2026-06-22):** Read the base model card before using these. The original release was deliberately under-documented as part of a point about hype versus evidence in local AI. The full recipe and rationale are now on the base card.
21
 
22
+ ## What this actually is
 
 
 
 
 
23
 
24
+ GGUF builds of a Qwen3.6-27B base that was post-trained on **10 traces total** (5 from a Fable 5 dataset, 5 generated by Kimi 2.7 Coder) in roughly **3 minutes** on a single DGX Spark. That is the entire recipe.
25
 
26
+ It was released to demonstrate how little work it takes to make a model look credible through framing alone. These quants exist so the demonstration reaches the people who run models locally in `llama.cpp` / Ollama / LM Studio.
27
 
28
+ ## Why this exists
29
 
30
+ See the [base model card](https://huggingface.co/DJLougen/Qwable-5-27B-Coder). Short version: as local AI grows, the community has to reward measured evidence over hype, buzzword names, and impressive teacher names. This release is a worked example of the failure mode.
31
 
32
+ ## What you should actually do
 
 
33
 
34
+ - Test it yourself rather than trusting the card or the teacher names.
35
+ - Demand real evals: data volume and methodology, not just "distilled from {impressive model}."
36
+ - Be skeptical of version-numbered names and benchmark-maxxing.
37
+ - Prefer reproducible, hardware-specific open evals.
38
 
39
+ ## Intended use
40
 
41
+ Educational and illustrative. Not recommended for production coding. No methodology-backed benchmark numbers are provided, by design.
42
 
43
+ ## Quantization notes
44
 
45
+ > Fill in the exact quant types you shipped.
46
 
47
+ | Quant | Approx size | Notes |
48
+ |---|---|---|
49
+ | Q4_K_M | TBD | |
50
+ | Q5_K_M | TBD | |
51
+ | Q6_K | TBD | |
52
+ | Q8_0 | TBD | |
53
 
54
+ Quantization further compounds the caveat on the base card: at n=10 the behavioral delta over base is already narrow and underdetermined, and low-bit quants will shift it further. Do not generalize any apparent strength.
 
 
 
 
 
55
 
56
+ ## Attribution
57
 
58
+ - Base model: Qwen3.6-27B (see its card for license and terms)
59
+ - Fine-tune: DJLougen/Qwable-5-27B-Coder
60
+ - Seed data: Fable 5 dataset, Kimi 2.7 Coder generations
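
Reviewer note on the `TBD` sizes in the new quantization table: the sizes in the removed table are consistent with a back-of-the-envelope rule, file size ≈ parameters × bits per weight ÷ 8. The sketch below only illustrates that arithmetic and is not part of the commit. The 27e9 parameter count is an assumption taken from the "27B" in the model name, `estimate_gguf_size_gb` is a hypothetical helper, and real GGUF files will differ somewhat because of metadata and mixed per-tensor quant types.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in decimal gigabytes (ignores metadata overhead)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 27e9 parameters, inferred from the "27B" in the model name.
N_PARAMS = 27e9

# Bits-per-weight figures as listed in the removed README table.
BPW = {"Q8_0": 8.50, "Q6_K": 6.56, "Q4_K_M": 4.92}

for quant, bpw in BPW.items():
    print(f"{quant}: ~{estimate_gguf_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions the estimates land close to the ~29 GB, ~22 GB, and ~17 GB the removed table reported. Measured file sizes should still take precedence when the table is filled in.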