DJLougen committed 5 days ago

Commit 0d8c134 · verified · 1 Parent(s): a46ba63

Rewrite GGUF card: point to base debrief, honest quant notes

Replace hype card; reference base model card for recipe and rationale. Update (2026-06-22).

Files changed (1)

README.md +36 -152

README.md CHANGED

@@ -1,176 +1,60 @@
 ---
 license: apache-2.0
-library_name: gguf
-pipeline_tag: image-text-to-text
-base_model:
-  - DJLougen/Qwable-5-27B-Coder
-base_model_relation: quantized
-language:
-  - en
 tags:
   - gguf
-  - llama.cpp
   - quantized
-  - mtp
-  - speculative-decoding
-  - vision
-  - qwen
-  - qwen3_6
-  - qwen3_5
-  - coder
-  - coding-agent
-  - agentic-coding
-  - tool-use
-  - function-calling
-  - repository-work
-  - terminal-workflows
-  - long-context
-  - image-text-to-text
-  - unsloth
-  - trl
 ---
-<p align="center">
-  <img src="./assets/banner.jpeg" alt="Qwable-5-27B-Coder banner" style="width:100%;max-width:1024px;border-radius:18px;" />
-</p>
-<p align="center">
-  <img src="./assets/trace-board.svg" alt="Qwable trace board" style="width:100%;max-width:1100px;border-radius:22px;" />
-</p>
-# Qwable-5-27B-Coder (GGUF)
-**Qwable-5-27B-Coder** is a Qwen3.6-based coder-agent tune trained first on **Claude Fable 5 traces**, then continued on **Kimi 2.7 Coder traces**. It is built for the messy part of coding work: reading a repo, planning a patch, using terminal feedback, fixing the miss, and carrying constraints through long turns.
-> This repository hosts the **GGUF** builds for llama.cpp / Ollama / local workstation inference. The **MTP head is embedded** (`blk.64.nextn.*`, `nextn_predict_layers=1`) for speculative decoding, and a vision **mmproj** is included for multimodal use.
-> Early maintainer runs show Qwable outperforming the base model on a private coder benchmark. Public scores, harness settings, and task definitions will be added when the evaluation packet is ready.
-<a href="https://ko-fi.com/djlougen"><img alt="Support on Ko-fi" src="https://img.shields.io/badge/Support%20the%20compute-Ko--fi-ff5f5f?style=for-the-badge&logo=kofi&logoColor=white"></a>
-Training, quantization, and coder-agent evaluation are expensive. If Qwable helps your work, support continued releases at **[ko-fi.com/djlougen](https://ko-fi.com/djlougen)**.
-## Release channels
-| Repo | Format | Use it when |
-| --- | --- | --- |
-| [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) | BF16 Transformers safetensors | You want the source checkpoint, further training, conversion, or quality-ceiling evaluation. |
-| [`DJLougen/Qwable-5-27B-Coder-GGUF`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-GGUF) | GGUF | You want llama.cpp, Ollama, or local workstation inference. |
-| [`DJLougen/Qwable-5-27B-Coder-NVFP4`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder-NVFP4) | ModelOpt NVFP4 safetensors | You want a compact NVIDIA-serving checkpoint for supported vLLM / TensorRT-LLM stacks. |
-## GGUF files
-All quants embed the MTP head and were converted from the BF16 checkpoint. `IQ1_S` is built with an importance matrix (the MTP block is kept at Q6_K so it stays usable).
-| File | Bits | Size | Notes |
-| --- | --- | --- | --- |
-| `Qwable-5-27B-Coder-Q8_0.gguf` | 8.50 bpw | ~29 GB | Highest fidelity GGUF. |
-| `Qwable-5-27B-Coder-Q6_K.gguf` | 6.56 bpw | ~22 GB | Near-lossless, smaller. |
-| `Qwable-5-27B-Coder-Q4_K_M.gguf` | 4.92 bpw | ~17 GB | Balanced default for most GPUs. |
-| `Qwable-5-27B-Coder-IQ1_S.gguf` | ~2.1 bpw | ~7 GB | Smallest; imatrix-quantized, lowest fidelity. |
-| `mmproj-Qwable-5-27B-Coder-f16.gguf` | f16 | ~0.9 GB | Vision projector (pair with any text quant). |
-| `chat_template.jinja` | - | - | Chat template. |
-Requires a llama.cpp build with **`qwen3_5`** architecture support (MTP/`nextn` aware).
-## Quickstart (llama.cpp)
-Download a single quant:
-```bash
-hf download DJLougen/Qwable-5-27B-Coder-GGUF \
-  Qwable-5-27B-Coder-Q4_K_M.gguf --local-dir .
-```
-Plain text inference:
-```bash
-llama-server -m Qwable-5-27B-Coder-Q4_K_M.gguf -ngl 99 -c 8192 --jinja
-```
-**MTP speculative decoding** (uses the embedded MTP head as the draft; shared weights):
-```bash
-llama-server \
-  -m  Qwable-5-27B-Coder-Q8_0.gguf \
-  -md Qwable-5-27B-Coder-Q8_0.gguf \
-  --spec-type draft-mtp \
-  -ngl 99 -ngld 99 -c 8192 --jinja
-```
-**Multimodal** (add the vision projector):
-```bash
-llama-mtmd-cli \
-  -m Qwable-5-27B-Coder-Q4_K_M.gguf \
-  --mmproj mmproj-Qwable-5-27B-Coder-f16.gguf \
-  -ngl 99
-```
-## Trace stack
-```text
-unsloth/Qwen3.6-27B
-  -> Claude Fable 5 coder-agent traces
-  -> Kimi 2.7 Coder traces
-  -> Qwable-5-27B-Coder
-  -> GGUF (this repo)
-```
-The release is aimed at agentic coding behavior, not benchmark-demo prose. The training signal is trace-shaped: inspect, decide, edit, verify, recover.
-| Attribute | Details |
-| --- | --- |
-| Base | [`unsloth/Qwen3.6-27B`](https://huggingface.co/unsloth/Qwen3.6-27B) |
-| Source checkpoint | [`DJLougen/Qwable-5-27B-Coder`](https://huggingface.co/DJLougen/Qwable-5-27B-Coder) |
-| Architecture tag | `qwen3_5` |
-| Release format | GGUF (llama.cpp) |
-| MTP | Embedded (`nextn_predict_layers=1`), usable via `--spec-type draft-mtp` |
-| Vision | `mmproj-*-f16.gguf` included |
-| Pipeline | `image-text-to-text` |
-| Primary use | coding agents, repository work, terminal workflows, tool-use-style chat |
-| License | Apache-2.0 |
-## What Qwable is tuned to do
-- Navigate real repositories instead of isolated snippets.
-- Translate failing command output into the next useful patch.
-- Keep constraints alive across multi-step coding tasks.
-- Produce tool-friendly, implementation-oriented answers.
-- Handle long engineering prompts with logs, diffs, stack traces, and partial failures.
-- Bias toward concrete edits, commands, and verification over generic advice.
-## Prompting profile
-Qwable works best when the prompt looks like an actual coding task, not a riddle.
-Good inputs include the relevant files, exact failing command output, hard constraints, expected output format, tool boundaries, and a verifier command or acceptance test when available.
-Suggested system prompt:
-```text
-You are Qwable, a precise coding agent. Inspect before editing. Prefer minimal, correct patches. Preserve existing conventions. Verify behavior with the narrowest meaningful test before finalizing.
-```
-The current `generation_config.json` uses `temperature=1.0`, `top_p=0.95`, and `top_k=20`.
-## Evaluation status
-Current public status: early maintainer testing only. The maintainer has observed wins over the base model on a private coder benchmark, but reproducible claims require the full packet: benchmark name, split, prompt format, tool schema, harness commit, sampling settings, pass/fail rules, and raw results.
-## Vision and multimodal note
-The repository is configured as `image-text-to-text`, and the base model family supports image/video tokens through the Qwen vision stack via the included `mmproj`. This fine-tune is marketed for coding behavior. Do not assume it improves vision understanding unless you evaluate that separately.
-## Limitations
-- Public benchmark scores are not published yet.
-- The model may inherit failure modes from the base model and from the trace sources.
-- Long-context behavior depends on runtime implementation, hardware, KV cache settings, and prompt structure.
-- Tool-use quality depends on prompt format and schema consistency.
-- Low-bit quants (especially `IQ1_S`) trade quality for size; prefer `Q4_K_M` or higher for real work.
-- The card does not claim safety alignment beyond the base model and fine-tuning data.
-## License
-Released under Apache-2.0, following the upstream base model license metadata.

 ---
 license: apache-2.0
+base_model: DJLougen/Qwable-5-27B-Coder
 tags:
+  - code
+  - agentic
+  - distillation
+  - demonstration
   - gguf
   - quantized
+language:
+  - en
+pipeline_tag: text-generation
 ---
+# Qwable-5-27B-Coder-GGUF
+GGUF quantizations of [DJLougen/Qwable-5-27B-Coder](https://huggingface.co/DJLougen/Qwable-5-27B-Coder).
+> **Update (2026-06-22):** Read the base model card before using these. The original release was deliberately under-documented as part of a point about hype versus evidence in local AI. The full recipe and rationale are now on the base card.
+## What this actually is
+GGUF builds of a Qwen3.6-27B base that was post-trained on **10 traces total** (5 from a Fable 5 dataset, 5 generated by Kimi 2.7 Coder) in roughly **3 minutes** on a single DGX Spark. That is the entire recipe.
+It was released to demonstrate how little work it takes to make a model look credible through framing alone. These quants exist so the demonstration reaches people who run models locally with `llama.cpp`, Ollama, or LM Studio.
+## Why this exists
+See the [base model card](https://huggingface.co/DJLougen/Qwable-5-27B-Coder). Short version: as local AI grows, the community has to reward measured evidence over hype, buzzword names, and impressive teacher names. This release is a worked example of the failure mode.
+## What you should actually do
+- Test it yourself rather than trusting the card or the teacher names.
+- Demand real evals: data volume and methodology, not just "distilled from {impressive model}."
+- Be skeptical of version-numbered names and benchmark-maxxing.
+- Prefer reproducible, hardware-specific open evals.
+## Intended use
+Educational and illustrative. Not recommended for production coding. No methodology-backed benchmark numbers are provided, by design.
+## Quantization notes
+> Fill in the exact quant types you shipped.
+| Quant | Approx size | Notes |
+|---|---|---|
+| Q4_K_M | TBD | |
+| Q5_K_M | TBD | |
+| Q6_K | TBD | |
+| Q8_0 | TBD | |
+Quantization further compounds the caveat on the base card: at n=10 the behavioral delta over base is already narrow and underdetermined, and low-bit quants will shift it further. Do not generalize any apparent strength.
+## Attribution
+- Base model: Qwen3.6-27B (see its card for license and terms)
+- Fine-tune: DJLougen/Qwable-5-27B-Coder
+- Seed data: Fable 5 dataset, Kimi 2.7 Coder generations
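
The new card's advice to test the model yourself and to prefer measured evidence can be made concrete with a small paired comparison. The sketch below is not part of the commit or of either model card. It assumes you have already run the base model and the fine-tune on the same task list and recorded a pass/fail result per task; the function name `paired_sign_test` and the example results are hypothetical. It applies an exact two-sided sign test (McNemar's exact test) to the tasks where the two models disagree.

```python
from math import comb


def paired_sign_test(base_pass, tuned_pass):
    """Exact two-sided sign test on paired pass/fail results.

    Returns (tuned_only_wins, base_only_wins, p_value).
    """
    if len(base_pass) != len(tuned_pass):
        raise ValueError("both runs must cover the same tasks")

    # Only tasks where the two models disagree carry information.
    tuned_only = sum(1 for b, t in zip(base_pass, tuned_pass) if t and not b)
    base_only = sum(1 for b, t in zip(base_pass, tuned_pass) if b and not t)
    n = tuned_only + base_only
    if n == 0:
        return tuned_only, base_only, 1.0

    # Under the null hypothesis each disagreement is a fair coin flip.
    k = min(tuned_only, base_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return tuned_only, base_only, min(1.0, 2 * tail)


# Hypothetical results on ten tasks (True = task passed).
base = [True, False, False, True, False, True, False, False, True, False]
tuned = [True, True, False, True, True, True, False, True, True, False]

wins, losses, p = paired_sign_test(base, tuned)
print(f"fine-tune-only passes: {wins}, base-only passes: {losses}, p = {p:.3f}")
```

In this made-up run the fine-tune passes three tasks the base model fails and loses none, yet the p-value is 0.25. A ten-task comparison that looks like a clean win is still consistent with chance, which is why the task list, sampling settings, and raw per-task results matter more than a headline claim.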