Qwable 5 27B Chadrock v2 ROCmFP6 Quality

Qwable 5 27B Chadrock v2 ROCmFP6 Quality is an AMD-tuned GGUF release of DJLougen/Qwable-5-27B-Coder, quantized with the new Q6_0_ROCMFPX_STRIX_QUALITY recipe for Ryzen AI Max+ 395 / Strix Halo systems.

Qwable is an agentic coding tune of Qwen3.6 27B built for repository work, terminal feedback, tool-use style prompts, and long coding turns. This build keeps that Qwable behavior but moves from the smaller ROCmFP4 lane into the new quality-focused ROCmFP6 recipe.

This GGUF does not run correctly with stock upstream llama.cpp. It requires the Chadrock ROCmFPX runner with ROCmFP6 Strix Quality support:

https://github.com/ciru-ai/ROCmFPX/tree/rocmfp6-strix-quality

The public FP6 quality write-up is here:

https://llm.ciru.ai/reports/rocmfp6-quality-research-report-20260624/

Why This Build Exists

The first Strix FP6 speed recipe was too small for the quality target. It routed most tensors through FP4-fast storage and landed around the old speed lane size. The quality report showed that this was not close enough to a real Q6 baseline for agentic behavior.

The new Q6_0_ROCMFPX_STRIX_QUALITY recipe changes the balance:

  • default bulk tensor storage moves back to Q6_0_ROCMFPX
  • attention, output, and selected high-sensitivity FFN down/gate tensors are protected with Q8_0_ROCMFPX
  • the resulting artifact lands at 7.37 BPW, much closer to a true Q6-class build than the older FP6 speed lane
  • the recipe is still AMD runtime-focused, with ROCm served decode improvements versus the downloaded Unsloth Q6 baseline in the report

This Qwable release uses that same new FP6 quality recipe, but applied to the Qwable 5 27B Coder source checkpoint.
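
The routing described in the list above can be pictured as a small rule table. This is an illustrative sketch only, not the recipe's actual implementation: the two type names come from this card, the tensor-name patterns assume common GGUF naming (`blk.N.attn_q.weight`, `blk.N.ffn_down.weight`, `output.weight`), and `SENSITIVE_LAYERS` is a hypothetical stand-in, since the card does not say which FFN tensors are treated as high-sensitivity.

```python
import re

# Type names are taken from the recipe description; everything else is assumed.
BULK_TYPE = "Q6_0_ROCMFPX"
PROTECTED_TYPE = "Q8_0_ROCMFPX"

# Hypothetical: the card does not list which layers count as high-sensitivity.
SENSITIVE_LAYERS = {0, 1, 2}


def pick_type(tensor_name: str) -> str:
    """Return the storage type a quality-style recipe might assign to a weight tensor.

    Only weight matrices are considered; norm and bias tensors are out of scope.
    """
    # Attention and output tensors are always protected.
    if ".attn_" in tensor_name or tensor_name.startswith("output."):
        return PROTECTED_TYPE
    # FFN down/gate tensors are protected only in the selected layers.
    match = re.match(r"blk\.(\d+)\.ffn_(down|gate)\.", tensor_name)
    if match and int(match.group(1)) in SENSITIVE_LAYERS:
        return PROTECTED_TYPE
    # Everything else stays in bulk Q6 storage.
    return BULK_TYPE


for name in [
    "blk.0.attn_q.weight",
    "blk.0.ffn_down.weight",
    "blk.30.ffn_down.weight",
    "blk.30.ffn_up.weight",
    "output.weight",
]:
    print(f"{name:26s} -> {pick_type(name)}")
```

Mixing a Q6 bulk type with Q8 protected tensors is what pushes the average above 6 bits per weight, which is consistent with the 7.37 BPW figure reported for this artifact.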

FP6 Quality Report Findings

The linked report measured the new recipe on the Unsloth Qwen3.6 27B baseline path against the exact downloaded Unsloth Q6 comparison model. Those recipe-level findings are the reason this Qwable build uses the quality recipe instead of the older speed recipe.

| Check | New ROCmFP6 Strix Quality | Baseline / prior row |
|---|---|---|
| HermesAgent-20 overall | 0.78 | Unsloth Q6 0.76, old FP6 speed 0.60 |
| HumanEval+ | 155/164 = 94.51% | Unsloth Q6 153/164 = 93.29% |
| Served decode range | 15.72-30.73 tok/s | ROCm rows from 512 to 64k prompt tokens |
| Quantized size | 24018.32 MiB | 7.37 BPW |

Key interpretation from the report: the new recipe recovered Q6-class HermesAgent behavior and improved served MTP decode speed against Q6, while the short-prompt prefill win was not universal. The useful speed claim is decode and end-to-end served behavior across the controlled rows, not a blanket prefill claim.

Lineage

unsloth/Qwen3.6-27B
  -> DJLougen/Qwable-5-27B-Coder
       training focus:
         - Claude Fable 5 coder-agent traces
         - Kimi 2.7 Coder traces
         - repository work, terminal workflows, tool-use style coding
  -> Qwable 5 27B Chadrock v2 ROCmFP6 Quality

The source Qwable model card describes the model as a Qwen3.6-based coder-agent tune for real coding loops: inspect, decide, edit, verify, and recover.

File

| File | Size | SHA256 |
|---|---|---|
| Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf | 25,196,024,896 bytes | e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430 |
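
A download of this size is worth checking before loading it. The sketch below compares a local file against the byte size and SHA256 listed above, using only the Python standard library. The local path is a hypothetical placeholder; the expected values are copied from this card.

```python
import hashlib
from pathlib import Path

# Values copied from the file table in this card.
EXPECTED_SHA256 = "e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430"
EXPECTED_BYTES = 25_196_024_896


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a ~25 GB GGUF never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if both the byte size and the SHA256 match the card."""
    if path.stat().st_size != EXPECTED_BYTES:
        return False  # cheap check first; skips hashing a truncated download
    return sha256_of(path) == EXPECTED_SHA256


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    gguf = Path("Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf")
    if gguf.exists():
        print("match" if verify(gguf) else "MISMATCH")
    else:
        print(f"{gguf} not found")
```

Checking the size before hashing catches an interrupted download immediately, without reading the whole file.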

Quantization provenance:

source: Qwable-5-27B-Coder-BF16.gguf
recipe: Q6_0_ROCMFPX_STRIX_QUALITY
format: rocmfp6
profile: strix-quality
source size: 52115.19 MiB, 16.00 BPW
quant size: 24018.32 MiB, 7.37 BPW
quant time: 165.8 s
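
The BPW figure can be cross-checked from the two sizes alone. The sketch below assumes the reported MiB values cover the same set of weights in both files, so the BF16 source size at 16 bits per weight fixes the weight count.

```python
# Sizes copied from the provenance block above.
SOURCE_MIB, SOURCE_BPW = 52115.19, 16.00
QUANT_MIB = 24018.32

BITS_PER_MIB = 1024 * 1024 * 8

# Weight count implied by the BF16 source: total bits / bits per weight.
n_weights = SOURCE_MIB * BITS_PER_MIB / SOURCE_BPW

# Effective bits per weight of the quantized file over the same weights.
quant_bpw = QUANT_MIB * BITS_PER_MIB / n_weights

print(f"implied weights: {n_weights / 1e9:.2f} B")
print(f"quant BPW:       {quant_bpw:.2f}")
print(f"size ratio:      {QUANT_MIB / SOURCE_MIB:.1%}")
```

The result rounds to the 7.37 BPW reported above, and the implied weight count is consistent with a 27B-class model.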

Intended Use

  • Local coding-agent and repository-work prompts on AMD Strix Halo.
  • Users who want a higher-quality Qwable ROCmFPX lane than the smaller ROCmFP4 build.
  • One-slot served MTP experimentation with the Chadrock ROCmFPX runner.
  • ROCm-focused local inference where quality is more important than minimum file size.

This is the quality build. If your priority is smallest file size or maximum compactness, use the ROCmFP4 build instead.

Best Known Settings

Use the ROCmFP6 quality branch and one-slot MTP serving. The launch shape below follows the controlled FP6 quality report settings and the existing Qwable FP6 ROCm profile style.

backend: ROCm0 target + ROCm0 draft
context: 32768 for fast serving, 262144 model context metadata
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: f16 / f16
MTP: draft-mtp
startup draft cap: n_max=6, n_min=0, p_min=0.0, p_split=0.20
serving: one slot, metrics on, text-only for speed runs
sampler: temperature=0, top_p=0.95, top_k=20

The FP6 quality report found ROCm to be the correct backend for this quality recipe. Vulkan could match some short-prompt prefill behavior, but ROCm won decode and end-to-end time in the controlled backend comparison.

Run With Chadrock ROCmFPX

Build the runner from the quality branch:

git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout rocmfp6-strix-quality
cmake -S . -B build-strix-rocmfp6-quality -DGGML_HIP=ON -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-strix-rocmfp6-quality --target llama-server -j16

Launch a ROCm text-serving profile:

./build-strix-rocmfp6-quality/bin/llama-server \
  -m /path/to/Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf \
  --alias qwable-5-27b-chadrock-v2-rocmfp6-quality \
  --host 127.0.0.1 \
  --port 18180 \
  --jinja \
  -c 32768 \
  --reasoning off \
  --reasoning-format none \
  --reasoning-budget -1 \
  --no-context-shift \
  -dev ROCm0 \
  -ngl 999 \
  -fa on \
  -b 2048 \
  -ub 512 \
  -t 16 \
  -tb 32 \
  -ctk q8_0 \
  -ctv q8_0 \
  --temp 0 \
  --top-p 0.95 \
  --top-k 20 \
  --seed 123 \
  --parallel 1 \
  --metrics \
  --ctx-checkpoints 0 \
  --checkpoint-every-n-tokens -1 \
  --spec-type draft-mtp \
  --spec-draft-device ROCm0 \
  --spec-draft-ngl all \
  --spec-draft-threads 16 \
  --spec-draft-threads-batch 32 \
  --spec-draft-type-k f16 \
  --spec-draft-type-v f16 \
  --spec-draft-n-max 6 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --spec-draft-p-split 0.20 \
  --no-spec-draft-backend-sampling \
  --spec-draft-poll 1 \
  --spec-draft-poll-batch 1
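
Because the profile passes `--metrics`, served throughput can be read back while testing. The sketch below is a minimal reader for Prometheus-style text, assuming the runner keeps upstream llama.cpp's `/metrics` endpoint and `llamacpp:`-prefixed metric names; neither is confirmed for this branch. The values in `SAMPLE` are made-up placeholders for an offline demo, not measurements.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {metric_name: value}, skipping comments."""
    out: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        name, _, value = line.rpartition(" ")
        try:
            out[name] = float(value)
        except ValueError:
            continue  # ignore anything that is not "name value"
    return out


def fetch_metrics(base_url: str = "http://127.0.0.1:18180") -> dict[str, float]:
    """Fetch and parse /metrics from a running server (assumed endpoint)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Offline demo with placeholder values; metric names assumed from upstream llama.cpp.
SAMPLE = """\
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 1.0
llamacpp:tokens_predicted_total 2
"""
print(parse_metrics(SAMPLE))
```

With the server running, `fetch_metrics()` returns the same kind of dictionary from the live endpoint.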

Example /completion request:

curl -sS http://127.0.0.1:18180/completion \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Write a concise technical note about ROCmFPX MTP serving.",
    "n_predict": 512,
    "temperature": 0,
    "ignore_eos": true,
    "speculative.n_max": 6,
    "speculative.n_min": 0,
    "speculative.p_min": 0.0
  }'

Use --parallel 1 for MTP speed testing. Multi-slot serving changes draft-MTP behavior and is not the intended profile for these settings.
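
For scripted speed runs, the same request can be sent from Python. This is a minimal sketch using only the standard library. The payload mirrors the curl body above; the `content` and `timings.predicted_per_second` response fields are assumed to match upstream llama.cpp's `/completion` response and may differ in this runner.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18180"  # host and port from the launch command


def build_payload(prompt: str, n_predict: int = 512) -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0,
        "ignore_eos": True,  # force exactly n_predict tokens for speed testing
        "speculative.n_max": 6,
        "speculative.n_min": 0,
        "speculative.p_min": 0.0,
    }


def complete(prompt: str, n_predict: int = 512) -> dict:
    """POST to /completion and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/completion",
        data=json.dumps(build_payload(prompt, n_predict)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        result = complete("Write a concise technical note about ROCmFPX MTP serving.")
    except OSError as exc:  # URLError and connection errors are OSError subclasses
        print(f"server not reachable: {exc}")
    else:
        print(result.get("content", ""))
        # Assumed field names; .get() keeps this safe if the runner omits them.
        print("decode tok/s:", result.get("timings", {}).get("predicted_per_second"))
```

Keeping `ignore_eos` enabled makes runs comparable in length; drop it for normal use so generation stops at the end-of-sequence token.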

Benchmark Status

This card intentionally separates recipe evidence from model-specific evaluation:

  • The FP6 quality recipe has been validated in the public report against the downloaded Unsloth Q6 baseline.
  • This Qwable artifact has been quantized successfully with the same recipe and checksummed locally.
  • Fresh Qwable-specific HermesAgent, HumanEval, PPL, and served-speed rows should be run before making model-specific quality claims beyond the recipe-level findings above.

Credits

  • DJLougen: Qwable 5 27B Coder source model and coder-agent training.
  • Unsloth and Qwen: Qwen3.6 27B base model path used by the source checkpoint.
  • Ciru / Chadrock ROCmFPX: ROCmFP6 Strix Quality recipe, AMD Strix Halo quantization, local benchmark report, and pinned runner setup.

Notes

This is an experimental AMD ROCmFP6/MTP release for local evaluation and runtime experimentation. Speeds are hardware-sensitive and depend on driver version, clocks, prompt shape, KV cache settings, draft-token acceptance, and runtime branch.
