Qwable 5 27B Chadrock v2 ROCmFP6 Quality

Qwable 5 27B Chadrock v2 ROCmFP6 Quality is an AMD-tuned GGUF release of DJLougen/Qwable-5-27B-Coder, quantized with the new Q6_0_ROCMFPX_STRIX_QUALITY recipe for Ryzen AI Max+ 395 / Strix Halo systems.

Qwable is an agentic coding tune of Qwen3.6 27B built for repository work, terminal feedback, tool-use style prompts, and long coding turns. This build keeps that Qwable behavior but moves from the smaller ROCmFP4 lane into the new quality-focused ROCmFP6 recipe.

This GGUF does not run correctly with stock upstream llama.cpp. It requires the Chadrock ROCmFPX runner with ROCmFP6 Strix Quality support:

https://github.com/ciru-ai/ROCmFPX/tree/rocmfp6-strix-quality

The public FP6 quality write-up is here:

https://llm.ciru.ai/reports/rocmfp6-quality-research-report-20260624/

Why This Build Exists

The first Strix FP6 speed recipe was too small for the quality target. It routed most tensors through FP4-fast storage, so the file landed at roughly the size of the old speed lane. The quality report showed that this was not close enough to a real Q6 baseline for agentic behavior.

The new Q6_0_ROCMFPX_STRIX_QUALITY recipe changes the balance:

- default bulk tensor storage moves back to Q6_0_ROCMFPX
- attention, output, and selected high-sensitivity FFN down/gate tensors are protected with Q8_0_ROCMFPX
- the resulting artifact lands at 7.37 BPW, much closer to a true Q6-class build than the older FP6 speed lane
- the recipe is still AMD runtime-focused, with ROCm served decode improvements versus the downloaded Unsloth Q6 baseline in the report

This Qwable release uses that same new FP6 quality recipe, but applied to the Qwable 5 27B Coder source checkpoint.

FP6 Quality Report Findings

The linked report measured the new recipe on the Unsloth Qwen3.6 27B base model, not on Qwable, and compared it against the exact downloaded Unsloth Q6 model. Those recipe-level findings are the reason this Qwable build uses the quality recipe instead of the older speed recipe.

| Check | New ROCmFP6 Strix Quality | Baseline, prior row, or note |
| --- | --- | --- |
| HermesAgent-20 overall | `0.78` | Unsloth Q6 `0.76`, old FP6 speed `0.60` |
| HumanEval+ | `155/164 = 94.51%` | Unsloth Q6 `153/164 = 93.29%` |
| Served decode range | `15.72-30.73 tok/s` | ROCm rows from 512 to 64k prompt tokens |
| Quantized size | `24018.32 MiB` | `7.37 BPW` |

Key interpretation from the report: the new recipe recovered Q6-class HermesAgent behavior and improved served MTP decode speed against Q6, while the short-prompt prefill win was not universal. The useful speed claim is decode and end-to-end served behavior across the controlled rows, not a blanket prefill claim.
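
The HumanEval+ percentages in the table above can be rechecked from the raw counts. The sketch below only reproduces that arithmetic; the variable names are illustrative and nothing here comes from the report's own tooling.

```python
# HumanEval+ pass counts copied from the table above.
TOTAL_PROBLEMS = 164
passed = {"ROCmFP6 Strix Quality": 155, "Unsloth Q6": 153}


def pass_rate(n_passed: int, total: int = TOTAL_PROBLEMS) -> float:
    """Return the pass rate as a percentage."""
    return 100.0 * n_passed / total


rates = {name: pass_rate(n) for name, n in passed.items()}
gap_problems = passed["ROCmFP6 Strix Quality"] - passed["Unsloth Q6"]
gap_points = rates["ROCmFP6 Strix Quality"] - rates["Unsloth Q6"]

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
print(f"gap: {gap_problems} problems, {gap_points:.2f} percentage points")
```

The gap is two problems out of 164, which fits the report's framing of Q6-class behavior rather than a large quality lead.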

Lineage

unsloth/Qwen3.6-27B
  -> DJLougen/Qwable-5-27B-Coder
       training focus:
         - Claude Fable 5 coder-agent traces
         - Kimi 2.7 Coder traces
         - repository work, terminal workflows, tool-use style coding
  -> Qwable 5 27B Chadrock v2 ROCmFP6 Quality

The source Qwable model card describes the model as a Qwen3.6-based coder-agent tune for real coding loops: inspect, decide, edit, verify, and recover.

File

| File | Size | SHA256 |
| --- | --- | --- |
| `Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf` | `25,196,024,896 bytes` | `e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430` |
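
A download can be checked against the size and SHA256 listed above before it is loaded. This is a minimal sketch using only the Python standard library; the function names `sha256_of` and `verify` are hypothetical helpers, not part of any runner.

```python
import hashlib
from pathlib import Path

# Values copied from the file table above.
EXPECTED_SHA256 = "e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430"
EXPECTED_BYTES = 25_196_024_896


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a ~25 GB GGUF never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Check the byte size first (cheap), then the hash (slow)."""
    if path.stat().st_size != EXPECTED_BYTES:
        return False
    return sha256_of(path) == EXPECTED_SHA256
```

Calling `verify(Path("/path/to/Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf"))` should return `True` for an intact download; a size mismatch short-circuits before the slow hash pass.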

Quantization provenance:

source: Qwable-5-27B-Coder-BF16.gguf
recipe: Q6_0_ROCMFPX_STRIX_QUALITY
format: rocmfp6
profile: strix-quality
source size: 52115.19 MiB, 16.00 BPW
quant size: 24018.32 MiB, 7.37 BPW
quant time: 165.8 s
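
The 7.37 BPW figure follows from the two sizes in the provenance list. The sketch below assumes the BF16 source is almost entirely weights at 16 bits each, so the implied weight count is approximate (GGUF metadata and tokenizer data are ignored).

```python
MIB = 1024 * 1024

# Sizes copied from the provenance list above.
source_mib, source_bpw = 52115.19, 16.00  # BF16 source
quant_mib = 24018.32                      # ROCmFP6 quality output

# Weight count implied by the BF16 file: total bits / bits per weight.
n_weights = source_mib * MIB * 8 / source_bpw

# Bits per weight of the quantized file over the same weight count.
quant_bpw = quant_mib * MIB * 8 / n_weights
size_ratio = source_mib / quant_mib

print(f"implied weights: {n_weights / 1e9:.2f} B")
print(f"quant BPW: {quant_bpw:.2f}")
print(f"source/quant size ratio: {size_ratio:.2f}x")
```

The result sits above 6 bits per weight because the recipe keeps selected tensors in Q8_0_ROCMFPX rather than Q6_0_ROCMFPX, and block-quantized formats generally carry per-block scale data.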

Intended Use

- Local coding-agent and repository-work prompts on AMD Strix Halo.
- Users who want a higher-quality Qwable ROCmFPX lane than the smaller ROCmFP4 build.
- One-slot served MTP experimentation with the Chadrock ROCmFPX runner.
- ROCm-focused local inference where quality is more important than minimum file size.

This is the quality build. If your priority is smallest file size or maximum compactness, use the ROCmFP4 build instead.

Best Known Settings

Use the ROCmFP6 quality branch and one-slot MTP serving. The launch shape below follows the controlled FP6 quality report settings and the existing Qwable FP6 ROCm profile style.

backend: ROCm0 target + ROCm0 draft
context: 32768 for fast serving, 262144 model context metadata
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: f16 / f16
MTP: draft-mtp
startup draft cap: n_max=6, n_min=0, p_min=0.0, p_split=0.20
serving: one slot, metrics on, text-only for speed runs
sampler: temperature=0, top_p=0.95, top_k=20
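
The settings above correspond to command-line flags on the Chadrock ROCmFPX `llama-server`. The sketch below builds an argument list from a subset of them; the flag names are copied from the launch command in "Run With Chadrock ROCmFPX", while `SETTINGS` and `to_argv` are hypothetical names used only for illustration. Thread counts, `--jinja`, and the reasoning flags are left out for brevity.

```python
import shlex

# Values from the settings list above. Flag names come from this card's launch
# command; they belong to the ROCmFPX runner and are not guaranteed to exist
# in upstream llama.cpp.
SETTINGS = [
    ("-dev", "ROCm0"),
    ("-c", "32768"),
    ("-b", "2048"),
    ("-ub", "512"),
    ("-ctk", "q8_0"),
    ("-ctv", "q8_0"),
    ("--spec-type", "draft-mtp"),
    ("--spec-draft-device", "ROCm0"),
    ("--spec-draft-type-k", "f16"),
    ("--spec-draft-type-v", "f16"),
    ("--spec-draft-n-max", "6"),
    ("--spec-draft-n-min", "0"),
    ("--spec-draft-p-min", "0.0"),
    ("--spec-draft-p-split", "0.20"),
    ("--parallel", "1"),
    ("--metrics", None),  # boolean flag, takes no value
    ("--temp", "0"),
    ("--top-p", "0.95"),
    ("--top-k", "20"),
]


def to_argv(binary, model_path, settings=SETTINGS):
    """Flatten (flag, value) pairs into an argv list; None means flag only."""
    argv = [binary, "-m", model_path]
    for flag, value in settings:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


argv = to_argv(
    "./build-strix-rocmfp6-quality/bin/llama-server",
    "/path/to/Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf",
)
print(shlex.join(argv))
```

Keeping the pairs in a list rather than a dict preserves flag order, which makes the printed command easy to diff against the full launch command.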

The FP6 quality report found ROCm to be the correct backend for this quality recipe. Vulkan could match some short-prompt prefill behavior, but ROCm won decode and end-to-end time in the controlled backend comparison.

Run With Chadrock ROCmFPX

Build the runner from the quality branch:

git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout rocmfp6-strix-quality
cmake -S . -B build-strix-rocmfp6-quality -DGGML_HIP=ON -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-strix-rocmfp6-quality --target llama-server -j16

Launch a ROCm text-serving profile:

./build-strix-rocmfp6-quality/bin/llama-server \
  -m /path/to/Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf \
  --alias qwable-5-27b-chadrock-v2-rocmfp6-quality \
  --host 127.0.0.1 \
  --port 18180 \
  --jinja \
  -c 32768 \
  --reasoning off \
  --reasoning-format none \
  --reasoning-budget -1 \
  --no-context-shift \
  -dev ROCm0 \
  -ngl 999 \
  -fa on \
  -b 2048 \
  -ub 512 \
  -t 16 \
  -tb 32 \
  -ctk q8_0 \
  -ctv q8_0 \
  --temp 0 \
  --top-p 0.95 \
  --top-k 20 \
  --seed 123 \
  --parallel 1 \
  --metrics \
  --ctx-checkpoints 0 \
  --checkpoint-every-n-tokens -1 \
  --spec-type draft-mtp \
  --spec-draft-device ROCm0 \
  --spec-draft-ngl all \
  --spec-draft-threads 16 \
  --spec-draft-threads-batch 32 \
  --spec-draft-type-k f16 \
  --spec-draft-type-v f16 \
  --spec-draft-n-max 6 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --spec-draft-p-split 0.20 \
  --no-spec-draft-backend-sampling \
  --spec-draft-poll 1 \
  --spec-draft-poll-batch 1

Example /completion request:

curl -sS http://127.0.0.1:18180/completion \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Write a concise technical note about ROCmFPX MTP serving.",
    "n_predict": 512,
    "temperature": 0,
    "ignore_eos": true,
    "speculative.n_max": 6,
    "speculative.n_min": 0,
    "speculative.p_min": 0.0
  }'
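
The same request can be sent from Python with the standard library. This sketch reuses the host, port, and JSON fields from the curl example; `build_completion_request` and `complete` are hypothetical helper names. It assumes the runner answers with a JSON body, as upstream `llama-server` does, which this card does not confirm for the ROCmFPX branch.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18180"  # host and port from the launch command


def build_completion_request(prompt: str, n_predict: int = 512) -> urllib.request.Request:
    """Build a POST to /completion with the fields used in the curl example."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0,
        "ignore_eos": True,
        "speculative.n_max": 6,
        "speculative.n_min": 0,
        "speculative.p_min": 0.0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, timeout: float = 600.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `complete("Write a concise technical note about ROCmFPX MTP serving.")` returns the response as a dict. Upstream `llama-server` puts the generated text under the `content` key; treat that key as an assumption for this runner.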

Use --parallel 1 for MTP speed testing. Multi-slot serving changes draft-MTP behavior and is not the intended profile for these settings.
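
Because the launch command passes `--metrics`, speed counters can be read back after a run. Upstream `llama-server` serves them in Prometheus text format at `/metrics`; whether the ROCmFPX branch keeps the same endpoint and metric names is an assumption here. The sketch below is a small parser for unlabelled samples in that format, and the metric names in `SAMPLE` are illustrative only.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse unlabelled `name value` samples from Prometheus text format.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Samples with
    labels that contain spaces are not handled by this simple sketch.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue  # not a numeric sample, ignore it
    return samples


# Illustrative payload; real names and values depend on the runner build.
SAMPLE = """\
# HELP llamacpp:predicted_tokens_seconds Average generation throughput in tokens/s.
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 27.5
llamacpp:prompt_tokens_seconds 410.0
"""

metrics = parse_prometheus_text(SAMPLE)
print(metrics)
```

In practice the text would come from an HTTP GET against the server's metrics endpoint rather than from a literal string.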

Benchmark Status

This card intentionally separates recipe evidence from model-specific evaluation:

- The FP6 quality recipe has been validated in the public report against the downloaded Unsloth Q6 baseline.
- This Qwable artifact has been quantized successfully with the same recipe and checksummed locally.
- Fresh Qwable-specific HermesAgent, HumanEval, PPL, and served-speed rows should be run before making model-specific quality claims beyond the recipe-level findings above.

Credits

- DJLougen: Qwable 5 27B Coder source model and coder-agent training.
- Unsloth and Qwen: Qwen3.6 27B base model path used by the source checkpoint.
- Ciru / Chadrock ROCmFPX: ROCmFP6 Strix Quality recipe, AMD Strix Halo quantization, local benchmark report, and pinned runner setup.

Notes

This is an experimental AMD ROCmFP6/MTP release for local evaluation and runtime experimentation. Speeds are hardware-sensitive and depend on driver version, clocks, prompt shape, KV cache settings, draft-token acceptance, and runtime branch.

GGUF metadata: 27B params, qwen35 architecture.
