Qwable 5 27B Chadrock v2 ROCmFP6 Quality
Qwable 5 27B Chadrock v2 ROCmFP6 Quality is an AMD-tuned GGUF release of DJLougen/Qwable-5-27B-Coder, quantized with the new Q6_0_ROCMFPX_STRIX_QUALITY recipe for Ryzen AI Max+ 395 / Strix Halo systems.
Qwable is an agentic coding tune of Qwen3.6 27B built for repository work, terminal feedback, tool-use style prompts, and long coding turns. This build keeps that Qwable behavior but moves from the smaller ROCmFP4 lane into the new quality-focused ROCmFP6 recipe.
This GGUF does not run correctly with stock upstream llama.cpp. It requires the Chadrock ROCmFPX runner with ROCmFP6 Strix Quality support:
https://github.com/ciru-ai/ROCmFPX/tree/rocmfp6-strix-quality
The public FP6 quality write-up is here:
https://llm.ciru.ai/reports/rocmfp6-quality-research-report-20260624/
Why This Build Exists
The first Strix FP6 speed recipe was too small for the quality target. It routed most tensors through FP4-fast storage, so the artifact ended up at roughly the size of the old speed lane. The quality report showed that this build was not close enough to a real Q6 baseline for agentic behavior (HermesAgent-20 overall 0.60 for the old FP6 speed recipe versus 0.76 for Unsloth Q6).
The new Q6_0_ROCMFPX_STRIX_QUALITY recipe changes the balance:
- default bulk tensor storage moves back to Q6_0_ROCMFPX
- attention, output, and selected high-sensitivity FFN down/gate tensors are protected with Q8_0_ROCMFPX
- the resulting artifact lands at 7.37 BPW, much closer to a true Q6-class build than the older FP6 speed lane
- the recipe is still focused on the AMD runtime, with ROCm served-decode improvements over the downloaded Unsloth Q6 baseline in the report
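The routing described above can be sketched as a per-tensor rule. This is a hypothetical illustration, not the actual ROCmFPX recipe code. It assumes upstream llama.cpp GGUF tensor names (`blk.N.attn_*`, `blk.N.ffn_down`, `output.weight`) and a hypothetical 64-layer model. The recipe does not say which FFN down/gate tensors count as "high-sensitivity", so the edge-layer rule here is an assumption.

```python
import re

BULK_TYPE = "Q6_0_ROCMFPX"       # default bulk storage, per the recipe description
PROTECTED_TYPE = "Q8_0_ROCMFPX"  # attention, output, selected FFN down/gate

# Hypothetical "high-sensitivity" rule: FFN down/gate in the first and last layers.
FFN_PATTERN = re.compile(r"blk\.(\d+)\.ffn_(down|gate)\.weight$")


def route_tensor(name: str, n_layers: int, edge_layers: int = 2) -> str:
    """Return the storage type this sketch would assign to a GGUF tensor name."""
    # 1-D norm weights stay unquantized, as in upstream llama.cpp quantization.
    # Check this first: names like "blk.0.attn_norm.weight" also contain ".attn_".
    if name.endswith("_norm.weight"):
        return "F32"
    if name == "output.weight" or ".attn_" in name:
        return PROTECTED_TYPE
    match = FFN_PATTERN.match(name)
    if match:
        layer = int(match.group(1))
        if layer < edge_layers or layer >= n_layers - edge_layers:
            return PROTECTED_TYPE
    return BULK_TYPE


if __name__ == "__main__":
    # n_layers=64 is a hypothetical layer count, not taken from the model.
    for tensor in ("blk.0.attn_q.weight", "blk.0.ffn_down.weight",
                   "blk.30.ffn_down.weight", "blk.30.ffn_up.weight", "output.weight"):
        print(tensor, "->", route_tensor(tensor, n_layers=64))
```

With a rule of this shape, most of the weight count stays in the bulk type, and the protected tensors pull the average a little above 6 bits per weight.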
This Qwable release uses that same new FP6 quality recipe, but applied to the Qwable 5 27B Coder source checkpoint.
FP6 Quality Report Findings
The linked report measured the new recipe on the Unsloth Qwen3.6 27B baseline path against the exact downloaded Unsloth Q6 comparison model. Those recipe-level findings are the reason this Qwable build uses the quality recipe instead of the older speed recipe.
| Check | New ROCmFP6 Strix Quality | Baseline / notes |
|---|---|---|
| HermesAgent-20 overall | 0.78 | Unsloth Q6 0.76, old FP6 speed 0.60 |
| HumanEval+ | 155/164 = 94.51% | Unsloth Q6 153/164 = 93.29% |
| Served decode range | 15.72-30.73 tok/s | ROCm rows from 512 to 64k prompt tokens |
| Quantized size | 24018.32 MiB | 7.37 BPW |
Key interpretation from the report: the new recipe recovered Q6-class HermesAgent behavior and improved served MTP (multi-token prediction) decode speed over the Unsloth Q6 baseline. The short-prompt prefill win, however, was not universal. The useful speed claim is about decode and end-to-end served behavior across the controlled rows, not a blanket prefill claim.
Lineage
unsloth/Qwen3.6-27B
-> DJLougen/Qwable-5-27B-Coder
training focus:
- Claude Fable 5 coder-agent traces
- Kimi 2.7 Coder traces
- repository work, terminal workflows, tool-use style coding
-> Qwable 5 27B Chadrock v2 ROCmFP6 Quality
The source Qwable model card describes the model as a Qwen3.6-based coder-agent tune for real coding loops: inspect, decide, edit, verify, and recover.
File
| File | Size | SHA256 |
|---|---|---|
| Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf | 25,196,024,896 bytes | e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430 |
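Before pointing the runner at a 25 GB download, it is worth checking that the file matches the size and SHA256 listed above. The sketch below uses only the Python standard library and streams the file in chunks, so memory use stays flat. The helper names are illustrative. On Linux, `sha256sum Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf` gives the same digest.

```python
import hashlib
from pathlib import Path

# Values from the File table above.
EXPECTED_SHA256 = "e7263757665db2c31bdf2ede1b9d7b6e9575e946f46252296d2205b9b20b0430"
EXPECTED_SIZE = 25_196_024_896


def sha256_of(path, chunk_size: int = 16 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks instead of reading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_gguf(path) -> bool:
    """Cheap size check first, then the full SHA256 comparison."""
    p = Path(path)
    if p.stat().st_size != EXPECTED_SIZE:
        return False
    return sha256_of(p) == EXPECTED_SHA256
```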
Quantization provenance:
source: Qwable-5-27B-Coder-BF16.gguf
recipe: Q6_0_ROCMFPX_STRIX_QUALITY
format: rocmfp6
profile: strix-quality
source size: 52115.19 MiB, 16.00 BPW
quant size: 24018.32 MiB, 7.37 BPW
quant time: 165.8 s
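The size and BPW figures in the provenance block can be cross-checked with simple arithmetic. BF16 stores exactly 16 bits per weight, so the source size implies the weight count. Dividing the quantized size in bits by that count gives the average bits per weight. The sketch uses only the numbers above; the helper name is illustrative.

```python
MIB = 1024 * 1024  # the provenance sizes are given in MiB


def bits_per_weight(size_mib: float, n_weights: float) -> float:
    """Average storage bits per weight for a given tensor-data size."""
    return size_mib * MIB * 8 / n_weights


# Figures from the provenance block above.
SOURCE_MIB, SOURCE_BPW = 52115.19, 16.00
QUANT_MIB = 24018.32

# BF16 stores exactly 16 bits per weight, so the source size implies the weight count.
n_weights = SOURCE_MIB * MIB * 8 / SOURCE_BPW
quant_bpw = bits_per_weight(QUANT_MIB, n_weights)
compression = SOURCE_MIB / QUANT_MIB

print(f"weights ~ {n_weights / 1e9:.2f}B, quant BPW ~ {quant_bpw:.2f}, {compression:.2f}x smaller")
```

This reproduces the 7.37 BPW figure and implies roughly 27.3 billion weights, consistent with a 27B model.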
Intended Use
- Local coding-agent and repository-work prompts on AMD Strix Halo.
- Users who want a higher-quality Qwable ROCmFPX lane than the smaller ROCmFP4 build.
- One-slot served MTP experimentation with the Chadrock ROCmFPX runner.
- ROCm-focused local inference where quality is more important than minimum file size.
This is the quality build. If your priority is smallest file size or maximum compactness, use the ROCmFP4 build instead.
Best Known Settings
Use the ROCmFP6 quality branch and one-slot MTP serving. The launch shape below follows the controlled FP6 quality report settings and the existing Qwable FP6 ROCm profile style.
backend: ROCm0 target + ROCm0 draft
context: 32768 for fast serving, 262144 model context metadata
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: f16 / f16
MTP: draft-mtp
startup draft cap: n_max=6, n_min=0, p_min=0.0, p_split=0.20
serving: one slot, metrics on, text-only for speed runs
sampler: temperature=0, top_p=0.95, top_k=20
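The draft cap (n_max=6) limits how many tokens each MTP step can propose, but the real speedup depends on how many of them the target accepts. One way to reason about this is the simplified estimate from the speculative-decoding literature (Leviathan et al., 2023). Assume each drafted token is accepted independently with probability alpha. Then the expected number of tokens produced per verification step is (1 - alpha^(n_max+1)) / (1 - alpha).

This is only a sketch. It ignores n_min, p_min, p_split, and the cost of running the draft. The alpha values are hypothetical inputs, not measured acceptance rates for this model.

```python
def expected_tokens_per_step(alpha: float, n_max: int) -> float:
    """Expected tokens per target verification step under i.i.d. draft acceptance."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the target itself.
        return float(n_max + 1)
    return (1.0 - alpha ** (n_max + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates, with the n_max=6 cap from the settings above.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(expected_tokens_per_step(alpha, 6), 2))
```

The steep dependence on alpha is one reason the Notes section lists draft-token acceptance among the factors that move the measured speed.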
The FP6 quality report found ROCm to be the correct backend for this quality recipe. Vulkan could match some short-prompt prefill behavior, but ROCm won decode and end-to-end time in the controlled backend comparison.
Run With Chadrock ROCmFPX
Build the runner from the quality branch:
git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout rocmfp6-strix-quality
cmake -S . -B build-strix-rocmfp6-quality -DGGML_HIP=ON -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build-strix-rocmfp6-quality --target llama-server -j16
Launch a ROCm text-serving profile:
./build-strix-rocmfp6-quality/bin/llama-server \
-m /path/to/Qwable-5-27B-Chadrock-v2-ROCmFP6-QUALITY.gguf \
--alias qwable-5-27b-chadrock-v2-rocmfp6-quality \
--host 127.0.0.1 \
--port 18180 \
--jinja \
-c 32768 \
--reasoning off \
--reasoning-format none \
--reasoning-budget -1 \
--no-context-shift \
-dev ROCm0 \
-ngl 999 \
-fa on \
-b 2048 \
-ub 512 \
-t 16 \
-tb 32 \
-ctk q8_0 \
-ctv q8_0 \
--temp 0 \
--top-p 0.95 \
--top-k 20 \
--seed 123 \
--parallel 1 \
--metrics \
--ctx-checkpoints 0 \
--checkpoint-every-n-tokens -1 \
--spec-type draft-mtp \
--spec-draft-device ROCm0 \
--spec-draft-ngl all \
--spec-draft-threads 16 \
--spec-draft-threads-batch 32 \
--spec-draft-type-k f16 \
--spec-draft-type-v f16 \
--spec-draft-n-max 6 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.0 \
--spec-draft-p-split 0.20 \
--no-spec-draft-backend-sampling \
--spec-draft-poll 1 \
--spec-draft-poll-batch 1
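Loading a 25 GB GGUF takes a while, so scripts that start the server usually wait for it to be ready before sending requests. The sketch below assumes the ROCmFPX runner keeps upstream llama-server's `/health` endpoint, which returns HTTP 200 once the model is loaded and an error status such as 503 while loading. The helper name `wait_for_health` is hypothetical.

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Poll {base_url}/health until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused), HTTPError (e.g. 503 while
            # loading) and socket timeouts are all OSError subclasses.
            pass
        time.sleep(interval)
    return False


# Example, matching the --host/--port used in the launch command:
# ready = wait_for_health("http://127.0.0.1:18180")
```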
Example /completion request:
curl -sS http://127.0.0.1:18180/completion \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Write a concise technical note about ROCmFPX MTP serving.",
"n_predict": 512,
"temperature": 0,
"ignore_eos": true,
"speculative.n_max": 6,
"speculative.n_min": 0,
"speculative.p_min": 0.0
}'
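The same request can be sent from Python using only the standard library. This sketch assumes the runner keeps upstream llama-server's `/completion` response shape:

- the generated text is in `content`
- throughput is reported under `timings` as `prompt_per_second`, `predicted_per_second`, and `predicted_n`

The helper names are hypothetical. Reading prompt and decode rates separately fits the report's point that the speed claim is about decode, not prefill.

```python
import json
import urllib.request


def build_completion_payload(prompt: str, n_predict: int = 512) -> dict:
    """Mirror the curl request body above (greedy, fixed length, n_max=6 drafts)."""
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0,
        "ignore_eos": True,
        "speculative.n_max": 6,
        "speculative.n_min": 0,
        "speculative.p_min": 0.0,
    }


def post_completion(base_url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST the payload to {base_url}/completion and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_timings(response: dict) -> dict:
    """Pull prefill and decode rates out of the assumed `timings` object."""
    timings = response.get("timings", {})
    return {
        "prompt_tok_s": timings.get("prompt_per_second"),
        "decode_tok_s": timings.get("predicted_per_second"),
        "tokens_predicted": timings.get("predicted_n"),
    }


# Example against the server launched above:
# resp = post_completion("http://127.0.0.1:18180",
#                        build_completion_payload("Write a concise technical note about ROCmFPX MTP serving."))
# print(resp["content"])
# print(summarize_timings(resp))
```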
Use --parallel 1 for MTP speed testing. Multi-slot serving changes draft-MTP behavior and is not the intended profile for these settings.
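The launch profile enables `--metrics`. In upstream llama-server, that flag exposes a Prometheus text-format `/metrics` endpoint. A small parser makes it easy to record server-side counters alongside each one-slot speed run.

This sketch assumes the fork keeps that endpoint. Upstream metric names use a `llamacpp:` prefix, but check them against the actual output rather than relying on this card. The helper names are hypothetical.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {metric_name: value}.

    Comment lines (# HELP / # TYPE) are skipped. Labels are dropped, so several
    labelled series with the same name would overwrite each other.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[: line.index("{")]
            fields = line[line.rindex("}") + 1 :].split()
        else:
            name, *fields = line.split()
        if fields:
            metrics[name] = float(fields[0])  # ignore an optional trailing timestamp
    return metrics


def fetch_metrics(base_url: str, timeout: float = 10.0) -> dict[str, float]:
    """Fetch and parse {base_url}/metrics."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Example against the server launched above:
# for name, value in sorted(fetch_metrics("http://127.0.0.1:18180").items()):
#     print(name, value)
```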
Benchmark Status
This card intentionally separates recipe evidence from model-specific evaluation:
- The FP6 quality recipe has been validated in the public report against the downloaded Unsloth Q6 baseline.
- This Qwable artifact has been quantized successfully with the same recipe and checksummed locally.
- Fresh Qwable-specific HermesAgent, HumanEval, perplexity (PPL), and served-speed rows still need to be run before making model-specific quality claims beyond the recipe-level findings above.
Credits
- DJLougen: Qwable 5 27B Coder source model and coder-agent training.
- Unsloth and Qwen: Qwen3.6 27B base model path used by the source checkpoint.
- Ciru / Chadrock ROCmFPX: ROCmFP6 Strix Quality recipe, AMD Strix Halo quantization, local benchmark report, and pinned runner setup.
Notes
This is an experimental AMD ROCmFP6/MTP release for local evaluation and runtime experimentation. Speeds are hardware-sensitive and depend on driver version, clocks, prompt shape, KV cache settings, draft-token acceptance, and runtime branch.