CHADROCK3.6 27B Pi Agent ROCmFP4 MTP

CHADROCK3.6 27B Pi Agent ROCmFP4 MTP

CHADROCK3.6 27B Pi Agent is a Chadrock ROCmFP4/MTP GGUF release of bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, tuned for no-thinking local coding-agent loops and served here in Charlie's AMD-focused ROCmFP4 Strix Lean runtime format.

This release is meant for Pi-style terminal agents, repository edits, shell-verifier loops, and direct no-thinking coding workflows on AMD Ryzen AI Max+ 395 / Strix Halo systems. It keeps the upstream Pi tune's non-thinking sampling profile and MTP speculative decoding posture, then converts the model into a compact 14 GB Chadrock ROCmFP4 GGUF for the local Strix runtime.

This GGUF will not run correctly with stock llama.cpp. It needs the pinned ciru-ai/ROCmFPX runner because the file uses ROCmFP4 tensor types that upstream llama.cpp does not currently understand.

Why This Build Exists

The upstream Pi tune is built around the fast no-thinking path: act directly, emit tool-shaped work, and avoid spending the agent loop's wall time on hidden scratchpad tokens. Chadrock adds the AMD runtime piece:

  • ROCmFP4 Strix Lean tensor recipe
  • native draft-MTP serving
  • AMD ROCm/HIP and Vulkan-oriented local runtime path
  • q8 target and draft KV profile
  • one-slot agent serving
  • 128K context profile for local coding sessions

Treat this as a model/runtime pairing for Strix Halo rather than a generic GGUF quant.

Model Lineage

Qwen/Qwen3.6-27B
  -> bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF
       adds:
         - 4-bit QLoRA SFT Pi-style agent trajectories
         - no-thinking coding-agent behavior
         - MTP speculative decoding heads
  -> jcbtc/chadrock3.6-27b-pi-agent-rocmfp4-mtp
       adds:
         - Chadrock ROCmFP4 Strix Lean conversion
         - local AMD Strix profile and run settings

The upstream card describes the source as a 4-bit QLoRA SFT Multi-Token Prediction tune of Qwen3.6-27B for no-thinking agentic coding through a Pi-style harness. This release keeps that behavioral target and changes the local runtime format.

Technical Metadata

Field Value
model size 27B dense
architecture qwen35
local runtime format ROCmFP4 Chadrock GGUF
direct upstream/source GGUF bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF
upstream revision used locally 22943ba3319569b743d4cb3832e3aa336785d8b8
local profile chadrock3.6-27b-pi-agent-rocmfp4
context target 131072 tokens
max generation profile 65536 tokens
draft mode draft-mtp, n_max=3, p_split=0.10
device profile ROCm0, split mode none
target KV cache q8_0
draft KV cache q8_0
batch / ubatch 2048 / 512
intended hardware AMD Ryzen AI Max+ 395 / Strix Halo

Local Coding Benchmarks

All numbers below were measured locally on AMD Ryzen AI Max+ 395 / Strix Halo. The Pi Agent rows use the local profile chadrock3.6-27b-pi-agent-rocmfp4.

Official Scored Rows

Benchmark Run Result
EvalPlus HumanEval base 20260617T164929Z-quick-coding-test-small 157/164 = 95.73%
EvalPlus HumanEval+ 20260617T164929Z-quick-coding-test-small 151/164 = 92.07%
BigCodeBench Hard Instruct 20260618T031413Z-full-coding-benchmark-large 40/148 = 27.03% pass@1

For BigCodeBench, the official model score is the original 148-task row. A later 146-task adjusted subset exists in the local lab for debugging ground-truth issues, but it is not the score used here.

The stored Chadrock 27B Coder quality rows are from a different agent/model lineage, so they are not used as a quality baseline for this Pi Agent release. Add a quality before/after table only when there is an official stored EvalPlus/BigCodeBench run for the matching upstream/source Pi Agent under the same benchmark protocol.

Speed Before / After And Current Profile

The speed comparison below is against the direct upstream/source Pi Agent GGUF, bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, not a different Chadrock Coder model. These are llama-server API rows from June 17, 2026 on the same Strix Halo host. The current best Chadrock serving profile was selected in a follow-up ROCm tuning pass on June 18, 2026.

The current measured local winner for this Pi Agent release is:

ROCm0, split none, q8/q8 target and draft KV, batch 2048, ubatch 512,
draft-mtp n-max 3, p-min 0, p-split 0.10, one slot, reasoning off

Against that upstream/source Pi Agent, the Chadrock ROCmFP4 build improved the published long-prompt row by 1.30x decode speed while also lowering TTFP:

Workload Upstream Pi Agent Q4_K_M Chadrock Pi Agent ROCmFP4 Chadrock delta
short prompt, card/default KV, 512 generated tokens 28.20 tok/s, 1111 ms TTFP 27.03 tok/s, 874 ms TTFP 0.96x decode, 21% lower TTFP
short prompt, f16 KV, 512 generated tokens 25.40 tok/s, 922 ms TTFP 26.69 tok/s, 854 ms TTFP 1.05x decode, 7% lower TTFP
short prompt, q8 KV + MTP n=4, 512 generated tokens 24.78 tok/s, 945 ms TTFP 26.11 tok/s, 851 ms TTFP 1.05x decode, 10% lower TTFP
long prompt, card/default KV, 128 generated tokens 18.87 tok/s, 53.72 s TTFP 24.52 tok/s, 46.11 s TTFP 1.30x decode, 14% lower TTFP

After that upstream/source comparison, the current ROCm tuning pass selected the best Chadrock serving profile. On the same 512-token smoke prompt, moving the Chadrock profile from the older Vulkan device path to ROCm improved generation speed and latency:

Device path Decode TTFP Prompt throughput Draft accepted
Vulkan0 26.94 tok/s 900 ms 162.62 tok/s 332/536
ROCm0 27.70 tok/s 672 ms 218.11 tok/s 339/514

That ROCm tuning step is a 1.03x decode-speed gain over the Chadrock Vulkan launch path, about 25% lower TTFP, and about 34% higher prompt throughput on the smoke run. It is a local serving-setting improvement on top of the Chadrock-vs-upstream gains above.

For a stricter apples-to-apples high-context check, the same promoted ROCm profile was rerun against the upstream/source Pi Agent on the same harder ~39K-token prompt with two uncached 512-token generations, same ROCm0 backend, same q8/q8 target and draft KV, same b2048/u512, and same MTP n=3 settings:

Model and setting Mean decode Mean TTFP Mean prompt throughput Draft accepted
upstream/source Pi Agent Q4_K_M, ROCm0 q8/q8 b2048 n3 16.32 tok/s 179.04 s 218.05 tok/s 711/923
Chadrock Pi Agent ROCmFP4, promoted ROCm profile 20.60 tok/s 164.13 s 237.87 tok/s 688/1000

On that matched high-context run, Chadrock is 1.26x faster on decode, has about 8% lower TTFP, and has about 9% higher prompt throughput than the upstream/source Pi Agent.

The same high-context gate also tested harder ROCm variants against the promoted Chadrock profile:

Setting Mean decode Mean TTFP Mean prompt throughput Draft accepted
promoted ROCm profile 20.60 tok/s 164.13 s 237.87 tok/s 688/1000
HSA_ENABLE_SDMA=0 20.54 tok/s 166.14 s 234.99 tok/s 688/1000
FATTN_V_NTHREADS=4 build 20.58 tok/s 165.84 s 235.40 tok/s 688/1000
FATTN_KQ_NTHREADS=2 build 20.48 tok/s 164.93 s 236.72 tok/s 688/1000

Those harder ROCm variants did not beat the promoted profile, so they are not used in the recommended command below.

During the official BigCodeBench quality run, the Chadrock server metrics recorded:

Metric Value
peak prompt throughput while active 254.069 tok/s
peak decode throughput while active 34.8247 tok/s
last active prompt throughput 220.612 tok/s
last active decode throughput 30.3895 tok/s

These are local benchmark-server measurements, not universal llama.cpp claims. Throughput depends on driver version, clocks, prompt shape, KV cache settings, and MTP acceptance.

Best Settings / Advanced Setup

For the pinned runner build, copy-paste build commands, request-level speculative controls, and the 35B/27B reproduction notes, use the advanced Ciru setup page:

https://llm.ciru.ai/chadrock-rocmfpx/

The current pinned runner build is:

ciru-ai/ROCmFPX commit: 7aa484a2f0a504dc612a3d74a068024f3e6d6353
historical score tag: chadrock-rocmfp4-mtp-scores-20260621

For this Pi Agent release, keep the current local winner:

backend: ROCm0 target + ROCm0 draft
context: 131072
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: q8_0 / q8_0
MTP: draft-mtp, n_max=3, n_min=0, p_min=0.0, p_split=0.10
serving: one slot, metrics on, no-thinking mode
sampler: temperature=0.7, top_p=0.8, top_k=20, presence_penalty=1.5
reasoning: off, reasoning_format=none, reasoning_budget=0

This is the profile selected by the June 18 ROCm tuning pass. It is the best fit for Pi-style repository agents because it keeps TTFP low, uses q8 KV for the target and draft paths, and preserves the no-thinking behavior target.

Run With llama-server

Build Charlie's custom llama.cpp once, download the GGUF, and run:

HSA_OVERRIDE_GFX_VERSION=11.5.1 \
GGML_HIP_ENABLE_UNIFIED_MEMORY=1 \
/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
  -m CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf \
  --alias chadrock3.6-27b-pi-agent-rocmfp4 \
  --host 127.0.0.1 \
  --port 8080 \
  --jinja \
  -c 131072 \
  -ngl 999 \
  -fa on \
  -dev ROCm0 \
  -sm none \
  -b 2048 \
  -ub 512 \
  -t 16 \
  -tb 32 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --ctx-checkpoints 0 \

  --checkpoint-every-n-tokens -1 \

  --spec-type draft-mtp \
  --spec-draft-device ROCm0 \
  --spec-draft-ngl all \
  --spec-draft-type-k q8_0 \
  --spec-draft-type-v q8_0 \
  --spec-draft-n-max 3 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --spec-draft-p-split 0.10 \
  --parallel 1 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0 \
  --presence-penalty 1.5 \
  --seed 123 \
  --metrics

Use --parallel 1 for this MTP profile. Multi-slot serving changes draft-MTP behavior and is not the intended configuration.

Pi Agent Profile

The local Pi-facing model ID is:

chadrock3.6-27b-pi-agent-rocmfp4

The profile is configured as a no-thinking agentic coding model:

REASONING_MODE=off
REASONING_FORMAT=none
REASONING_BUDGET=0
TEMPERATURE=0.7
TOP_P=0.8
TOP_K=20
PRESENCE_PENALTY=1.5

Point a Pi provider at the OpenAI-compatible llama-server endpoint and use the alias above as the model ID.

Build The Required llama.cpp

git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout 7aa484a2f0a504dc612a3d74a068024f3e6d6353
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh llama-server llama-bench

The server binary will be here:

build-strix-rocmfp4/bin/llama-server

Files

File Size SHA256
CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf 14 GB 15f98039eb7da09f743ae6b2f1545ef7059a89af84813d9d510eb8d10e699f68

Credits

  • Qwen: Qwen/Qwen3.6-27B base model family.
  • bytkim: Qwen3.6-27B-MTP-pi-tune-GGUF, the direct upstream Pi-tuned GGUF source.
  • charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork, Strix Halo build path, and AMD-focused MTP runtime work.

Notes

  • This is a text-generation release. The direct upstream Pi tune can be paired with compatible Qwen3.6 vision sidecars, but this Chadrock profile is configured text-only.
  • The benchmark table intentionally uses the official 148-task BigCodeBench row and does not promote the adjusted local subset.
  • The recommended ROCm settings above are the current local winner; harder ROCm compile/runtime variants were tested and rejected because they did not beat the promoted high-context profile.
  • The Pi Agent tuning goal is agent-loop behavior in no-thinking mode, not maximum offline benchmark score.
Downloads last month
677
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jcbtc/chadrock3.6-27b-pi-agent-rocmfp4-mtp

Base model

Qwen/Qwen3.6-27B
Quantized
(1)
this model