CHADROCK3.6 27B Pi Agent ROCmFP4 MTP

CHADROCK3.6 27B Pi Agent is a Chadrock ROCmFP4/MTP GGUF release of bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, tuned for no-thinking local coding-agent loops and served here in Charlie's AMD-focused ROCmFP4 Strix Lean runtime format.

This release is meant for Pi-style terminal agents, repository edits, shell-verifier loops, and direct no-thinking coding workflows on AMD Ryzen AI Max+ 395 / Strix Halo systems. It keeps the upstream Pi tune's non-thinking sampling profile and MTP speculative decoding posture, then converts the model into a compact 14 GB Chadrock ROCmFP4 GGUF for the local Strix runtime.

This GGUF will not run correctly with stock llama.cpp. It needs the pinned ciru-ai/ROCmFPX runner because the file uses ROCmFP4 tensor types that upstream llama.cpp does not currently understand.

Why This Build Exists

The upstream Pi tune is built around the fast no-thinking path: act directly, emit tool-shaped work, and avoid spending the agent loop's wall time on hidden scratchpad tokens. Chadrock adds the AMD runtime piece:

ROCmFP4 Strix Lean tensor recipe
native draft-MTP serving
AMD ROCm/HIP and Vulkan-oriented local runtime path
q8 target and draft KV profile
one-slot agent serving
128K context profile for local coding sessions

Treat this as a model/runtime pairing for Strix Halo rather than a generic GGUF quant.

Model Lineage

Qwen/Qwen3.6-27B
  -> bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF
       adds:
         - 4-bit QLoRA SFT Pi-style agent trajectories
         - no-thinking coding-agent behavior
         - MTP speculative decoding heads
  -> jcbtc/chadrock3.6-27b-pi-agent-rocmfp4-mtp
       adds:
         - Chadrock ROCmFP4 Strix Lean conversion
         - local AMD Strix profile and run settings

The upstream card describes the source as a 4-bit QLoRA SFT Multi-Token Prediction tune of Qwen3.6-27B for no-thinking agentic coding through a Pi-style harness. This release keeps that behavioral target and changes the local runtime format.

Technical Metadata

Field	Value
model size	`27B` dense
architecture	`qwen35`
local runtime format	ROCmFP4 Chadrock GGUF
direct upstream/source GGUF	`bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF`
upstream revision used locally	`22943ba3319569b743d4cb3832e3aa336785d8b8`
local profile	`chadrock3.6-27b-pi-agent-rocmfp4`
context target	`131072` tokens
max generation profile	`65536` tokens
draft mode	`draft-mtp`, `n_max=3`, `p_split=0.10`
device profile	`ROCm0`, split mode `none`
target KV cache	`q8_0`
draft KV cache	`q8_0`
batch / ubatch	`2048 / 512`
intended hardware	AMD Ryzen AI Max+ 395 / Strix Halo

Local Coding Benchmarks

All numbers below were measured locally on AMD Ryzen AI Max+ 395 / Strix Halo. The Pi Agent rows use the local profile chadrock3.6-27b-pi-agent-rocmfp4.

Official Scored Rows

Benchmark	Run	Result
EvalPlus HumanEval base	`20260617T164929Z-quick-coding-test-small`	`157/164 = 95.73%`
EvalPlus HumanEval+	`20260617T164929Z-quick-coding-test-small`	`151/164 = 92.07%`
BigCodeBench Hard Instruct	`20260618T031413Z-full-coding-benchmark-large`	`40/148 = 27.03% pass@1`

For BigCodeBench, the official model score is the original 148-task row. A later 146-task adjusted subset exists in the local lab for debugging ground-truth issues, but it is not the score used here.
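
The percentages in the scored rows can be re-derived from the raw pass counts. The following is a minimal sketch; the `pass_rate` helper is an illustrative name and not part of EvalPlus or BigCodeBench:

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to two decimals."""
    return round(100.0 * passed / total, 2)


# (passed, total) pairs copied from the scored rows above.
rows = {
    "EvalPlus HumanEval base": (157, 164),
    "EvalPlus HumanEval+": (151, 164),
    "BigCodeBench Hard Instruct": (40, 148),
}

for name, (passed, total) in rows.items():
    print(f"{name}: {passed}/{total} = {pass_rate(passed, total):.2f}%")
```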

The stored Chadrock 27B Coder quality rows come from a different agent/model lineage, so they are not used as a quality baseline for this Pi Agent release. A quality before/after table will be added only once there is an official stored EvalPlus/BigCodeBench run for the matching upstream/source Pi Agent under the same benchmark protocol.

Speed Before / After And Current Profile

The speed comparison below is against the direct upstream/source Pi Agent GGUF, bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, not a different Chadrock Coder model. These are llama-server API rows from June 17, 2026 on the same Strix Halo host. The current best Chadrock serving profile was selected in a follow-up ROCm tuning pass on June 18, 2026.

The current measured local winner for this Pi Agent release is:

ROCm0, split none, q8/q8 target and draft KV, batch 2048, ubatch 512,
draft-mtp n-max 3, p-min 0, p-split 0.10, one slot, reasoning off

Against that upstream/source Pi Agent, the Chadrock ROCmFP4 build decodes 1.30x faster on the long-prompt row and lowers TTFP on every row. The one regression is decode speed on the short prompt with card/default KV, at 0.96x:

Workload	Upstream Pi Agent `Q4_K_M`	Chadrock Pi Agent ROCmFP4	Chadrock delta
short prompt, card/default KV, 512 generated tokens	`28.20 tok/s`, `1111 ms` TTFP	`27.03 tok/s`, `874 ms` TTFP	`0.96x` decode, `21%` lower TTFP
short prompt, `f16` KV, 512 generated tokens	`25.40 tok/s`, `922 ms` TTFP	`26.69 tok/s`, `854 ms` TTFP	`1.05x` decode, `7%` lower TTFP
short prompt, `q8` KV + MTP `n=4`, 512 generated tokens	`24.78 tok/s`, `945 ms` TTFP	`26.11 tok/s`, `851 ms` TTFP	`1.05x` decode, `10%` lower TTFP
long prompt, card/default KV, 128 generated tokens	`18.87 tok/s`, `53.72 s` TTFP	`24.52 tok/s`, `46.11 s` TTFP	`1.30x` decode, `14%` lower TTFP
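
The "Chadrock delta" column is plain arithmetic on the two measurement columns. The sketch below recomputes it from the table values; the helper names are illustrative and not taken from any benchmark tooling:

```python
def decode_speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new decode throughput to baseline (above 1.0 means faster)."""
    return new_tps / baseline_tps


def ttfp_reduction_pct(baseline: float, new: float) -> float:
    """Percentage by which TTFP dropped relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


# (workload, upstream tok/s, Chadrock tok/s, upstream TTFP, Chadrock TTFP)
# TTFP is in ms for the short rows and in s for the long row; the
# reduction is a ratio, so the unit only has to match within a row.
rows = [
    ("short, card/default KV", 28.20, 27.03, 1111.0, 874.0),
    ("short, f16 KV", 25.40, 26.69, 922.0, 854.0),
    ("short, q8 KV + MTP n=4", 24.78, 26.11, 945.0, 851.0),
    ("long, card/default KV", 18.87, 24.52, 53.72, 46.11),
]

for name, up_tps, ch_tps, up_ttfp, ch_ttfp in rows:
    speedup = decode_speedup(up_tps, ch_tps)
    reduction = ttfp_reduction_pct(up_ttfp, ch_ttfp)
    print(f"{name}: {speedup:.2f}x decode, {reduction:.0f}% lower TTFP")
```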

After that upstream/source comparison, the current ROCm tuning pass selected the best Chadrock serving profile. On the same 512-token smoke prompt, moving the Chadrock profile from the older Vulkan device path to ROCm improved generation speed and latency:

Device path	Decode	TTFP	Prompt throughput	Draft accepted
`Vulkan0`	`26.94 tok/s`	`900 ms`	`162.62 tok/s`	`332/536`
`ROCm0`	`27.70 tok/s`	`672 ms`	`218.11 tok/s`	`339/514`

That ROCm tuning step is a 1.03x decode-speed gain over the Chadrock Vulkan launch path, about 25% lower TTFP, and about 34% higher prompt throughput on the smoke run. It is a local serving-setting improvement on top of the Chadrock-vs-upstream gains above.

For a stricter apples-to-apples high-context check, the same promoted ROCm profile was rerun against the upstream/source Pi Agent on the same harder ~39K-token prompt with two uncached 512-token generations, same ROCm0 backend, same q8/q8 target and draft KV, same b2048/u512, and same MTP n=3 settings:

Model and setting	Mean decode	Mean TTFP	Mean prompt throughput	Draft accepted
upstream/source Pi Agent `Q4_K_M`, ROCm0 q8/q8 b2048 n3	`16.32 tok/s`	`179.04 s`	`218.05 tok/s`	`711/923`
Chadrock Pi Agent ROCmFP4, promoted ROCm profile	`20.60 tok/s`	`164.13 s`	`237.87 tok/s`	`688/1000`

On that matched high-context run, Chadrock is 1.26x faster on decode, has about 8% lower TTFP, and has about 9% higher prompt throughput than the upstream/source Pi Agent.
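
The "Draft accepted" cells are easier to compare as rates. The sketch below assumes each cell is accepted draft tokens over proposed draft tokens, which the card does not state explicitly; `acceptance_rate` is an illustrative helper:

```python
def acceptance_rate(cell: str) -> float:
    """Convert an 'accepted/drafted' table cell into a fraction."""
    accepted, drafted = (int(part) for part in cell.split("/"))
    return accepted / drafted


# Cells copied from the smoke-run and high-context tables above.
cells = {
    "Vulkan0 smoke run": "332/536",
    "ROCm0 smoke run": "339/514",
    "upstream Q4_K_M, ~39K prompt": "711/923",
    "Chadrock ROCmFP4, ~39K prompt": "688/1000",
}

for label, cell in cells.items():
    print(f"{label}: {acceptance_rate(cell):.1%}")
```

Under that assumption, the upstream row accepts a larger share of drafts on the ~39K prompt yet decodes more slowly, so acceptance alone does not explain the decode difference between the two builds.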

The same high-context gate also tested alternative ROCm runtime and build variants against the promoted Chadrock profile:

Setting	Mean decode	Mean TTFP	Mean prompt throughput	Draft accepted
promoted ROCm profile	`20.60 tok/s`	`164.13 s`	`237.87 tok/s`	`688/1000`
`HSA_ENABLE_SDMA=0`	`20.54 tok/s`	`166.14 s`	`234.99 tok/s`	`688/1000`
`FATTN_V_NTHREADS=4` build	`20.58 tok/s`	`165.84 s`	`235.40 tok/s`	`688/1000`
`FATTN_KQ_NTHREADS=2` build	`20.48 tok/s`	`164.93 s`	`236.72 tok/s`	`688/1000`

None of those variants beat the promoted profile, so they are not used in the recommended command below.
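
The selection can be checked mechanically by ranking the four rows on each metric. This is only a sketch over the table values, not the lab's actual gating script:

```python
# setting -> (mean decode tok/s, mean TTFP s, mean prompt tok/s)
variants = {
    "promoted ROCm profile": (20.60, 164.13, 237.87),
    "HSA_ENABLE_SDMA=0": (20.54, 166.14, 234.99),
    "FATTN_V_NTHREADS=4 build": (20.58, 165.84, 235.40),
    "FATTN_KQ_NTHREADS=2 build": (20.48, 164.93, 236.72),
}

# Higher is better for throughput, lower is better for TTFP.
best_decode = max(variants, key=lambda name: variants[name][0])
best_ttfp = min(variants, key=lambda name: variants[name][1])
best_prompt = max(variants, key=lambda name: variants[name][2])

print(best_decode, best_ttfp, best_prompt, sep="\n")
```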

During the official BigCodeBench quality run, the Chadrock server metrics recorded:

Metric	Value
peak prompt throughput while active	`254.069 tok/s`
peak decode throughput while active	`34.8247 tok/s`
last active prompt throughput	`220.612 tok/s`
last active decode throughput	`30.3895 tok/s`

These are local benchmark-server measurements, not universal llama.cpp claims. Throughput depends on driver version, clocks, prompt shape, KV cache settings, and MTP acceptance.
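
Throughput figures like these can be read from the server's metrics endpoint when it is started with `--metrics`. The sketch below parses the Prometheus text format. The metric names are the ones upstream llama.cpp exposes at `/metrics`, and it is an assumption that the ROCmFPX fork keeps them unchanged. The sample text is illustrative, with values borrowed from the table above:

```python
def parse_prometheus(text: str) -> dict[str, float]:
    """Parse 'name value' sample lines, skipping comments and blank lines."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Illustrative payload; a live one would come from
# urllib.request.urlopen("http://127.0.0.1:8080/metrics").read().decode()
SAMPLE = """\
# TYPE llamacpp:prompt_tokens_seconds gauge
llamacpp:prompt_tokens_seconds 220.612
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 30.3895
"""

metrics = parse_prometheus(SAMPLE)
print(metrics["llamacpp:prompt_tokens_seconds"])
print(metrics["llamacpp:predicted_tokens_seconds"])
```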

Best Settings / Advanced Setup

For the pinned runner build, copy-paste build commands, request-level speculative controls, and the 35B/27B reproduction notes, use the advanced Ciru setup page:

https://llm.ciru.ai/chadrock-rocmfpx/

The current pinned runner build is:

ciru-ai/ROCmFPX commit: 7aa484a2f0a504dc612a3d74a068024f3e6d6353
historical score tag: chadrock-rocmfp4-mtp-scores-20260621

For this Pi Agent release, keep the current local winner:

backend: ROCm0 target + ROCm0 draft
context: 131072
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: q8_0 / q8_0
MTP: draft-mtp, n_max=3, n_min=0, p_min=0.0, p_split=0.10
serving: one slot, metrics on, no-thinking mode
sampler: temperature=0.7, top_p=0.8, top_k=20, presence_penalty=1.5
reasoning: off, reasoning_format=none, reasoning_budget=0

This is the profile selected by the June 18 ROCm tuning pass. It is the best fit for Pi-style repository agents because it keeps TTFP low, uses q8 KV for the target and draft paths, and preserves the no-thinking behavior target.

Run With llama-server

Build the pinned ciru-ai/ROCmFPX llama.cpp fork once (see "Build The Required llama.cpp"), download the GGUF, and run:

HSA_OVERRIDE_GFX_VERSION=11.5.1 \
GGML_HIP_ENABLE_UNIFIED_MEMORY=1 \
/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
  -m CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf \
  --alias chadrock3.6-27b-pi-agent-rocmfp4 \
  --host 127.0.0.1 \
  --port 8080 \
  --jinja \
  -c 131072 \
  -ngl 999 \
  -fa on \
  -dev ROCm0 \
  -sm none \
  -b 2048 \
  -ub 512 \
  -t 16 \
  -tb 32 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --ctx-checkpoints 0 \
  --checkpoint-every-n-tokens -1 \
  --spec-type draft-mtp \
  --spec-draft-device ROCm0 \
  --spec-draft-ngl all \
  --spec-draft-type-k q8_0 \
  --spec-draft-type-v q8_0 \
  --spec-draft-n-max 3 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --spec-draft-p-split 0.10 \
  --parallel 1 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0 \
  --presence-penalty 1.5 \
  --seed 123 \
  --metrics

Use --parallel 1 for this MTP profile. Multi-slot serving changes draft-MTP behavior and is not the intended configuration.
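
Once the server is up, a client can reach it through the OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library and assumes the host, port, and alias from the command above. Passing `top_k` in the request body relies on llama-server accepting its own sampling fields alongside the OpenAI ones, which should be verified against the pinned fork:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # --host / --port from the command above
MODEL_ALIAS = "chadrock3.6-27b-pi-agent-rocmfp4"  # --alias from the command above


def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build a chat-completions body that mirrors the card's sampler profile."""
    return {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,  # llama-server extension, not a standard OpenAI field
        "presence_penalty": 1.5,
    }


def chat(prompt: str, timeout: float = 600.0) -> str:
    """POST the request and return the assistant message text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]


# Example call, which needs the server to be running:
# print(chat("Write a Python function that reverses a string."))
```

The generous default timeout reflects the long-prompt TTFP figures reported above, where the first token can take minutes at high context.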

Pi Agent Profile

The local Pi-facing model ID is:

chadrock3.6-27b-pi-agent-rocmfp4

The profile is configured as a no-thinking agentic coding model:

REASONING_MODE=off
REASONING_FORMAT=none
REASONING_BUDGET=0
TEMPERATURE=0.7
TOP_P=0.8
TOP_K=20
PRESENCE_PENALTY=1.5

Point a Pi provider at the OpenAI-compatible llama-server endpoint and use the alias above as the model ID.
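
A provider entry for Pi could be generated as sketched below. The key names (`providers`, `baseUrl`, `api`, `apiKey`, `models`) and the `~/.pi/agent/models.json` location are assumptions about Pi's configuration schema and should be checked against Pi's own documentation; only the endpoint and the alias come from this card:

```python
import json


def pi_provider_config(base_url: str, model_id: str) -> dict:
    """Build a Pi provider block pointing at a local llama-server."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",  # the local server does not check the key
                "models": [{"id": model_id}],
            }
        }
    }


config = pi_provider_config(
    "http://127.0.0.1:8080/v1",
    "chadrock3.6-27b-pi-agent-rocmfp4",
)

# Print the JSON so it can be reviewed before saving it to
# ~/.pi/agent/models.json (assumed location).
print(json.dumps(config, indent=2))
```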

Build The Required llama.cpp

git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout 7aa484a2f0a504dc612a3d74a068024f3e6d6353
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh llama-server llama-bench

The server binary will be here:

build-strix-rocmfp4/bin/llama-server

Files

File	Size	SHA256
`CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf`	`14 GB`	`15f98039eb7da09f743ae6b2f1545ef7059a89af84813d9d510eb8d10e699f68`
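
Because the file is about 14 GB, it is worth checking the download against the published digest before loading it. A minimal sketch that hashes the file in chunks so it never has to fit in memory; the local path is whatever location the GGUF was saved to:

```python
import hashlib
from pathlib import Path

# Digest copied from the Files table above.
EXPECTED_SHA256 = "15f98039eb7da09f743ae6b2f1545ef7059a89af84813d9d510eb8d10e699f68"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Example call, assuming the GGUF is in the current directory:
# print(verify(Path("CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf")))
```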

Credits

Qwen: Qwen/Qwen3.6-27B base model family.
bytkim: Qwen3.6-27B-MTP-pi-tune-GGUF, the direct upstream Pi-tuned GGUF source.
charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork, Strix Halo build path, and AMD-focused MTP runtime work.

Notes

This is a text-generation release. The direct upstream Pi tune can be paired with compatible Qwen3.6 vision sidecars, but this Chadrock profile is configured text-only.
The benchmark table intentionally uses the official 148-task BigCodeBench row and does not promote the adjusted local subset.
The recommended ROCm settings above are the current local winner; the other ROCm compile/runtime variants tested did not beat the promoted high-context profile and were rejected.
The Pi Agent tuning goal is agent-loop behavior in no-thinking mode, not maximum offline benchmark score.
