CHADROCK3.6 27B Pi Agent ROCmFP4 MTP
CHADROCK3.6 27B Pi Agent is a Chadrock ROCmFP4/MTP GGUF release of bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, tuned for no-thinking local coding-agent loops and served here in Charlie's AMD-focused ROCmFP4 Strix Lean runtime format.
This release is meant for Pi-style terminal agents, repository edits, shell-verifier loops, and direct no-thinking coding workflows on AMD Ryzen AI Max+ 395 / Strix Halo systems. It keeps the upstream Pi tune's non-thinking sampling profile and MTP speculative decoding setup, then converts the model into a compact 14 GB Chadrock ROCmFP4 GGUF for the local Strix runtime.
This GGUF will not run correctly with stock llama.cpp. It needs the pinned ciru-ai/ROCmFPX runner because the file uses ROCmFP4 tensor types that upstream llama.cpp does not currently understand.
Why This Build Exists
The upstream Pi tune is built around the fast no-thinking path: act directly, emit tool-shaped work, and avoid spending the agent loop's wall time on hidden scratchpad tokens. Chadrock adds the AMD runtime piece:
- ROCmFP4 Strix Lean tensor recipe
- native draft-MTP serving
- AMD ROCm/HIP and Vulkan-oriented local runtime path
- q8 target and draft KV profile
- one-slot agent serving
- 128K context profile for local coding sessions
Treat this as a model/runtime pairing for Strix Halo rather than a generic GGUF quant.
Model Lineage
Qwen/Qwen3.6-27B
  -> bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF
     adds:
       - 4-bit QLoRA SFT Pi-style agent trajectories
       - no-thinking coding-agent behavior
       - MTP speculative decoding heads
  -> jcbtc/chadrock3.6-27b-pi-agent-rocmfp4-mtp
     adds:
       - Chadrock ROCmFP4 Strix Lean conversion
       - local AMD Strix profile and run settings
The upstream card describes the source as a 4-bit QLoRA SFT Multi-Token Prediction tune of Qwen3.6-27B for no-thinking agentic coding through a Pi-style harness. This release keeps that behavioral target and changes the local runtime format.
Technical Metadata
| Field | Value |
|---|---|
| model size | 27B dense |
| architecture | qwen35 |
| local runtime format | ROCmFP4 Chadrock GGUF |
| direct upstream/source GGUF | bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF |
| upstream revision used locally | 22943ba3319569b743d4cb3832e3aa336785d8b8 |
| local profile | chadrock3.6-27b-pi-agent-rocmfp4 |
| context target | 131072 tokens |
| max generation profile | 65536 tokens |
| draft mode | draft-mtp, n_max=3, p_split=0.10 |
| device profile | ROCm0, split mode none |
| target KV cache | q8_0 |
| draft KV cache | q8_0 |
| batch / ubatch | 2048 / 512 |
| intended hardware | AMD Ryzen AI Max+ 395 / Strix Halo |
Local Coding Benchmarks
All numbers below were measured locally on AMD Ryzen AI Max+ 395 / Strix Halo. The Pi Agent rows use the local profile chadrock3.6-27b-pi-agent-rocmfp4.
Official Scored Rows
| Benchmark | Run | Result |
|---|---|---|
| EvalPlus HumanEval base | 20260617T164929Z-quick-coding-test-small | 157/164 = 95.73% |
| EvalPlus HumanEval+ | 20260617T164929Z-quick-coding-test-small | 151/164 = 92.07% |
| BigCodeBench Hard Instruct | 20260618T031413Z-full-coding-benchmark-large | 40/148 = 27.03% pass@1 |
For BigCodeBench, the official model score is the original 148-task row. A later 146-task adjusted subset exists in the local lab for debugging ground-truth issues, but it is not the score used here.
The stored Chadrock 27B Coder quality rows are from a different agent/model lineage, so they are not used as a quality baseline for this Pi Agent release. Add a quality before/after table only when there is an official stored EvalPlus/BigCodeBench run for the matching upstream/source Pi Agent under the same benchmark protocol.
Speed Before / After And Current Profile
The speed comparison below is against the direct upstream/source Pi Agent GGUF,
bytkim/Qwen3.6-27B-MTP-pi-tune-GGUF, not a different Chadrock Coder model.
These are llama-server API rows from June 17, 2026 on the same Strix Halo
host. The current best Chadrock serving profile was selected in a follow-up
ROCm tuning pass on June 18, 2026.
The current measured local winner for this Pi Agent release is:
ROCm0, split none, q8/q8 target and draft KV, batch 2048, ubatch 512,
draft-mtp n-max 3, p-min 0, p-split 0.10, one slot, reasoning off
Against that upstream/source Pi Agent, the Chadrock ROCmFP4 build improved the
published long-prompt row by 1.30x decode speed while also lowering TTFP:
| Workload | Upstream Pi Agent Q4_K_M | Chadrock Pi Agent ROCmFP4 | Chadrock delta |
|---|---|---|---|
| short prompt, card/default KV, 512 generated tokens | 28.20 tok/s, 1111 ms TTFP | 27.03 tok/s, 874 ms TTFP | 0.96x decode, 21% lower TTFP |
| short prompt, f16 KV, 512 generated tokens | 25.40 tok/s, 922 ms TTFP | 26.69 tok/s, 854 ms TTFP | 1.05x decode, 7% lower TTFP |
| short prompt, q8 KV + MTP n=4, 512 generated tokens | 24.78 tok/s, 945 ms TTFP | 26.11 tok/s, 851 ms TTFP | 1.05x decode, 10% lower TTFP |
| long prompt, card/default KV, 128 generated tokens | 18.87 tok/s, 53.72 s TTFP | 24.52 tok/s, 46.11 s TTFP | 1.30x decode, 14% lower TTFP |
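The "Chadrock delta" column can be reproduced from the two measurement columns. The sketch below is illustrative only: the `Row` class and the helper names are hypothetical and not part of any benchmark harness. It assumes the decode delta is the ratio of decode speeds and the TTFP delta is the relative reduction against the upstream value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One benchmark measurement: decode speed in tok/s and TTFP in a consistent unit."""
    decode_tok_s: float
    ttfp: float


def decode_speedup(baseline: Row, candidate: Row) -> float:
    """Ratio of candidate decode speed to baseline decode speed."""
    return candidate.decode_tok_s / baseline.decode_tok_s


def ttfp_reduction_pct(baseline: Row, candidate: Row) -> float:
    """Percentage by which the candidate's TTFP is lower than the baseline's."""
    return 100.0 * (baseline.ttfp - candidate.ttfp) / baseline.ttfp


# Long-prompt row from the table above (TTFP in seconds).
upstream = Row(decode_tok_s=18.87, ttfp=53.72)
chadrock = Row(decode_tok_s=24.52, ttfp=46.11)

print(f"{decode_speedup(upstream, chadrock):.2f}x decode")
print(f"{ttfp_reduction_pct(upstream, chadrock):.0f}% lower TTFP")
```

Applying the same two functions to the short-prompt rows gives the other entries in the delta column, provided both TTFP values in a row use the same unit.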
After that upstream/source comparison, the current ROCm tuning pass selected the best Chadrock serving profile. On the same 512-token smoke prompt, moving the Chadrock profile from the older Vulkan device path to ROCm improved generation speed and latency:
| Device path | Decode | TTFP | Prompt throughput | Draft accepted |
|---|---|---|---|---|
| Vulkan0 | 26.94 tok/s | 900 ms | 162.62 tok/s | 332/536 |
| ROCm0 | 27.70 tok/s | 672 ms | 218.11 tok/s | 339/514 |
That ROCm tuning step is a 1.03x decode-speed gain over the Chadrock Vulkan
launch path, about 25% lower TTFP, and about 34% higher prompt throughput
on the smoke run. It is a local serving-setting improvement on top of the
Chadrock-vs-upstream gains above.
For a stricter apples-to-apples high-context check, the same promoted ROCm
profile was rerun against the upstream/source Pi Agent on the same harder
~39K-token prompt with two uncached 512-token generations, same ROCm0 backend,
same q8/q8 target and draft KV, same b2048/u512, and same MTP n=3
settings:
| Model and setting | Mean decode | Mean TTFP | Mean prompt throughput | Draft accepted |
|---|---|---|---|---|
| upstream/source Pi Agent Q4_K_M, ROCm0 q8/q8 b2048 n3 | 16.32 tok/s | 179.04 s | 218.05 tok/s | 711/923 |
| Chadrock Pi Agent ROCmFP4, promoted ROCm profile | 20.60 tok/s | 164.13 s | 237.87 tok/s | 688/1000 |
On that matched high-context run, Chadrock is 1.26x faster on decode, has
about 8% lower TTFP, and has about 9% higher prompt throughput than the
upstream/source Pi Agent.
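The "Draft accepted" column is easier to compare as a rate. The following is a minimal sketch that assumes each cell reads as accepted draft tokens over drafted tokens; that reading is an assumption, and `acceptance_rate` is a hypothetical helper, not a runner API.

```python
def acceptance_rate(cell: str) -> float:
    """Parse an 'accepted/drafted' cell such as '688/1000' into a fraction in [0, 1]."""
    accepted, drafted = (int(part) for part in cell.split("/"))
    if drafted <= 0:
        raise ValueError("drafted token count must be positive")
    return accepted / drafted


# Cells copied from the matched high-context table above.
rows = {
    "upstream Q4_K_M": "711/923",
    "Chadrock ROCmFP4": "688/1000",
}
for name, cell in rows.items():
    print(f"{name}: {acceptance_rate(cell):.1%} of drafted tokens accepted")
```

Under that reading, the upstream row accepts a larger share of its drafts than the Chadrock row, which suggests the decode gain on this run is not explained by draft acceptance alone.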
The same high-context gate also tested harder ROCm variants against the promoted Chadrock profile:
| Setting | Mean decode | Mean TTFP | Mean prompt throughput | Draft accepted |
|---|---|---|---|---|
| promoted ROCm profile | 20.60 tok/s | 164.13 s | 237.87 tok/s | 688/1000 |
| HSA_ENABLE_SDMA=0 | 20.54 tok/s | 166.14 s | 234.99 tok/s | 688/1000 |
| FATTN_V_NTHREADS=4 build | 20.58 tok/s | 165.84 s | 235.40 tok/s | 688/1000 |
| FATTN_KQ_NTHREADS=2 build | 20.48 tok/s | 164.93 s | 236.72 tok/s | 688/1000 |
Those harder ROCm variants did not beat the promoted profile, so they are not used in the recommended command below.
During the official BigCodeBench quality run, the Chadrock server metrics recorded:
| Metric | Value |
|---|---|
| peak prompt throughput while active | 254.069 tok/s |
| peak decode throughput while active | 34.8247 tok/s |
| last active prompt throughput | 220.612 tok/s |
| last active decode throughput | 30.3895 tok/s |
These are local benchmark-server measurements, not universal llama.cpp claims. Throughput depends on driver version, clocks, prompt shape, KV cache settings, and MTP acceptance.
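Server-side throughput like this can be sampled while a run is active by reading the Prometheus-style text that llama-server exposes when started with `--metrics`. The sketch below is a hedged illustration: the `/metrics` path and the `llamacpp:` metric names are assumptions carried over from upstream llama.cpp and may differ in the pinned runner, and the sample text reuses the "last active" values from the table for illustration rather than being captured output.

```python
import urllib.request


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Parse unlabeled 'name value' samples from Prometheus text exposition format."""
    samples: dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.partition(" ")
        try:
            samples[name] = float(value.split()[0])
        except (ValueError, IndexError):
            continue  # ignore lines that are not simple 'name value' samples
    return samples


def fetch_metrics(base_url: str = "http://127.0.0.1:8080") -> dict[str, float]:
    """Fetch and parse the metrics endpoint (path assumed to be /metrics)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Illustrative sample only; metric names are assumptions.
SAMPLE = """\
# TYPE llamacpp:prompt_tokens_seconds gauge
llamacpp:prompt_tokens_seconds 220.612
# TYPE llamacpp:predicted_tokens_seconds gauge
llamacpp:predicted_tokens_seconds 30.3895
"""

metrics = parse_prometheus_text(SAMPLE)
print(metrics)
```

Polling `fetch_metrics()` at a fixed interval during a benchmark and keeping the maximum of each gauge would be one way to obtain "peak while active" figures.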
Best Settings / Advanced Setup
For the pinned runner build, copy-paste build commands, request-level speculative controls, and the 35B/27B reproduction notes, use the advanced Ciru setup page:
https://llm.ciru.ai/chadrock-rocmfpx/
The current pinned runner build is:
ciru-ai/ROCmFPX commit: 7aa484a2f0a504dc612a3d74a068024f3e6d6353
historical score tag: chadrock-rocmfp4-mtp-scores-20260621
For this Pi Agent release, keep the current local winner:
backend: ROCm0 target + ROCm0 draft
context: 131072
batch / ubatch: 2048 / 512
target KV: q8_0 / q8_0
draft KV: q8_0 / q8_0
MTP: draft-mtp, n_max=3, n_min=0, p_min=0.0, p_split=0.10
serving: one slot, metrics on, no-thinking mode
sampler: temperature=0.7, top_p=0.8, top_k=20, presence_penalty=1.5
reasoning: off, reasoning_format=none, reasoning_budget=0
This is the profile selected by the June 18 ROCm tuning pass. It is the best fit for Pi-style repository agents because it keeps TTFP low, uses q8 KV for the target and draft paths, and preserves the no-thinking behavior target.
Run With llama-server
Build Charlie's custom llama.cpp once, download the GGUF, and run:
HSA_OVERRIDE_GFX_VERSION=11.5.1 \
GGML_HIP_ENABLE_UNIFIED_MEMORY=1 \
/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
-m CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf \
--alias chadrock3.6-27b-pi-agent-rocmfp4 \
--host 127.0.0.1 \
--port 8080 \
--jinja \
-c 131072 \
-ngl 999 \
-fa on \
-dev ROCm0 \
-sm none \
-b 2048 \
-ub 512 \
-t 16 \
-tb 32 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--ctx-checkpoints 0 \
--checkpoint-every-n-tokens -1 \
--spec-type draft-mtp \
--spec-draft-device ROCm0 \
--spec-draft-ngl all \
--spec-draft-type-k q8_0 \
--spec-draft-type-v q8_0 \
--spec-draft-n-max 3 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.0 \
--spec-draft-p-split 0.10 \
--parallel 1 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0 \
--presence-penalty 1.5 \
--seed 123 \
--metrics
Use --parallel 1 for this MTP profile. Multi-slot serving changes draft-MTP behavior and is not the intended configuration.
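Once the server is up, it can be queried through its OpenAI-compatible chat endpoint. This is a minimal standard-library sketch under stated assumptions: the host, port, and alias are taken from the command above, `build_chat_request` and `chat` are hypothetical helper names, and the response is assumed to follow the usual `choices[0].message.content` shape.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080/v1"  # --host and --port from the command above
MODEL_ALIAS = "chadrock3.6-27b-pi-agent-rocmfp4"  # --alias from the command above


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions with the card's sampler values."""
    payload = {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.8,
        "presence_penalty": 1.5,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout_s: float = 600.0) -> str:
    """Send one user message and return the assistant text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Write a function that reverses a string.")` sends a single request. Because the command above already sets the sampler flags as server defaults, repeating them in the payload is optional; a long timeout is used because long prompts on this profile can take minutes before the first token.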
Pi Agent Profile
The local Pi-facing model ID is:
chadrock3.6-27b-pi-agent-rocmfp4
The profile is configured as a no-thinking agentic coding model:
REASONING_MODE=off
REASONING_FORMAT=none
REASONING_BUDGET=0
TEMPERATURE=0.7
TOP_P=0.8
TOP_K=20
PRESENCE_PENALTY=1.5
Point a Pi provider at the OpenAI-compatible llama-server endpoint and use the alias above as the model ID.
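As a rough illustration of that provider entry, the sketch below prints a JSON document in the layout that Hugging Face's generic Pi instructions use for `~/.pi/agent/models.json`. Treat the key names (`providers`, `baseUrl`, `api`, `apiKey`, `models`) and the provider label `llama-cpp` as assumptions about the installed Pi version; `pi_provider_config` is a hypothetical helper.

```python
import json


def pi_provider_config(base_url: str, model_id: str) -> dict:
    """Return a Pi provider block pointing at an OpenAI-compatible endpoint."""
    return {
        "providers": {
            "llama-cpp": {
                "baseUrl": base_url,
                "api": "openai-completions",
                "apiKey": "none",  # the local server in this card is run without a key
                "models": [{"id": model_id}],
            }
        }
    }


config = pi_provider_config(
    base_url="http://127.0.0.1:8080/v1",
    model_id="chadrock3.6-27b-pi-agent-rocmfp4",
)
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into an existing Pi configuration by hand; the script deliberately does not write to disk so that it cannot overwrite other providers.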
Build The Required llama.cpp
git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout 7aa484a2f0a504dc612a3d74a068024f3e6d6353
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh llama-server llama-bench
The server binary will be here:
build-strix-rocmfp4/bin/llama-server
Files
| File | Size | SHA256 |
|---|---|---|
| CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf | 14 GB | 15f98039eb7da09f743ae6b2f1545ef7059a89af84813d9d510eb8d10e699f68 |
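A downloaded copy can be checked against the listed SHA256 before loading it. The sketch below uses only the Python standard library; `sha256_of_file` and `verify` are hypothetical helper names, and the file is assumed to sit in the current directory under the name shown in the table.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "15f98039eb7da09f743ae6b2f1545ef7059a89af84813d9d510eb8d10e699f68"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a 14 GB GGUF never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify(Path("CHADROCK3.6-27B-Pi-Agent-MTP-ROCmFP4-STRIX_LEAN.gguf"))` returns `True` only when the download is intact.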
Credits
- Qwen: Qwen/Qwen3.6-27B base model family.
- bytkim: Qwen3.6-27B-MTP-pi-tune-GGUF, the direct upstream Pi-tuned GGUF source.
- charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork, Strix Halo build path, and AMD-focused MTP runtime work.
Notes
- This is a text-generation release. The direct upstream Pi tune can be paired with compatible Qwen3.6 vision sidecars, but this Chadrock profile is configured text-only.
- The benchmark table intentionally uses the official 148-task BigCodeBench row and does not promote the adjusted local subset.
- The recommended ROCm settings above are the current local winner; harder ROCm compile/runtime variants were tested and rejected because they did not beat the promoted high-context profile.
- The Pi Agent tuning goal is agent-loop behavior in no-thinking mode, not maximum offline benchmark score.
