New Chadrock v2 llama.cpp and config!
Chadrock-35B Ace Saber ROCmFP4 MTP
Chadrock-35B Ace Saber is a ROCmFP4/MTP GGUF for AMD Ryzen AI Max+ 395 / Strix Halo systems. This release also includes a tested Qwen3.6 vision projector, so the same full Chadrock language GGUF can run image-text prompts when launched with --mmproj.
The model behavior comes from the Ace Saber build by @DJLougen. The current speed numbers use the pinned Chadrock v2 ROCmFPX llama.cpp build from ciru-ai/ROCmFPX, with the request-level MTP controls described below.
This GGUF will not run correctly with stock llama.cpp. Use the pinned Chadrock v2 ROCmFPX runner because this file uses ROCmFP4 tensor types and MTP serving controls that upstream llama.cpp does not currently understand.
The model file is already provided here. You do not need to rebuild or quantize the model.
Why This Mix
Ace Saber gives the model its coding, agentic, and tool-use behavior. Chadrock/ROCmFP4 gives it the speed profile needed to feel good locally on AMD unified-memory hardware.
The goal is not just another Qwen3.6 quant. The goal is:
- Ace Saber behavior from @DJLougen
- Qwen3.6 35B-A3B MoE efficiency
- MTP speculative decoding
- ROCmFP4 tensor-aware quantization
- high-throughput local serving on Ryzen AI Max+ 395 / Strix Halo
Technical Metadata
Hugging Face may round the parsed GGUF parameter count to 36B in its automatic badge. This release belongs to the Qwen3.6 35B-A3B MoE family: about 35B-class total parameters, with roughly 3B parameters active per token.
| Field | Value |
|---|---|
| model size | 35B-A3B MoE |
| total parameters | 35B class |
| active parameters | ~3B class |
| architecture | qwen35moe |
| direct upstream GGUF | GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP |
| base family | Qwen/Qwen3.6-35B-A3B |
| local runtime format | ROCmFP4 Chadrock GGUF plus separate GGUF-format vision projector |
Vision Support
Vision is provided by mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj, a GGUF-format Qwen3VL projector converted from the restored Qwen3.6 visual tower sidecar in the upstream Ace Saber release.
This does not replace the language model and does not disable MTP. The validated command used the full Chadrock ROCmFP4 GGUF with native --spec-type draft-mtp enabled and added the projector with --mmproj.
Local validation used two generated images whose answers were not present in the prompt:
| Image gate | Expected | Result |
|---|---|---|
| gate_a.png | CIRU-742, red square, blue circle | passed |
| gate_b.png | HALO-319, orange triangle, purple star | passed |
The same gate fails without a projector, so this is a real image-read check rather than a metadata-only claim.
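A gate of this kind can be scripted against the server's OpenAI-compatible chat endpoint. The following is a minimal sketch, not the validation harness used for this release. It assumes the runner accepts standard `image_url` content parts carrying a base64 data URI when launched with --mmproj; the helper names `build_image_message` and `gate_passed` are hypothetical.

```python
import base64
from typing import Sequence


def build_image_message(image_bytes: bytes, question: str,
                        mime: str = "image/png") -> dict:
    """Build one user message with a text part and an inline image part.

    Assumption: the server accepts OpenAI-style `image_url` parts whose URL
    is a base64 data URI.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


def gate_passed(answer: str, expected_terms: Sequence[str]) -> bool:
    """Return True only if every expected term appears in the answer."""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in expected_terms)
```

Because the expected strings (for example CIRU-742) exist only in the image, a run without a projector cannot produce them, which is what makes the check meaningful.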
Chadrock v2 Speed
Best current text-only profile on AMD Ryzen AI Max+ 395 / Strix Halo: pinned Chadrock v2 ROCmFPX llama.cpp, Vulkan0 target plus Vulkan0 draft, f16/f16 target and draft KV, one slot, prompt cache disabled, no multimodal projector, deterministic decoding, and request policy speculative.n_max=4, speculative.n_min=0, speculative.p_min=0.25.
| Measurement | Prompt tokens | Generated tokens | Decode tok/s | Prefill tok/s | Total time | Draft accepted |
|---|---|---|---|---|---|---|
| Chadrock v2 best, gen512 | 3,946 | 512 | 143.08 | 1072.34 | 7.26 s | 408 / 408 |
| Chadrock v2 repeat, gen2048 | 3,946 | 2048 | 141.77 | 1064.16 | 18.16 s | 1637 / 1637 |
| Same-run no-draft control, gen512 | 3,946 | 512 | 72.57 | 1064.49 | 10.77 s | 0 / 0 |
| Same-run no-draft control, gen2048 | 3,946 | 2048 | 72.04 | 1067.18 | 32.13 s | 0 / 0 |
Against the same-run no-draft control, the new Chadrock v2 MTP config is 1.97x faster in decode at both gen512 and gen2048: 143.08 vs 72.57 tok/s, and 141.77 vs 72.04 tok/s.
Compared with the older Chadrock v1 card-speed profile (~101.31 tok/s aggregate HumanEval eval speed), the new best served text row is about 1.41x faster (+41.2%). Compared with the older uncached Chadrock TG64 served row (78.28 tok/s server eval), the new gen2048 row is about 1.81x faster (+81.1%). Those older rows are not identical prompts, so treat them as release-to-release runner evidence rather than a strict apples-to-apples benchmark pair.
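The ratios quoted above can be re-derived from the table. The sketch below recomputes them and checks that each reported total time is close to prefill time plus decode time. Treating the total as that simple sum is an assumption on our part; the function names are illustrative.

```python
def speedup(new_tok_s: float, old_tok_s: float) -> float:
    """Ratio of two decode speeds, e.g. MTP run vs. no-draft control."""
    return new_tok_s / old_tok_s


def expected_total_seconds(prompt_tokens: int, prefill_tok_s: float,
                           gen_tokens: int, decode_tok_s: float) -> float:
    """Assumed model: total time = prefill time + decode time."""
    return prompt_tokens / prefill_tok_s + gen_tokens / decode_tok_s


# (prompt tokens, generated tokens, decode tok/s, prefill tok/s, reported total s)
ROWS = {
    "v2 gen512": (3946, 512, 143.08, 1072.34, 7.26),
    "v2 gen2048": (3946, 2048, 141.77, 1064.16, 18.16),
    "no-draft gen512": (3946, 512, 72.57, 1064.49, 10.77),
    "no-draft gen2048": (3946, 2048, 72.04, 1067.18, 32.13),
}

for name, (prompt, gen, decode, prefill, reported) in ROWS.items():
    estimate = expected_total_seconds(prompt, prefill, gen, decode)
    print(f"{name}: estimated {estimate:.2f} s, reported {reported:.2f} s")

print(f"gen512 MTP vs no-draft:  {speedup(143.08, 72.57):.2f}x")
print(f"gen2048 MTP vs no-draft: {speedup(141.77, 72.04):.2f}x")
print(f"v2 gen512 vs v1 HumanEval aggregate: {speedup(143.08, 101.31):.2f}x")
print(f"v2 gen2048 vs older TG64 row:        {speedup(141.77, 78.28):.2f}x")
```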
The older 2026-06-07 HumanEval rerun is still useful as a quality guard: it generated 48,824 completion tokens across 164 tasks at ~101.31 tok/s aggregate llama-server eval speed and produced the 155/164 base pass@1 and 148/164 HumanEval+ pass@1 result below.
HumanEval
This model also posts an exceptional HumanEval result for a local GGUF run:
| Model / row | HumanEval base pass@1 | HumanEval+ pass@1 |
|---|---|---|
| Chadrock-35B Ace Saber ROCmFP4, 32k Vulkan d2 rerun | 155/164 = 94.51% | 148/164 = 90.24% |
| earlier Chadrock-35B Ace Saber ROCmFP4 run | 157/164 = 95.73% | 149/164 = 90.85% |
| recorded stock Qwen3.6-27B UD-Q8_K_XL | 154/164 = 93.90% | 149/164 = 90.85% |
The fresh 32k rerun still beats the recorded stock 27B row on base HumanEval (155 vs 154 tasks), while both other rows remain one task higher on HumanEval+ (149 vs 148).
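For readers who want to re-derive the percentages, pass@1 with one sample per task is simply solved tasks divided by total tasks. A small sketch using the counts from the table (the `pass_at_1` helper name is ours):

```python
def pass_at_1(passed: int, total: int) -> float:
    """pass@1 as a percentage when each task is attempted exactly once."""
    return 100.0 * passed / total


# (base passed, plus passed) out of 164 HumanEval tasks, from the table above
HUMANEVAL_ROWS = {
    "Chadrock 32k Vulkan d2 rerun": (155, 148),
    "earlier Chadrock run": (157, 149),
    "stock Qwen3.6-27B UD-Q8_K_XL": (154, 149),
}

for name, (base, plus) in HUMANEVAL_ROWS.items():
    print(f"{name}: base {pass_at_1(base, 164):.2f}%, plus {pass_at_1(plus, 164):.2f}%")
```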
BigCodeBench-Hard
The same tuned Chadrock Vulkan d2 family was also run on BigCodeBench-Hard-Instruct:
| Benchmark | Result |
|---|---|
| BigCodeBench-Hard-Instruct pass@1 | 47/148 = 31.76% |
| generation wall time | 799 s |
| aggregate prompt speed | ~624.06 tok/s |
| aggregate generation speed | ~100.12 tok/s |
This is a harder instruction-coding benchmark than HumanEval and is included as a sanity check that the speed-tuned runtime still produces usable code under a broader task mix.
Best Settings / Advanced Setup
For the pinned runner build, copy-paste build commands, request-level speculative controls, and the 35B/27B reproduction notes, use the advanced Ciru setup page:
https://llm.ciru.ai/chadrock-rocmfpx/
The current pinned runner build is:
ciru-ai/ROCmFPX commit: 7aa484a2f0a504dc612a3d74a068024f3e6d6353
historical score tag: chadrock-rocmfp4-mtp-scores-20260621
For the fastest measured text-only ACE/SABER path on Strix Halo, use:
backend: Vulkan0 target + Vulkan0 draft
context: 32768
batch / ubatch: 2048 / 512
target KV: f16 / f16
draft KV: f16 / f16
MTP: draft-mtp, n_max=4, n_min=0, p_min=0.25, p_split=0.10
serving: one slot, prompt cache disabled for benchmarks, --no-mmproj for text speed
sampler: temperature=0, top_p=0.95, top_k=20
That is the profile that produced 143.08 tok/s at gen512 and repeated at 141.77 tok/s at gen2048 on the 3946-token text prompt. For image-text use, remove --no-mmproj and add --mmproj mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj; the headline speed row above is text-only.
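To make the n_max, n_min, and p_min knobs concrete, here is a toy sketch of how draft-side limits are commonly applied in greedy speculative decoding: the draft stops at n_max tokens or at the first token whose probability falls below p_min, drafts shorter than n_min are discarded, and the target keeps the longest matching prefix. This is a generic illustration under those assumptions, not the pinned runner's implementation, and it does not model p_split. All names and the example token list are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def draft_tokens(propose: Callable[[int], Tuple[str, float]],
                 n_max: int, n_min: int, p_min: float) -> List[str]:
    """Collect draft tokens until n_max is reached or confidence drops.

    `propose(i)` is a stand-in for the draft head: it returns the greedy
    token and its probability at draft position i.
    """
    drafted: List[str] = []
    for position in range(n_max):
        token, prob = propose(position)
        if prob < p_min:
            break  # draft is no longer confident enough to continue
        drafted.append(token)
    if len(drafted) < n_min:
        return []  # too short to be worth a verification pass
    return drafted


def count_accepted(drafted: Sequence[str], target: Sequence[str]) -> int:
    """Length of the longest drafted prefix that matches the target's tokens."""
    accepted = 0
    for draft_tok, target_tok in zip(drafted, target):
        if draft_tok != target_tok:
            break
        accepted += 1
    return accepted


# Hypothetical draft output: the third token falls below p_min=0.25.
candidates = [("def", 0.97), (" add", 0.81), ("(", 0.20), ("a", 0.90)]
drafted = draft_tokens(lambda i: candidates[i], n_max=4, n_min=0, p_min=0.25)
accepted = count_accepted(drafted, ["def", " add", "(", "a"])
print(drafted, accepted)
```

Under this model, a higher p_min yields shorter but more reliable drafts, which is one plausible reading of the fully accepted drafts in the speed table.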
Run With llama-server
Build the pinned Chadrock v2 ROCmFPX llama.cpp once, download this GGUF, then run this text-speed profile from the ROCmFPX checkout:
./build-strix-rocmfp4/bin/llama-server \
-m /path/to/Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-F16-to-ROCmFP4-STRIX_LEAN.gguf \
--alias chadrock-35b-ace-saber-rocmfp4-cap4 \
--host 127.0.0.1 \
--port 18180 \
--jinja \
-c 32768 \
--reasoning off \
--reasoning-format none \
--reasoning-budget -1 \
--no-context-shift \
-dev Vulkan0 \
-ngl 999 \
-fa on \
-b 2048 \
-ub 512 \
-t 16 \
-tb 32 \
-ctk f16 \
-ctv f16 \
--temp 0 \
--top-p 0.95 \
--top-k 20 \
--seed 123 \
--parallel 1 \
--no-mmproj \
--metrics \
--no-webui \
--cache-ram 8192 \
--ctx-checkpoints 0 \
--checkpoint-every-n-tokens -1 \
--spec-type draft-mtp \
--spec-draft-device Vulkan0 \
--spec-draft-ngl all \
--spec-draft-threads 16 \
--spec-draft-threads-batch 32 \
--spec-draft-type-k f16 \
--spec-draft-type-v f16 \
--spec-draft-n-max 4 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.25 \
--spec-draft-p-split 0.10 \
--no-spec-draft-backend-sampling \
--spec-draft-poll 1 \
--spec-draft-poll-batch 1
Use --parallel 1 for MTP. Multi-slot serving changes the MTP behavior and is not the intended profile.
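Once the server is up, a client can send the request-level speculative policy alongside a normal chat request. The sketch below uses only the Python standard library. It assumes the pinned runner reads `speculative.n_max`, `speculative.n_min`, and `speculative.p_min` as flat JSON fields on the OpenAI-compatible `/v1/chat/completions` body, and that the host, port, and alias match the launch command above; verify the field placement against the advanced setup page before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:18180"  # --host / --port from the launch command
MODEL_ALIAS = "chadrock-35b-ace-saber-rocmfp4-cap4"  # --alias from the launch command


def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Chat request carrying the sampler and speculative policy from the profile."""
    return {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,
        "top_p": 0.95,
        "top_k": 20,
        # Assumption: flat dotted keys, as named in the request policy above.
        "speculative.n_max": 4,
        "speculative.n_min": 0,
        "speculative.p_min": 0.25,
    }


def send_chat(payload: dict, base_url: str = BASE_URL,
              timeout: float = 600.0) -> dict:
    """POST the payload to the running server and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `send_chat(build_payload("Write a Python function that reverses a string."))` returns the usual chat-completion JSON.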
The benchmark table above used -c 32768 and --no-mmproj for the fastest text row. For image-text use, remove --no-mmproj and add the projector:
--mmproj /path/to/mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj
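If you launch the server from a script, the text-to-vision switch is a small edit to the argument list. A minimal sketch, assuming the arguments are kept as a Python list; `to_vision_args` is a hypothetical helper, not part of the runner:

```python
from typing import List


def to_vision_args(text_args: List[str], mmproj_path: str) -> List[str]:
    """Drop --no-mmproj from a text-speed profile and append the projector."""
    vision_args = [arg for arg in text_args if arg != "--no-mmproj"]
    return vision_args + ["--mmproj", mmproj_path]


text_profile = ["--parallel", "1", "--no-mmproj", "--spec-type", "draft-mtp"]
vision_profile = to_vision_args(
    text_profile, "/path/to/mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj"
)
print(vision_profile)
```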
For general local coding and agent work, you can raise context after validating memory headroom on your machine. The advanced page below keeps the copy-paste build and launch blocks current:
https://llm.ciru.ai/chadrock-rocmfpx/
About Ace Saber
The source checkpoint is GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER, based on Qwen/Qwen3.6-35B-A3B. The direct GGUF-MTP source is GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP.
Training order:
Qwen3.6-35B-A3B -> NSC-ACE -> SABER
NSC-ACE uses multiple steered rollouts from the same model and rewards convergence across latent behavior modes, especially for tool-call structure, reasoning wrappers, self-consistency, and avoiding repeated loops.
SABER is the final calibration pass. The source model card reports 98.33% HarmBench-300 compliance and final KLD 0.025383937664711.
About Chadrock / ROCmFP4
Chadrock v2 uses the pinned Ciru ROCmFPX llama.cpp branch carrying the ROCmFP4 tensor/runtime work and request-level MTP serving controls.
ROCmFP4 is not stock Q4, MXFP4, or NVFP4. It uses custom Codebook10 4-bit weights, finite unsigned E4M3 scale semantics, tensor-aware presets, ROCm/HIP kernels, Vulkan shader support, and MTP regression guards.
Why it matters: Ryzen AI Max+ 395 / Strix Halo has a large unified-memory pool, but decode speed still depends heavily on bandwidth, tensor layout, and draft-token acceptance. ROCmFP4 is designed to make this class of AMD machine fast enough for serious local long-context use.
Build The Required llama.cpp
The GGUF is already provided. You only need to build the custom llama.cpp server once:
git clone https://github.com/ciru-ai/ROCmFPX.git
cd ROCmFPX
git checkout 7aa484a2f0a504dc612a3d74a068024f3e6d6353
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh llama-server llama-bench
The server binary will be here:
build-strix-rocmfp4/bin/llama-server
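Since the speed numbers depend on the pinned commit, it can help to confirm the checkout before benchmarking. A minimal sketch, assuming `git` is on PATH; the helper names are ours:

```python
import subprocess

PINNED_COMMIT = "7aa484a2f0a504dc612a3d74a068024f3e6d6353"


def matches_pin(head: str, pinned: str = PINNED_COMMIT) -> bool:
    """Compare a commit hash against the pinned one, ignoring case and whitespace."""
    return head.strip().lower() == pinned.lower()


def current_head(repo_dir: str) -> str:
    """Return the full commit hash checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

From the directory that contains the checkout, `matches_pin(current_head("ROCmFPX"))` should return True after the `git checkout` step above.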
Files
| File | SHA256 |
|---|---|
| Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-F16-to-ROCmFP4-STRIX_LEAN.gguf | 6a635d1d8ac4af8f2c4ca6ff528bc6bad9b3a6d45e8630ef6e5728f04898eeed |
| mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj | 1365c90e3f35cad9c33e09e67ff377af083631c19718ec4c22d251a54c24c6a7 |
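The downloads can be checked against these hashes before launching. The sketch below streams each file through SHA-256 so a multi-gigabyte GGUF is never held in memory at once; the function names and chunk size are our own choices.

```python
import hashlib

# Expected digests, copied from the Files table.
EXPECTED_SHA256 = {
    "Qwen3.6-35B-A3B-NSC-ACE-SABER-MTP-F16-to-ROCmFP4-STRIX_LEAN.gguf":
        "6a635d1d8ac4af8f2c4ca6ff528bc6bad9b3a6d45e8630ef6e5728f04898eeed",
    "mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj":
        "1365c90e3f35cad9c33e09e67ff377af083631c19718ec4c22d251a54c24c6a7",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file at path matches the expected SHA-256 digest."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `verify("/path/to/mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj", EXPECTED_SHA256["mmproj-CHADROCK-35B-Ace-Saber-F32.mmproj"])` should return True for an intact download.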
Credits
- @DJLougen: Ace Saber / NSC-ACE SABER model build.
- Ciru / ciru-ai/ROCmFPX: pinned Chadrock v2 ROCmFPX llama.cpp runner, build path, request-level MTP controls, and public reproduction page.
- charlie12345 / @Italianclownz: original ROCmFP4 direction and AMD-focused runtime work that informed the Chadrock path.
- Qwen: base Qwen3.6-35B-A3B model.
Notes
This is an experimental AMD ROCmFP4/MTP build. Performance depends on driver version, clocks, prompt shape, MTP acceptance, and serving flags. The numbers above are local reproducible measurements, not universal llama.cpp claims.