Instructions to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF", filename="Ornith-1.0-9b-ROCmFPX-STRIX_LEAN.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./llama-cli -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./build/bin/llama-cli -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Use Docker
docker model run hf.co/maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
- LM Studio
- Jan
- Ollama
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Ollama:
ollama run hf.co/maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
- Unsloth Studio
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF to start chatting
- Pi
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Run Hermes
hermes
- Atomic Chat new
- OpenClaw new
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with OpenClaw:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Configure OpenClaw
# Install OpenClaw: npm install -g openclaw@latest # Register the local server and set it as the default model: openclaw onboard --non-interactive --mode local \ --auth-choice custom-api-key \ --custom-base-url http://127.0.0.1:8080/v1 \ --custom-model-id "maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF" \ --custom-provider-id llama-cpp \ --custom-compatibility openai \ --custom-text-input \ --accept-risk \ --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Docker Model Runner:
docker model run hf.co/maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
- Lemonade
How to use maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Run and chat with the model
lemonade run user.Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF-{{QUANT_TAG}}List all available models
lemonade list
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent# Add to ~/.pi/agent/models.json:
{
"providers": {
"llama-cpp": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"apiKey": "none",
"models": [
{
"id": "maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF"
}
]
}
}
}Run Pi
# Start Pi in your project directory:
piOrnith-1.0-9B ROCmFPX STRIX_LEAN — GGUF
ROCmFPX Q4_0_ROCMFP4_STRIX_LEAN quant of deepreinforce-ai/Ornith-1.0-9B (qwen35 hybrid SSM+attention, 8.95 B params, 262 144 native ctx).
Built with charlie12345/ROCmFPX on a Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11. Quantized 2026-06-27 with build commit 11d76c2.
| File | Size | Quant | BPW |
|---|---|---|---|
Ornith-1.0-9b-ROCmFPX-STRIX_LEAN.gguf |
4.72 GB | Q4_0_ROCMFP4_STRIX_LEAN (4-bit ROCmFP4 + Strix K/V + Q5_K embed) |
4.42 |
This is not a stock llama.cpp quant; you need a ROCmFPX build of llama-server / llama-cli / llama-quantize to load it. The ROCmFP4 weight format is unknown to stock llama.cpp and will fail with unknown quantization.
Scope of these benchmarks — read this first
These numbers are a light baseline, not a thorough ROCmFPX evaluation. The mesh's bench framework is built for production agent workload regression-detection on the local stack, not for the kind of multi-axis sweep that upstream quant maintainers typically publish. Specifically:
- Harness scope is bounded. The numbers below come from
llama-benchctx sweeps + the mesh'smesh_eval(6 tests, 4 deterministic + throughput) +hermes_loop_eval(5 agent scenarios). That's a regression suite, not a quality benchmark — it answers "does this quant still serve the mesh's agent stack correctly," not "is this the best possible 4-bit ROCmFP4 quant of this model." - Sample sizes are small. Throughput numbers are 3 reps on a single GPU; correctness is 5 prompts × 16 tokens; agent loop is 5 scenarios with one-shot generation. None of these are powered for statistical significance on a per-token level.
- No perplexity / wikitext / MMLU / GSM8K. The mesh's stack isn't a quality benchmark — those are upstream ROCmFPX's territory. If you need a quality signal, charlie's own validation ladder or an
lm-eval-harnessrun is the right tool. - Single GPU class. All measurements are on a 16 GB RDNA4 (RX 9060 XT, gfx1200). No Strix unified-memory, no CDNA, no multi-GPU, no Vulkan, no CUDA. Cross-hardware generalization is not implied.
- No human eval. Quality is "byte-identical on factual / short deterministic outputs, divergent on high-entropy creative generation" — which is expected for any 4-bit quant, not a quality verdict on this one specifically.
What this IS good for: a quick signal that the quant (a) loads, (b) runs at sane throughput, (c) doesn't break the mesh's agent tool-calling, (d) scales predictably with context. What this is NOT good for: claiming "this is the best quant of this model," reproducing academic benchmark results, or substituting for upstream's validation work.
If you want the rigorous version, charlie's own ROCmFPX brief + the model's stock GGUF variants (e.g. bartowski/deepreinforce-ai_Ornith-1.0-9B-GGUF) are the place to look.
What we measured
Hardware: Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
Software: charlie12345/ROCmFPX main @ 11d76c2
Source GGUF: ornith-1.0-9b-bf16.gguf (BF16, 17.9 GB) from deepreinforce-ai/Ornith-1.0-9B
Same-class baseline: stock llama.cpp b9608 Q4_K_M quantized from the same BF16 source
Throughput vs stock Q4_K_M
llama-bench 3 reps, q8_0 KV, fa=on, ngpu-layers=99. Raw JSON: BENCH-strix-lean-ctx-sweep.json and BENCH-stock-Q4_K_M-ctx-sweep.json.
| Ctx | STRIX_LEAN pp (t/s) | Q4_K_M pp (t/s) | Δ pp | STRIX_LEAN tg (t/s) | Q4_K_M tg (t/s) | Δ tg |
|---|---|---|---|---|---|---|
| 4 K | 1903 | 1607 | +18 % | 48.0 | 46.2 | +4 % |
| 8 K | 1756 | 1513 | +16 % | 48.0 | 46.2 | +4 % |
| 16 K | 1531 | 1341 | +14 % | 48.0 | 46.2 | +4 % |
| 32 K | 1215 | 1093 | +11 % | 48.1 | 46.2 | +4 % |
| 64 K | 862 | 798 | +8 % | 48.1 | 46.2 | +4 % |
Findings (small-sample, 3 reps — see the scope caveat above):
- STRIX_LEAN beats stock Q4_K_M at every ctx tested on prompt processing.
- Decode throughput is ~48 t/s across 4 K → 64 K ctx — high-context scaling claim is consistent with the flat-line observation, but 3 reps is too small to claim a tight bound.
- Prompt-processing edge narrows as ctx grows (KV cache dominates at 64 K).
- The 4 % decode delta is within the noise of 3-rep llama-bench on a 16 GB card; the more interesting signal is the +18 % pp at 4 K and the 10 % smaller file.
KV cache type sweep (added v0.5.134, head_dim=128)
131 K ctx, fa=on, kv-unified, -np 1:
| KV type | VRAM | gen t/s |
|---|---|---|
| q8_0 (baseline) | 8.7 GB | 46.3 |
| turbo4 (winner) | 7.6 GB | 46.6 |
| turbo3 | 7.4 GB | 44.3 |
| q4_0_rocmfp4 | 7.7 GB | 42.6 |
| q4_0_rocmfp4_fast | 7.6 GB | 41.9 |
turbo4 is the production default for any head_dim=128 model in the ROCmFPX build. -1.1 GB VRAM, same speed. The turbo3/4 KV types are TheTom's turboquant, absorbed into ROCmFPX main via PlunderStruck commits d859c9e + d0141e8.
Correctness — STRIX_LEAN vs stock Q4_K_M (5 prompts × 16 tokens, top-10 logprobs, sequential A/B)
| Prompt | Argmax match | Mean KL | Text |
|---|---|---|---|
| Capital of France | 16 / 16 (100 %) | 0.35 | identical |
| Fibonacci code | 5 / 13 (38 %) | 11.6 | divergent |
| Story opener | 1 / 16 (6 %) | 16.4 | divergent |
| 15 × 37 math | 16 / 16 (100 %) | 0.11 | identical |
| SQL injection | 14 / 16 (87 %) | 0.21 | near-identical |
| TOTAL | 67.5 % | 5.51 weighted | — |
Byte-identical on factual / short deterministic outputs (KL < 0.4, argmax 87-100 %), high divergence on open-ended creative generation (KL 11-16, argmax 6-38 %). Divergence correlates with prompt entropy — where multiple tokens are near-equal, greedy argmax flips and KL amplifies. This is expected for any 4-bit quant.
Agent / loop validation (raw JSONs included)
mesh_eval.py 4 deterministic tests + throughput (raw-mesh-eval-ornith-strix-lean.json):
| Test | Result |
|---|---|
gibberish (no degenerate repetition) |
OK |
thinking_leak (no <think> leakage) |
CLEAN |
tool_calling (single tool call, valid args) |
PASS — get_weather(location=Tokyo) |
coding (merge_sorted_lists, runs + passes test) |
PASS |
uncensored (no refusal on security-tools question) |
PASS |
throughput (3×256-token gen, gen t/s mean) |
47.1 t/s (±0.1) |
overall_status |
PASS, 4/4 |
hermes_loop_eval.py 5 scenarios (raw-hermes-loop-ornith-strix-lean.json):
| Scenario | Result |
|---|---|
single (one tool call) |
PASS — final answer correct |
chained (calc → use result) |
PASS — 15 × 37 = 555 |
multi_step (compare 2 cities) |
PASS — table + conclusion |
search (web search + extract) |
PASS — Eiffel Tower height |
error_recovery (file not found) |
PARTIAL — model says the file doesn't exist (factually correct) but the test's strict final_answer_correct: false flagged it |
overall_status |
PARTIAL, 4/5 |
The 4/5 loop is the error_recovery scenario's strict-match failure, not a quant defect. The model behaved correctly.
Quick start
# Build llama.cpp with ROCmFPX (the ROCmFPX-fork supports Q4_0_ROCMFP4_STRIX_LEAN weight type)
git clone https://github.com/charlie12345/ROCmFPX
cd ROCmFPX
cmake -S . -B build -DGGML_HIP=ON -DGGML_VULKAN=OFF -DGGML_CUDA=OFF \
-DCMAKE_HIP_ARCHITECTURES=gfx1200 ...
cmake --build build --target llama-server llama-cli llama-quantize
# Serve (131 072 ctx, turbo4 KV for head_dim=128, fa=on)
./build/bin/llama-server \
-m Ornith-1.0-9b-ROCmFPX-STRIX_LEAN.gguf \
-np 1 -c 131072 \
-ctk turbo4 -ctv turbo4 \
-kvo -cram 32768 -fa on
Reproduce the quant
# Source (we used the BF16 gguf; any BF16/F16 gguf of the same parent works)
SRC=/mnt/e/llms-models-data/ornith/ornith-1.0-9b-bf16.gguf
# ROCmFPX llama-quantize (preset is built in; see `llama-quantize --help`)
~/ROCmFPX/build-rdna4/bin/llama-quantize \
$SRC \
Ornith-1.0-9b-ROCmFPX-STRIX_LEAN.gguf \
Q4_0_ROCMFP4_STRIX_LEAN
Quantize time: ~6 min cold, <2 min warm-cache. CPU-only, no GPU required.
Files in this repo
| File | What it is |
|---|---|
Ornith-1.0-9b-ROCmFPX-STRIX_LEAN.gguf |
The quant. Load only with a ROCmFPX llama-server. |
README.md |
This file |
raw-mesh-eval-ornith-strix-lean.json |
mesh_eval.py output (2026-06-27 19:51 UTC) |
raw-hermes-loop-ornith-strix-lean.json |
hermes_loop_eval.py output (2026-06-27 19:52 UTC) |
BENCH-strix-lean-ctx-sweep.json |
llama-bench ctx sweep (3 reps, 4 K → 64 K) |
BENCH-stock-Q4_K_M-ctx-sweep.json |
Same sweep on the stock baseline |
BENCH-kv-type-sweep.txt |
KV cache type comparison (q8_0, turbo3, turbo4, q4_0_rocmfp4, q4_0_rocmfp4_fast) |
quant-command.sh |
The exact llama-quantize invocation used |
What's NOT in this repo (caveats)
- Stock llama.cpp will not load this file. The ROCmFP4 weight format is unique to charlie12345/ROCmFPX. Use that fork's
llama-server/llama-cli/llama-quantize. - No CUDA / non-AMD GPU bench. All measurements are RDNA4 (gfx1200). Vulkan path on RDNA4 has a known upstream regression (charlie12345/rocmfp4-llama issue #6) — we did not test it.
- system_fingerprint will be
b1-11d76c2when served by the ROCmFPX build (verified on prior bench runs in this corpus). If you see a different fingerprint, the wrong binary loaded the file. - No multi-GPU / tensor-parallel bench. 9 B params at 4.7 GB fits comfortably on a single 16 GB card; no need to split.
- No MTP / speculative-decode bench on this file. Ornith 1.0 9B does not ship with MTP draft heads.
- No vision/multimodal test. Ornith 1.0 9B is text-only; the
mesh_evalvision test was skipped (HTTP 500 = expected for this model class).
Provenance
- Source model:
deepreinforce-ai/Ornith-1.0-9B— qwen35 hybrid SSM+attention, 8.95 B params, native ctx 262 144 - Source model license: MIT (
https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B/blob/main/LICENSE) - Quantizer: charlie12345/ROCmFPX
main@11d76c2(2026-06-27) - Quantizer license: MIT
- Build hardware: Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
- Build tooling: NixOS 25.11, ROCm store paths dynamic-discovered. See the
meshinarepo'sreferences/nixos-rocm-external-build-recipe.mdfor the build env setup. - Bench harnesses:
scripts/mesh-bench/mesh_eval.py+scripts/mesh-bench/hermes_loop_eval.pyfrom the meshina repo - Original bench report:
raw/benchmarks/2026-06-27-rocmfpx-validation/briefs/2026-06-27-ornith-rocmfpx-validation.mdin the meshina repo
License
- The Ornith 1.0 9B parent model is MIT (per its HF model card).
- The
charlie12345/ROCmFPXquantizer is MIT. - The GGUF in this repo is a derivative of the MIT-licensed parent, produced with the MIT-licensed quantizer. The MIT license is preserved.
- Downloads last month
- 124
We're not able to determine the quantization variants.
Model tree for maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF
Base model
deepreinforce-ai/Ornith-1.0-9B
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp# Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/Ornith-1.0-9b-ROCmFPX-STRIX_LEAN-GGUF