Instructions to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF", filename="GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: llama cli -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./llama-cli -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF # Run inference directly in the terminal: ./build/bin/llama-cli -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Use Docker
docker model run hf.co/maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
- LM Studio
- Jan
- Ollama
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Ollama:
ollama run hf.co/maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
- Unsloth Studio
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF to start chatting
- Pi
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Run Hermes
hermes
- Atomic Chat new
- OpenClaw new
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with OpenClaw:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama serve -hf maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Configure OpenClaw
# Install OpenClaw: npm install -g openclaw@latest # Register the local server and set it as the default model: openclaw onboard --non-interactive --mode local \ --auth-choice custom-api-key \ --custom-base-url http://127.0.0.1:8080/v1 \ --custom-model-id "maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF" \ --custom-provider-id llama-cpp \ --custom-compatibility openai \ --custom-text-input \ --accept-risk \ --skip-health
Run OpenClaw
openclaw agent --local --agent main --message "Hello from Hugging Face"
- Docker Model Runner
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Docker Model Runner:
docker model run hf.co/maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
- Lemonade
How to use maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull maczzzzzz/GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF
Run and chat with the model
lemonade run user.GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN-GGUF-{{QUANT_TAG}}List all available models
lemonade list
Upload GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf + bench data (ROCmFPX STRIX_LEAN)
Browse files- .gitattributes +1 -0
- GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf +3 -0
- README.md +181 -0
- ctx-scaling-glm-reap-strix-lean-64k-20260627-143748.json +31 -0
- quant-command.sh +10 -0
- raw-hermes-loop-glm-reap-23b-q3_0_rocmfpx.json +246 -0
- raw-hermes-loop-glm-reap-23b-strix-lean.json +246 -0
- raw-mesh-eval-glm-reap-23b-strix-lean.json +85 -0
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf filter=lfs diff=lfs merge=lfs -text
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:03287c4590b86f3dac617115e282f28bb33d9142d6c918d67c4a1fd4bbf9c3d8
|
| 3 |
+
size 12292302304
|
|
@@ -0,0 +1,181 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
base_model: cerebras/GLM-4.7-Flash-REAP-23B-A3B
|
| 4 |
+
tags:
|
| 5 |
+
- gguf
|
| 6 |
+
- rocmfpx
|
| 7 |
+
- deepseek2
|
| 8 |
+
- glm
|
| 9 |
+
- moe
|
| 10 |
+
- rocm
|
| 11 |
+
- rdna4
|
| 12 |
+
- strix-lean
|
| 13 |
+
- quantization
|
| 14 |
+
- llama-cpp
|
| 15 |
+
base_model_relation: quantized
|
| 16 |
+
quantized_by: maczzzzzz (via charlie12345/ROCmFPX)
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
# GLM-4.7-Flash-REAP-23B-A3B ROCmFPX STRIX_LEAN — GGUF
|
| 20 |
+
|
| 21 |
+
**ROCmFPX `Q4_0_ROCMFP4_STRIX_LEAN` quant of [`cerebras/GLM-4.7-Flash-REAP-23B-A3B`](https://huggingface.co/cerebras/GLM-4.7-Flash-REAP-23B-A3B) (GLM-4.7-derived 23 B-A3B MoE, obtained by uniformly pruning 25 % of experts in GLM-4.7-Flash using the REAP method).**
|
| 22 |
+
|
| 23 |
+
Built with [charlie12345/ROCmFPX](https://github.com/charlie12345/ROCmFPX) on a Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11. Quantized 2026-06-27 with build commit `11d76c2`.
|
| 24 |
+
|
| 25 |
+
| File | Size | Quant | BPW |
|
| 26 |
+
|---|---|---|---|
|
| 27 |
+
| `GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf` | 12 GB | `Q4_0_ROCMFP4_STRIX_LEAN` (4-bit ROCmFP4 + Strix K/V + Q5_K embed) | 4.38 |
|
| 28 |
+
|
| 29 |
+
This is **not** a stock llama.cpp quant; you need a ROCmFPX build of `llama-server` / `llama-cli` / `llama-quantize` to load it. Stock llama.cpp will reject the file with `unknown quantization`.
|
| 30 |
+
|
| 31 |
+
## Scope of these benchmarks — read this first
|
| 32 |
+
|
| 33 |
+
**These numbers are a light baseline, not a thorough ROCmFPX evaluation.** The mesh's bench framework is built for production agent workload regression-detection on the local stack, not for the kind of multi-axis sweep that upstream quant maintainers typically publish. Specifically:
|
| 34 |
+
|
| 35 |
+
- **Harness scope is bounded.** The numbers below come from the mesh's `mesh_eval` (6 tests, 4 deterministic + throughput) + `hermes_loop_eval` (5 agent scenarios) + a `ctx_scaling` test at 4 K → 32 K (the 64 K ctx request returned HTTP 400 from this server config — see "What's NOT in this repo").
|
| 36 |
+
- **Sample sizes are small.** Throughput numbers are 3 reps on a single GPU; hermes_loop is 5 scenarios with one-shot generation. None are powered for statistical significance on a per-token level.
|
| 37 |
+
- **No perplexity / wikitext / MMLU / GSM8K.** The mesh's stack isn't a quality benchmark — those are upstream ROCmFPX's territory.
|
| 38 |
+
- **Single GPU class.** All measurements are on a 16 GB RDNA4 (RX 9060 XT, gfx1200). No Strix unified-memory, no CDNA, no multi-GPU, no Vulkan, no CUDA. Cross-hardware generalization is **not** implied.
|
| 39 |
+
- **No human eval.** "Faster and same-coherent on the regression tests" is not a quality verdict on this specific quant.
|
| 40 |
+
- **Heaviest model in the mesh.** GLM REAP 23B at 12 GB is the biggest single-model quant the mesh can serve. On smaller GPUs (<12 GB VRAM), this file will not fit. The 16 GB card runs it with ~3 GB headroom.
|
| 41 |
+
|
| 42 |
+
**What this IS good for:** a quick signal that the quant (a) loads, (b) runs at sane throughput, (c) doesn't break the mesh's agent tool-calling, (d) scales predictably with context. **What this is NOT good for:** claiming "this is the best quant of this model," reproducing academic benchmark results, or substituting for upstream's validation work.
|
| 43 |
+
|
| 44 |
+
For a rigorous view, the parent repo [`cerebras/GLM-4.7-Flash-REAP-23B-A3B`](https://huggingface.co/cerebras/GLM-4.7-Flash-REAP-23B-A3B), the upstream [`zai-org/GLM-4.7-Flash`](https://huggingface.co/zai-org/GLM-4.7-Flash), and the model's stock GGUF variants (e.g. on `unsloth/`) are the place to look.
|
| 45 |
+
|
| 46 |
+
## What we measured
|
| 47 |
+
|
| 48 |
+
**Hardware:** Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
|
| 49 |
+
**Software:** [charlie12345/ROCmFPX](https://github.com/charlie12345/ROCmFPX) `main` @ `11d76c2`
|
| 50 |
+
**Source GGUF:** `GLM-4.7-Flash-REAP-23B-A3B-BF16.gguf` (BF16, 43 GB) — the Unsloth-distributed GGUF of the Cerebras-pruned safetensors
|
| 51 |
+
**Same-stack comparison:** `Q3_0_ROCMFPX` (3-bit ROCmFPX experimental, 12 GB file) on the same source
|
| 52 |
+
|
| 53 |
+
### Agent-loop throughput — STRIX_LEAN vs Q3_0_ROCMFPX (hermes_loop, same harness, same source)
|
| 54 |
+
|
| 55 |
+
| Scenario | STRIX_LEAN (t/s) | Q3_0_ROCMFPX (t/s) | Δ |
|
| 56 |
+
|---|---|---|---|
|
| 57 |
+
| `single` (one tool call) | 38.5 | 23.1 | **+67 %** |
|
| 58 |
+
| `chained` (calc → use result) | 35.8 | 24.4 | +47 % |
|
| 59 |
+
| `multi_step` (compare 2 cities) | 50.8 | 37.7 | +35 % |
|
| 60 |
+
| `search` (web search + extract) | 46.8 | 32.5 | +44 % |
|
| 61 |
+
| `error_recovery` (file not found) | 48.9 | 34.5 | +42 % |
|
| 62 |
+
| **Mean** | **44.2** | **30.4** | **+45 %** |
|
| 63 |
+
|
| 64 |
+
Both quants pass all 5 scenarios. The 4-bit STRIX_LEAN is **~45 % faster** than the 3-bit Q3_0 on this MoE arch, at the same file size (12 GB). This is the headline finding for this model.
|
| 65 |
+
|
| 66 |
+
### mesh_eval (raw JSON: `raw-mesh-eval-glm-reap-23b-strix-lean.json`)
|
| 67 |
+
|
| 68 |
+
| Test | Result |
|
| 69 |
+
|---|---|
|
| 70 |
+
| `gibberish` | OK |
|
| 71 |
+
| `thinking_leak` | CLEAN |
|
| 72 |
+
| `tool_calling` (single call) | PASS — `get_weather(location=Tokyo)` |
|
| 73 |
+
| `coding` (merge_sorted_lists) | PASS — runs, tests pass |
|
| 74 |
+
| `uncensored` | PASS — no refusal |
|
| 75 |
+
| `throughput` (3×256-token gen) | **62.8 t/s** mean, ±0.6 stdev |
|
| 76 |
+
| `overall_status` | **PASS, 4/4** |
|
| 77 |
+
|
| 78 |
+
### hermes_loop (raw JSON: `raw-hermes-loop-glm-reap-23b-strix-lean.json`)
|
| 79 |
+
|
| 80 |
+
| Scenario | Result |
|
| 81 |
+
|---|---|
|
| 82 |
+
| `single` | PASS — final answer correct |
|
| 83 |
+
| `chained` (calc → use) | PASS — `15 × 37 = 555` |
|
| 84 |
+
| `multi_step` (compare 2 cities) | PASS — Tokyo/London table + conclusion |
|
| 85 |
+
| `search` (web search + extract) | PASS — Eiffel Tower height |
|
| 86 |
+
| `error_recovery` (file not found) | **PASS** (clean) |
|
| 87 |
+
| `overall_status` | **PASS, 5/5** |
|
| 88 |
+
|
| 89 |
+
### Context scaling (raw JSON: `ctx-scaling-glm-reap-strix-lean-64k-20260627-143748.json`)
|
| 90 |
+
|
| 91 |
+
| Ctx target | pp t/s | tg t/s | Result |
|
| 92 |
+
|---|---|---|---|
|
| 93 |
+
| 4 K | 668.9 | 50.0 | OK, coherent (`4`) |
|
| 94 |
+
| 32 K | 166.2 | 50.0 | OK, coherent |
|
| 95 |
+
| 64 K | — | — | HTTP 400 (server-side ctx cap) |
|
| 96 |
+
|
| 97 |
+
**Findings:**
|
| 98 |
+
- Decode throughput holds at **50 t/s** across 4 K → 32 K ctx.
|
| 99 |
+
- **Prompt processing degrades sharply: 4 K → 32 K drops from 669 → 166 pp t/s (4× slower).** This is a known property of the GLM-4.7 architecture's `head_dim=576` — the larger attention head blows up KV cache bandwidth pressure at long context.
|
| 100 |
+
- The 64 K failure is the server's `--ctx-size` cap, not a model limit. The parent GLM-4.7-Flash has 200 K native ctx; this REAP-pruned variant should fit 64 K on a 24+ GB card.
|
| 101 |
+
|
| 102 |
+
### KV cache type — `head_dim=576` constraint (no turbo support)
|
| 103 |
+
|
| 104 |
+
This model has **`head_dim=576`** (GLM-4.7 architecture). The turbo3/turbo4 KV cache types in the ROCmFPX build require `head_dim ∈ {128, 256}` and **hard-fail** on this model with: `TurboQuant requires head_dim=128 or 256, got 576`.
|
| 105 |
+
|
| 106 |
+
Production KV type: **`q8_0`** (default, with optional `q4_0_rocmfp4` for marginal speedup at same VRAM). See `references/rocmfpx-build-quant-bench.md` Pattern 13 in the meshina corpus for the full sweep.
|
| 107 |
+
|
| 108 |
+
The 131 K ctx deployment uses `--cache-ram 32768` (KV offload to system RAM) — the 12 GB weights dominate VRAM, and the KV cache lives in DDR4 regardless of quant. This is what makes long-context GLM REAP viable on 16 GB hardware.
|
| 109 |
+
|
| 110 |
+
## Quick start
|
| 111 |
+
|
| 112 |
+
```bash
|
| 113 |
+
# Build llama.cpp with ROCmFPX
|
| 114 |
+
git clone https://github.com/charlie12345/ROCmFPX
|
| 115 |
+
cd ROCmFPX
|
| 116 |
+
cmake -S . -B build -DGGML_HIP=ON -DGGML_VULKAN=OFF -DGGML_CUDA=OFF \
|
| 117 |
+
-DCMAKE_HIP_ARCHITECTURES=gfx1200 ...
|
| 118 |
+
cmake --build build --target llama-server llama-cli llama-quantize
|
| 119 |
+
|
| 120 |
+
# Serve (131 072 ctx, q8_0 KV [head_dim=576, turbo incompatible], KV offload, fa=on)
|
| 121 |
+
./build/bin/llama-server \
|
| 122 |
+
-m GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf \
|
| 123 |
+
-np 1 -c 131072 \
|
| 124 |
+
-ctk q8_0 -ctv q8_0 \
|
| 125 |
+
-kvo -cram 32768 -fa on
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
## Reproduce the quant
|
| 129 |
+
|
| 130 |
+
```bash
|
| 131 |
+
SRC=/path/to/GLM-4.7-Flash-REAP-23B-A3B-BF16.gguf
|
| 132 |
+
|
| 133 |
+
~/ROCmFPX/build-rdna4/bin/llama-quantize \
|
| 134 |
+
"$SRC" \
|
| 135 |
+
GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf \
|
| 136 |
+
Q4_0_ROCMFP4_STRIX_LEAN
|
| 137 |
+
```
|
| 138 |
+
|
| 139 |
+
Quantize time: ~3-5 min warm-cache, CPU-only. Source BF16 is 43 GB so the first cold quant is slower.
|
| 140 |
+
|
| 141 |
+
## Files in this repo
|
| 142 |
+
|
| 143 |
+
| File | What it is |
|
| 144 |
+
|---|---|
|
| 145 |
+
| `GLM-4.7-Flash-REAP-23B-A3B-ROCmFPX-STRIX_LEAN.gguf` | The quant. **Load only with a ROCmFPX `llama-server`.** |
|
| 146 |
+
| `README.md` | This file |
|
| 147 |
+
| `raw-mesh-eval-glm-reap-23b-strix-lean.json` | `mesh_eval.py` output (2026-06-27 17:38 UTC) |
|
| 148 |
+
| `raw-hermes-loop-glm-reap-23b-strix-lean.json` | `hermes_loop_eval.py` output (2026-06-27 18:12 UTC) |
|
| 149 |
+
| `raw-hermes-loop-glm-reap-23b-q3_0_rocmfpx.json` | Same harness on the Q3_0 baseline (for the throughput comparison) |
|
| 150 |
+
| `ctx-scaling-glm-reap-strix-lean-64k-20260627-143748.json` | 4 K → 32 K ctx scaling (64 K HTTP 400 — see caveat) |
|
| 151 |
+
| `quant-command.sh` | The exact `llama-quantize` invocation used |
|
| 152 |
+
|
| 153 |
+
## What's NOT in this repo (caveats)
|
| 154 |
+
|
| 155 |
+
- **Stock llama.cpp will not load this file.** The ROCmFP4 weight format is unique to charlie12345/ROCmFPX.
|
| 156 |
+
- **No CUDA / non-AMD GPU bench.** All measurements are RDNA4 (gfx1200).
|
| 157 |
+
- **64 K ctx is HTTP 400 on this server.** The parent GLM-4.7-Flash has 200 K native ctx. Tested up to 32 K successfully; the 64 K failure is the server's `--ctx-size` cap.
|
| 158 |
+
- **No turbo3/4 KV cache** on this model (head_dim=576). Hard architectural constraint, not a bug.
|
| 159 |
+
- **The source GGUF is Unsloth-distributed** (per `general.quantized_by = "Unsloth"` in the metadata). The actual safetensors parent is `cerebras/GLM-4.7-Flash-REAP-23B-A3B`, derived from `zai-org/GLM-4.7-Flash` (the unpruned 200 K-ctx model). The chain is: safetensors → Unsloth GGUF → our STRIX_LEAN.
|
| 160 |
+
- **12 GB minimum VRAM.** Doesn't fit on <12 GB cards. The mesh's 16 GB card runs it with ~3 GB headroom.
|
| 161 |
+
- **No MTP / speculative-decode bench on this file.** GLM-4.7 architecture is not MTP-capable in this release.
|
| 162 |
+
- **No vision/multimodal test.** This variant is text-only.
|
| 163 |
+
- **No quality benchmark** (perplexity, MMLU, GSM8K). The 4-5 quant still works on the mesh's regression tests; whether it's "the best 4-bit quant" needs upstream validation.
|
| 164 |
+
|
| 165 |
+
## Provenance
|
| 166 |
+
|
| 167 |
+
- **Source model:** [`cerebras/GLM-4.7-Flash-REAP-23B-A3B`](https://huggingface.co/cerebras/GLM-4.7-Flash-REAP-23B-A3B) — 23 B-A3B MoE, 25 % of experts pruned from `zai-org/GLM-4.7-Flash` using the REAP method
|
| 168 |
+
- **Source model license:** mit
|
| 169 |
+
- **Source GGUF uploader:** Unsloth (per `general.quantized_by` in the BF16 source metadata)
|
| 170 |
+
- **Quantizer:** [charlie12345/ROCmFPX](https://github.com/charlie12345/ROCmFPX) `main` @ `11d76c2` (2026-06-27)
|
| 171 |
+
- **Quantizer license:** MIT
|
| 172 |
+
- **Build hardware:** Node B, AMD Ryzen 9 5900XT 16-core, Radeon RX 9060 XT 16 GB (gfx1200), ROCm 7.2.3, NixOS 25.11
|
| 173 |
+
- **Build tooling:** NixOS 25.11, ROCm store paths dynamic-discovered. See the `meshina` repo's `references/nixos-rocm-external-build-recipe.md` for the build env setup.
|
| 174 |
+
- **Bench harnesses:** `scripts/mesh-bench/mesh_eval.py` + `scripts/mesh-bench/hermes_loop_eval.py` + `scripts/mesh-bench/ctx_scaling_bench.py` from the [meshina](https://github.com/maczzzzzz/meshina) repo (private)
|
| 175 |
+
- **Original bench report:** `raw/benchmarks/2026-06-27-rocmfpx-validation/briefs/2026-06-27-rocmfpx-rdna4-16gb.md` in the meshina repo
|
| 176 |
+
|
| 177 |
+
## License
|
| 178 |
+
|
| 179 |
+
- **The GLM-4.7-Flash-REAP parent is MIT** (per its HF model card).
|
| 180 |
+
- **The `charlie12345/ROCmFPX` quantizer is MIT.**
|
| 181 |
+
- The GGUF in this repo is a derivative of the MIT-licensed parent, produced with the MIT-licensed quantizer. The MIT license is preserved.
|
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"label": "glm-reap-strix-lean-64k",
|
| 3 |
+
"endpoint": "http://node-b:18082",
|
| 4 |
+
"timestamp": "2026-06-27T18:37:48Z",
|
| 5 |
+
"results": [
|
| 6 |
+
{
|
| 7 |
+
"ctx_target": 4096,
|
| 8 |
+
"prompt_tokens": 4336,
|
| 9 |
+
"completion_tokens": 2,
|
| 10 |
+
"wall_time_s": 6.52,
|
| 11 |
+
"pp_tps": 668.9,
|
| 12 |
+
"tg_tps": 50.0,
|
| 13 |
+
"answer_preview": "4",
|
| 14 |
+
"coherent": true
|
| 15 |
+
},
|
| 16 |
+
{
|
| 17 |
+
"ctx_target": 32768,
|
| 18 |
+
"prompt_tokens": 36480,
|
| 19 |
+
"completion_tokens": 2,
|
| 20 |
+
"wall_time_s": 219.57,
|
| 21 |
+
"pp_tps": 166.2,
|
| 22 |
+
"tg_tps": 50.0,
|
| 23 |
+
"answer_preview": "4",
|
| 24 |
+
"coherent": true
|
| 25 |
+
},
|
| 26 |
+
{
|
| 27 |
+
"ctx_target": 65536,
|
| 28 |
+
"error": "HTTP Error 400: Bad Request"
|
| 29 |
+
}
|
| 30 |
+
]
|
| 31 |
+
}
|
|
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/bin/bash
|
| 2 |
+
# Exact quant command used to produce this GGUF
|
| 3 |
+
# Run on Node B, 2026-06-27
|
| 4 |
+
|
| 5 |
+
QUANT=/home/nixos/ROCmFPX/build-rdna4/bin/llama-quantize
|
| 6 |
+
SRC=/home/nixos/Downloads/GLM-4.7-Flash-REAP-23B-A3B-BF16.gguf
|
| 7 |
+
DST=/home/nixos/Downloads/GLM-4.7-Flash-REAP-23B-A3B-STRIX_LEAN.gguf
|
| 8 |
+
|
| 9 |
+
$QUANT "$SRC" "$DST" Q4_0_ROCMFP4_STRIX_LEAN
|
| 10 |
+
# Quantize time: ~3-5 min warm-cache (43 GB BF16 source)
|
|
@@ -0,0 +1,246 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"label": "glm-reap-23b-q3_0_rocmfpx",
|
| 3 |
+
"endpoint": "http://node-b:18082",
|
| 4 |
+
"timestamp": "2026-06-27T18:11:51.238297+00:00",
|
| 5 |
+
"scenarios": [
|
| 6 |
+
{
|
| 7 |
+
"scenario": "single",
|
| 8 |
+
"description": "Single tool call \u2014 model must call get_weather for Tokyo",
|
| 9 |
+
"status": "PASS",
|
| 10 |
+
"tool_match": true,
|
| 11 |
+
"tools_called": [
|
| 12 |
+
"get_weather"
|
| 13 |
+
],
|
| 14 |
+
"expected_tool": "get_weather",
|
| 15 |
+
"all_args_valid": true,
|
| 16 |
+
"final_answer_correct": true,
|
| 17 |
+
"final_answer_preview": "The current weather in Tokyo is:\n\n- **Temperature:** 22\u00b0C\n- **Condition:** Partly cloudy\n- **Humidity:** 65%\n\nIt's a bit humid with partly cloudy skies.",
|
| 18 |
+
"turns_used": 2,
|
| 19 |
+
"max_turns": 3,
|
| 20 |
+
"efficiency": "OPTIMAL",
|
| 21 |
+
"total_time_s": 3.87,
|
| 22 |
+
"avg_tps": 23.1,
|
| 23 |
+
"turns": [
|
| 24 |
+
{
|
| 25 |
+
"turn": 1,
|
| 26 |
+
"elapsed_s": 2.77,
|
| 27 |
+
"tps": 8.3,
|
| 28 |
+
"finish_reason": "tool_calls",
|
| 29 |
+
"content_preview": "I'll check the current weather in Tokyo for you.",
|
| 30 |
+
"tool_calls": [
|
| 31 |
+
{
|
| 32 |
+
"name": "get_weather",
|
| 33 |
+
"args": {
|
| 34 |
+
"location": "Tokyo"
|
| 35 |
+
},
|
| 36 |
+
"args_valid": true
|
| 37 |
+
}
|
| 38 |
+
]
|
| 39 |
+
},
|
| 40 |
+
{
|
| 41 |
+
"turn": 2,
|
| 42 |
+
"elapsed_s": 1.11,
|
| 43 |
+
"tps": 37.9,
|
| 44 |
+
"finish_reason": "stop",
|
| 45 |
+
"content_preview": "The current weather in Tokyo is:\n\n- **Temperature:** 22\u00b0C\n- **Condition:** Partly cloudy\n- **Humidity:** 65%\n\nIt's a bit humid with partly cloudy skies.",
|
| 46 |
+
"tool_calls": [],
|
| 47 |
+
"final": true
|
| 48 |
+
}
|
| 49 |
+
]
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"scenario": "chained",
|
| 53 |
+
"description": "Chained tool calls \u2014 calculate then use result",
|
| 54 |
+
"status": "PASS",
|
| 55 |
+
"tool_match": true,
|
| 56 |
+
"tools_called": [
|
| 57 |
+
"calculate"
|
| 58 |
+
],
|
| 59 |
+
"expected_tool": "calculate",
|
| 60 |
+
"all_args_valid": true,
|
| 61 |
+
"final_answer_correct": true,
|
| 62 |
+
"final_answer_preview": "15 * 37 = 555",
|
| 63 |
+
"turns_used": 2,
|
| 64 |
+
"max_turns": 3,
|
| 65 |
+
"efficiency": "OPTIMAL",
|
| 66 |
+
"total_time_s": 0.9,
|
| 67 |
+
"avg_tps": 24.4,
|
| 68 |
+
"turns": [
|
| 69 |
+
{
|
| 70 |
+
"turn": 1,
|
| 71 |
+
"elapsed_s": 0.51,
|
| 72 |
+
"tps": 25.4,
|
| 73 |
+
"finish_reason": "tool_calls",
|
| 74 |
+
"content_preview": "",
|
| 75 |
+
"tool_calls": [
|
| 76 |
+
{
|
| 77 |
+
"name": "calculate",
|
| 78 |
+
"args": {
|
| 79 |
+
"expression": "15 * 37"
|
| 80 |
+
},
|
| 81 |
+
"args_valid": true
|
| 82 |
+
}
|
| 83 |
+
]
|
| 84 |
+
},
|
| 85 |
+
{
|
| 86 |
+
"turn": 2,
|
| 87 |
+
"elapsed_s": 0.38,
|
| 88 |
+
"tps": 23.4,
|
| 89 |
+
"finish_reason": "stop",
|
| 90 |
+
"content_preview": "15 * 37 = 555",
|
| 91 |
+
"tool_calls": [],
|
| 92 |
+
"final": true
|
| 93 |
+
}
|
| 94 |
+
]
|
| 95 |
+
},
|
| 96 |
+
{
|
| 97 |
+
"scenario": "multi_step",
|
| 98 |
+
"description": "Multi-step \u2014 compare weather in two cities",
|
| 99 |
+
"status": "PASS",
|
| 100 |
+
"tool_match": true,
|
| 101 |
+
"tools_called": [
|
| 102 |
+
"get_weather",
|
| 103 |
+
"get_weather"
|
| 104 |
+
],
|
| 105 |
+
"expected_tool": [
|
| 106 |
+
"get_weather",
|
| 107 |
+
"get_weather"
|
| 108 |
+
],
|
| 109 |
+
"all_args_valid": true,
|
| 110 |
+
"final_answer_correct": true,
|
| 111 |
+
"final_answer_preview": "Here's the comparison:\n\n**Tokyo:** 22\u00b0C (partly cloudy, 65% humidity)\n**London:** 15\u00b0C (rainy, 80% humidity)\n\n**Tokyo is warmer** by 7 degrees Celsius.",
|
| 112 |
+
"turns_used": 2,
|
| 113 |
+
"max_turns": 5,
|
| 114 |
+
"efficiency": "OPTIMAL",
|
| 115 |
+
"total_time_s": 2.38,
|
| 116 |
+
"avg_tps": 37.7,
|
| 117 |
+
"turns": [
|
| 118 |
+
{
|
| 119 |
+
"turn": 1,
|
| 120 |
+
"elapsed_s": 1.0,
|
| 121 |
+
"tps": 39.2,
|
| 122 |
+
"finish_reason": "tool_calls",
|
| 123 |
+
"content_preview": "I'll get the current weather conditions for both Tokyo and London to compare their temperatures.",
|
| 124 |
+
"tool_calls": [
|
| 125 |
+
{
|
| 126 |
+
"name": "get_weather",
|
| 127 |
+
"args": {
|
| 128 |
+
"location": "Tokyo"
|
| 129 |
+
},
|
| 130 |
+
"args_valid": true
|
| 131 |
+
},
|
| 132 |
+
{
|
| 133 |
+
"name": "get_weather",
|
| 134 |
+
"args": {
|
| 135 |
+
"location": "London"
|
| 136 |
+
},
|
| 137 |
+
"args_valid": true
|
| 138 |
+
}
|
| 139 |
+
]
|
| 140 |
+
},
|
| 141 |
+
{
|
| 142 |
+
"turn": 2,
|
| 143 |
+
"elapsed_s": 1.38,
|
| 144 |
+
"tps": 36.1,
|
| 145 |
+
"finish_reason": "stop",
|
| 146 |
+
"content_preview": "Here's the comparison:\n\n**Tokyo:** 22\u00b0C (partly cloudy, 65% humidity)\n**London:** 15\u00b0C (rainy, 80% humidity)\n\n**Tokyo is warmer** by 7 degrees Celsius.",
|
| 147 |
+
"tool_calls": [],
|
| 148 |
+
"final": true
|
| 149 |
+
}
|
| 150 |
+
]
|
| 151 |
+
},
|
| 152 |
+
{
|
| 153 |
+
"scenario": "search",
|
| 154 |
+
"description": "Search + extract \u2014 find info and report it",
|
| 155 |
+
"status": "PASS",
|
| 156 |
+
"tool_match": true,
|
| 157 |
+
"tools_called": [
|
| 158 |
+
"search_web"
|
| 159 |
+
],
|
| 160 |
+
"expected_tool": "search_web",
|
| 161 |
+
"all_args_valid": true,
|
| 162 |
+
"final_answer_correct": true,
|
| 163 |
+
"final_answer_preview": "The Eiffel Tower is **330 meters tall** (approximately 1,083 feet).",
|
| 164 |
+
"turns_used": 2,
|
| 165 |
+
"max_turns": 3,
|
| 166 |
+
"efficiency": "OPTIMAL",
|
| 167 |
+
"total_time_s": 1.61,
|
| 168 |
+
"avg_tps": 32.5,
|
| 169 |
+
"turns": [
|
| 170 |
+
{
|
| 171 |
+
"turn": 1,
|
| 172 |
+
"elapsed_s": 0.92,
|
| 173 |
+
"tps": 34.7,
|
| 174 |
+
"finish_reason": "tool_calls",
|
| 175 |
+
"content_preview": "I'll search for information about the height of the Eiffel Tower for you.",
|
| 176 |
+
"tool_calls": [
|
| 177 |
+
{
|
| 178 |
+
"name": "search_web",
|
| 179 |
+
"args": {
|
| 180 |
+
"query": "Eiffel Tower height"
|
| 181 |
+
},
|
| 182 |
+
"args_valid": true
|
| 183 |
+
}
|
| 184 |
+
]
|
| 185 |
+
},
|
| 186 |
+
{
|
| 187 |
+
"turn": 2,
|
| 188 |
+
"elapsed_s": 0.69,
|
| 189 |
+
"tps": 30.3,
|
| 190 |
+
"finish_reason": "stop",
|
| 191 |
+
"content_preview": "The Eiffel Tower is **330 meters tall** (approximately 1,083 feet).",
|
| 192 |
+
"tool_calls": [],
|
| 193 |
+
"final": true
|
| 194 |
+
}
|
| 195 |
+
]
|
| 196 |
+
},
|
| 197 |
+
{
|
| 198 |
+
"scenario": "error_recovery",
|
| 199 |
+
"description": "Error recovery \u2014 file doesn't exist, model should report it",
|
| 200 |
+
"status": "PASS",
|
| 201 |
+
"tool_match": true,
|
| 202 |
+
"tools_called": [
|
| 203 |
+
"read_file"
|
| 204 |
+
],
|
| 205 |
+
"expected_tool": "read_file",
|
| 206 |
+
"all_args_valid": true,
|
| 207 |
+
"final_answer_correct": true,
|
| 208 |
+
"final_answer_preview": "The file `/nonexistent/path.txt` does not exist, so there is no content to read. The system returned an error indicating the file wasn't found.",
|
| 209 |
+
"turns_used": 2,
|
| 210 |
+
"max_turns": 3,
|
| 211 |
+
"efficiency": "OPTIMAL",
|
| 212 |
+
"total_time_s": 1.76,
|
| 213 |
+
"avg_tps": 34.5,
|
| 214 |
+
"turns": [
|
| 215 |
+
{
|
| 216 |
+
"turn": 1,
|
| 217 |
+
"elapsed_s": 0.84,
|
| 218 |
+
"tps": 33.2,
|
| 219 |
+
"finish_reason": "tool_calls",
|
| 220 |
+
"content_preview": "I'll attempt to read the file at `/nonexistent/path.txt`.",
|
| 221 |
+
"tool_calls": [
|
| 222 |
+
{
|
| 223 |
+
"name": "read_file",
|
| 224 |
+
"args": {
|
| 225 |
+
"path": "/nonexistent/path.txt"
|
| 226 |
+
},
|
| 227 |
+
"args_valid": true
|
| 228 |
+
}
|
| 229 |
+
]
|
| 230 |
+
},
|
| 231 |
+
{
|
| 232 |
+
"turn": 2,
|
| 233 |
+
"elapsed_s": 0.92,
|
| 234 |
+
"tps": 35.8,
|
| 235 |
+
"finish_reason": "stop",
|
| 236 |
+
"content_preview": "The file `/nonexistent/path.txt` does not exist, so there is no content to read. The system returned an error indicating the file wasn't found.",
|
| 237 |
+
"tool_calls": [],
|
| 238 |
+
"final": true
|
| 239 |
+
}
|
| 240 |
+
]
|
| 241 |
+
}
|
| 242 |
+
],
|
| 243 |
+
"overall_status": "PASS",
|
| 244 |
+
"pass_count": "5/5",
|
| 245 |
+
"framework": "hermes_loop_eval.py v1.0"
|
| 246 |
+
}
|
|
@@ -0,0 +1,246 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"label": "glm-reap-23b-strix_lean",
|
| 3 |
+
"endpoint": "http://node-b:18082",
|
| 4 |
+
"timestamp": "2026-06-27T18:12:30.765358+00:00",
|
| 5 |
+
"scenarios": [
|
| 6 |
+
{
|
| 7 |
+
"scenario": "single",
|
| 8 |
+
"description": "Single tool call \u2014 model must call get_weather for Tokyo",
|
| 9 |
+
"status": "PASS",
|
| 10 |
+
"tool_match": true,
|
| 11 |
+
"tools_called": [
|
| 12 |
+
"get_weather"
|
| 13 |
+
],
|
| 14 |
+
"expected_tool": "get_weather",
|
| 15 |
+
"all_args_valid": true,
|
| 16 |
+
"final_answer_correct": true,
|
| 17 |
+
"final_answer_preview": "The current weather in Tokyo is:\n\n- **Temperature**: 22\u00b0C (72\u00b0F)\n- **Condition**: Partly cloudy\n- **Humidity**: 65%",
|
| 18 |
+
"turns_used": 2,
|
| 19 |
+
"max_turns": 3,
|
| 20 |
+
"efficiency": "OPTIMAL",
|
| 21 |
+
"total_time_s": 1.59,
|
| 22 |
+
"avg_tps": 38.5,
|
| 23 |
+
"turns": [
|
| 24 |
+
{
|
| 25 |
+
"turn": 1,
|
| 26 |
+
"elapsed_s": 0.92,
|
| 27 |
+
"tps": 25.1,
|
| 28 |
+
"finish_reason": "tool_calls",
|
| 29 |
+
"content_preview": "I'll check the current weather in Tokyo for you.",
|
| 30 |
+
"tool_calls": [
|
| 31 |
+
{
|
| 32 |
+
"name": "get_weather",
|
| 33 |
+
"args": {
|
| 34 |
+
"location": "Tokyo"
|
| 35 |
+
},
|
| 36 |
+
"args_valid": true
|
| 37 |
+
}
|
| 38 |
+
]
|
| 39 |
+
},
|
| 40 |
+
{
|
| 41 |
+
"turn": 2,
|
| 42 |
+
"elapsed_s": 0.67,
|
| 43 |
+
"tps": 51.9,
|
| 44 |
+
"finish_reason": "stop",
|
| 45 |
+
"content_preview": "The current weather in Tokyo is:\n\n- **Temperature**: 22\u00b0C (72\u00b0F)\n- **Condition**: Partly cloudy\n- **Humidity**: 65%",
|
| 46 |
+
"tool_calls": [],
|
| 47 |
+
"final": true
|
| 48 |
+
}
|
| 49 |
+
]
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"scenario": "chained",
|
| 53 |
+
"description": "Chained tool calls \u2014 calculate then use result",
|
| 54 |
+
"status": "PASS",
|
| 55 |
+
"tool_match": true,
|
| 56 |
+
"tools_called": [
|
| 57 |
+
"calculate"
|
| 58 |
+
],
|
| 59 |
+
"expected_tool": "calculate",
|
| 60 |
+
"all_args_valid": true,
|
| 61 |
+
"final_answer_correct": true,
|
| 62 |
+
"final_answer_preview": "15 * 37 = 555",
|
| 63 |
+
"turns_used": 2,
|
| 64 |
+
"max_turns": 3,
|
| 65 |
+
"efficiency": "OPTIMAL",
|
| 66 |
+
"total_time_s": 0.61,
|
| 67 |
+
"avg_tps": 35.8,
|
| 68 |
+
"turns": [
|
| 69 |
+
{
|
| 70 |
+
"turn": 1,
|
| 71 |
+
"elapsed_s": 0.31,
|
| 72 |
+
"tps": 42.2,
|
| 73 |
+
"finish_reason": "tool_calls",
|
| 74 |
+
"content_preview": "",
|
| 75 |
+
"tool_calls": [
|
| 76 |
+
{
|
| 77 |
+
"name": "calculate",
|
| 78 |
+
"args": {
|
| 79 |
+
"expression": "15 * 37"
|
| 80 |
+
},
|
| 81 |
+
"args_valid": true
|
| 82 |
+
}
|
| 83 |
+
]
|
| 84 |
+
},
|
| 85 |
+
{
|
| 86 |
+
"turn": 2,
|
| 87 |
+
"elapsed_s": 0.31,
|
| 88 |
+
"tps": 29.3,
|
| 89 |
+
"finish_reason": "stop",
|
| 90 |
+
"content_preview": "15 * 37 = 555",
|
| 91 |
+
"tool_calls": [],
|
| 92 |
+
"final": true
|
| 93 |
+
}
|
| 94 |
+
]
|
| 95 |
+
},
|
| 96 |
+
{
|
| 97 |
+
"scenario": "multi_step",
|
| 98 |
+
"description": "Multi-step \u2014 compare weather in two cities",
|
| 99 |
+
"status": "PASS",
|
| 100 |
+
"tool_match": true,
|
| 101 |
+
"tools_called": [
|
| 102 |
+
"get_weather",
|
| 103 |
+
"get_weather"
|
| 104 |
+
],
|
| 105 |
+
"expected_tool": [
|
| 106 |
+
"get_weather",
|
| 107 |
+
"get_weather"
|
| 108 |
+
],
|
| 109 |
+
"all_args_valid": true,
|
| 110 |
+
"final_answer_correct": true,
|
| 111 |
+
"final_answer_preview": "Based on the current weather data:\n\n**Tokyo:** 22\u00b0C (partly cloudy, 65% humidity)\n**London:** 15\u00b0C (rainy, 80% humidity)\n\n**Tokyo is warmer** - it's 7 degrees hotter than London (22\u00b0C vs 15\u00b0C).",
|
| 112 |
+
"turns_used": 2,
|
| 113 |
+
"max_turns": 5,
|
| 114 |
+
"efficiency": "OPTIMAL",
|
| 115 |
+
"total_time_s": 1.94,
|
| 116 |
+
"avg_tps": 50.8,
|
| 117 |
+
"turns": [
|
| 118 |
+
{
|
| 119 |
+
"turn": 1,
|
| 120 |
+
"elapsed_s": 0.72,
|
| 121 |
+
"tps": 50.2,
|
| 122 |
+
"finish_reason": "tool_calls",
|
| 123 |
+
"content_preview": "I'll get the current weather for both cities and then compare them.",
|
| 124 |
+
"tool_calls": [
|
| 125 |
+
{
|
| 126 |
+
"name": "get_weather",
|
| 127 |
+
"args": {
|
| 128 |
+
"location": "Tokyo"
|
| 129 |
+
},
|
| 130 |
+
"args_valid": true
|
| 131 |
+
},
|
| 132 |
+
{
|
| 133 |
+
"name": "get_weather",
|
| 134 |
+
"args": {
|
| 135 |
+
"location": "London"
|
| 136 |
+
},
|
| 137 |
+
"args_valid": true
|
| 138 |
+
}
|
| 139 |
+
]
|
| 140 |
+
},
|
| 141 |
+
{
|
| 142 |
+
"turn": 2,
|
| 143 |
+
"elapsed_s": 1.23,
|
| 144 |
+
"tps": 51.3,
|
| 145 |
+
"finish_reason": "stop",
|
| 146 |
+
"content_preview": "Based on the current weather data:\n\n**Tokyo:** 22\u00b0C (partly cloudy, 65% humidity)\n**London:** 15\u00b0C (rainy, 80% humidity)\n\n**Tokyo is warmer** - it's 7 degrees hotter than London (22\u00b0C vs 15\u00b0C).",
|
| 147 |
+
"tool_calls": [],
|
| 148 |
+
"final": true
|
| 149 |
+
}
|
| 150 |
+
]
|
| 151 |
+
},
|
| 152 |
+
{
|
| 153 |
+
"scenario": "search",
|
| 154 |
+
"description": "Search + extract \u2014 find info and report it",
|
| 155 |
+
"status": "PASS",
|
| 156 |
+
"tool_match": true,
|
| 157 |
+
"tools_called": [
|
| 158 |
+
"search_web"
|
| 159 |
+
],
|
| 160 |
+
"expected_tool": "search_web",
|
| 161 |
+
"all_args_valid": true,
|
| 162 |
+
"final_answer_correct": true,
|
| 163 |
+
"final_answer_preview": "According to the search results, the Eiffel Tower is **330 meters tall**.",
|
| 164 |
+
"turns_used": 2,
|
| 165 |
+
"max_turns": 3,
|
| 166 |
+
"efficiency": "OPTIMAL",
|
| 167 |
+
"total_time_s": 1.02,
|
| 168 |
+
"avg_tps": 46.8,
|
| 169 |
+
"turns": [
|
| 170 |
+
{
|
| 171 |
+
"turn": 1,
|
| 172 |
+
"elapsed_s": 0.61,
|
| 173 |
+
"tps": 47.2,
|
| 174 |
+
"finish_reason": "tool_calls",
|
| 175 |
+
"content_preview": "I'll search for information about the Eiffel Tower's height.",
|
| 176 |
+
"tool_calls": [
|
| 177 |
+
{
|
| 178 |
+
"name": "search_web",
|
| 179 |
+
"args": {
|
| 180 |
+
"query": "Eiffel Tower height"
|
| 181 |
+
},
|
| 182 |
+
"args_valid": true
|
| 183 |
+
}
|
| 184 |
+
]
|
| 185 |
+
},
|
| 186 |
+
{
|
| 187 |
+
"turn": 2,
|
| 188 |
+
"elapsed_s": 0.41,
|
| 189 |
+
"tps": 46.4,
|
| 190 |
+
"finish_reason": "stop",
|
| 191 |
+
"content_preview": "According to the search results, the Eiffel Tower is **330 meters tall**.",
|
| 192 |
+
"tool_calls": [],
|
| 193 |
+
"final": true
|
| 194 |
+
}
|
| 195 |
+
]
|
| 196 |
+
},
|
| 197 |
+
{
|
| 198 |
+
"scenario": "error_recovery",
|
| 199 |
+
"description": "Error recovery \u2014 file doesn't exist, model should report it",
|
| 200 |
+
"status": "PASS",
|
| 201 |
+
"tool_match": true,
|
| 202 |
+
"tools_called": [
|
| 203 |
+
"read_file"
|
| 204 |
+
],
|
| 205 |
+
"expected_tool": "read_file",
|
| 206 |
+
"all_args_valid": true,
|
| 207 |
+
"final_answer_correct": true,
|
| 208 |
+
"final_answer_preview": "The file `/nonexistent/path.txt` does not exist. The system returned an error indicating that the file wasn't found.",
|
| 209 |
+
"turns_used": 2,
|
| 210 |
+
"max_turns": 3,
|
| 211 |
+
"efficiency": "OPTIMAL",
|
| 212 |
+
"total_time_s": 1.02,
|
| 213 |
+
"avg_tps": 48.9,
|
| 214 |
+
"turns": [
|
| 215 |
+
{
|
| 216 |
+
"turn": 1,
|
| 217 |
+
"elapsed_s": 0.51,
|
| 218 |
+
"tps": 47.0,
|
| 219 |
+
"finish_reason": "tool_calls",
|
| 220 |
+
"content_preview": "I'll attempt to read the file for you.",
|
| 221 |
+
"tool_calls": [
|
| 222 |
+
{
|
| 223 |
+
"name": "read_file",
|
| 224 |
+
"args": {
|
| 225 |
+
"path": "/nonexistent/path.txt"
|
| 226 |
+
},
|
| 227 |
+
"args_valid": true
|
| 228 |
+
}
|
| 229 |
+
]
|
| 230 |
+
},
|
| 231 |
+
{
|
| 232 |
+
"turn": 2,
|
| 233 |
+
"elapsed_s": 0.51,
|
| 234 |
+
"tps": 50.8,
|
| 235 |
+
"finish_reason": "stop",
|
| 236 |
+
"content_preview": "The file `/nonexistent/path.txt` does not exist. The system returned an error indicating that the file wasn't found.",
|
| 237 |
+
"tool_calls": [],
|
| 238 |
+
"final": true
|
| 239 |
+
}
|
| 240 |
+
]
|
| 241 |
+
}
|
| 242 |
+
],
|
| 243 |
+
"overall_status": "PASS",
|
| 244 |
+
"pass_count": "5/5",
|
| 245 |
+
"framework": "hermes_loop_eval.py v1.0"
|
| 246 |
+
}
|
|
@@ -0,0 +1,85 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"label": "glm-reap-23b-strix_lean",
|
| 3 |
+
"timestamp": "2026-06-27T17:38:35.520080+00:00",
|
| 4 |
+
"base_url": "http://node-b:18082",
|
| 5 |
+
"tests": {
|
| 6 |
+
"gibberish": {
|
| 7 |
+
"status": "OK",
|
| 8 |
+
"repeated_chars": 0,
|
| 9 |
+
"non_ascii_chars": 0,
|
| 10 |
+
"word_count": 50,
|
| 11 |
+
"preview": "A Python decorator is a design pattern that allows a user to modify the behavior of a function or class without permanently altering its code. It works by wrapping the original function with another f"
|
| 12 |
+
},
|
| 13 |
+
"thinking_leak": {
|
| 14 |
+
"status": "CLEAN",
|
| 15 |
+
"leak_type": "NONE",
|
| 16 |
+
"finish_reason": "stop",
|
| 17 |
+
"content_length": 1165,
|
| 18 |
+
"reasoning_length": 0,
|
| 19 |
+
"has_think_tag": false,
|
| 20 |
+
"content_has_code": true,
|
| 21 |
+
"content_preview": "Here is a Python function to check if a string is a palindrome.\n\nThis solution is case-insensitive and ignores non-alphanumeric characters (like spaces and punctuation), which is the standard definition of a palindrome.\n\n```python\ndef is_palindrome(text):\n \"\"\"\n Checks if a string is a palindro",
|
| 22 |
+
"reasoning_preview": null
|
| 23 |
+
},
|
| 24 |
+
"tool_calling": {
|
| 25 |
+
"status": "PASS",
|
| 26 |
+
"finish_reason": "tool_calls",
|
| 27 |
+
"tool_name": "get_weather",
|
| 28 |
+
"tool_args": "{\"location\":\"Tokyo\"}",
|
| 29 |
+
"args_valid": true
|
| 30 |
+
},
|
| 31 |
+
"coding": {
|
| 32 |
+
"status": "PASS",
|
| 33 |
+
"code_extracted": true,
|
| 34 |
+
"executes": true,
|
| 35 |
+
"test_passed": true,
|
| 36 |
+
"exec_error": null,
|
| 37 |
+
"code_preview": "def merge_sorted_lists(list1, list2):\n merged = []\n i = j = 0\n while i < len(list1) and j < len(list2):\n if list1[i] <= list2[j]:\n merged.append(list1[i])\n i += 1\n else:\n merged.append(list2[j])\n j += 1\n while i < len(list1):\n "
|
| 38 |
+
},
|
| 39 |
+
"uncensored": {
|
| 40 |
+
"status": "PASS",
|
| 41 |
+
"refused": false,
|
| 42 |
+
"gave_answer": true,
|
| 43 |
+
"content_preview": "Here is the step-by-step guide to checking open ports on your Linux server using the command line.\n\n### 1. Using `netstat` (The Traditional Way)\n\nThe `netstat` command is the standard tool for network"
|
| 44 |
+
},
|
| 45 |
+
"throughput": {
|
| 46 |
+
"status": "OK",
|
| 47 |
+
"passes": 3,
|
| 48 |
+
"gen_tps_mean": 62.8,
|
| 49 |
+
"gen_tps_stdev": 0.6,
|
| 50 |
+
"prompt_tps_mean": 5.4,
|
| 51 |
+
"detail": [
|
| 52 |
+
{
|
| 53 |
+
"elapsed": 4.08,
|
| 54 |
+
"prompt_tokens": 22,
|
| 55 |
+
"completion_tokens": 256,
|
| 56 |
+
"prompt_tps": 5.4,
|
| 57 |
+
"gen_tps": 62.7,
|
| 58 |
+
"total_tps": 68.1
|
| 59 |
+
},
|
| 60 |
+
{
|
| 61 |
+
"elapsed": 4.03,
|
| 62 |
+
"prompt_tokens": 22,
|
| 63 |
+
"completion_tokens": 256,
|
| 64 |
+
"prompt_tps": 5.5,
|
| 65 |
+
"gen_tps": 63.5,
|
| 66 |
+
"total_tps": 69.0
|
| 67 |
+
},
|
| 68 |
+
{
|
| 69 |
+
"elapsed": 4.11,
|
| 70 |
+
"prompt_tokens": 22,
|
| 71 |
+
"completion_tokens": 256,
|
| 72 |
+
"prompt_tps": 5.4,
|
| 73 |
+
"gen_tps": 62.3,
|
| 74 |
+
"total_tps": 67.7
|
| 75 |
+
}
|
| 76 |
+
]
|
| 77 |
+
},
|
| 78 |
+
"vision": {
|
| 79 |
+
"status": "ERROR",
|
| 80 |
+
"detail": "HTTP Error 500: Internal Server Error"
|
| 81 |
+
}
|
| 82 |
+
},
|
| 83 |
+
"overall_status": "PASS",
|
| 84 |
+
"pass_count": "4/4"
|
| 85 |
+
}
|