How to use from
Hermes Agent
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf AaryanK/GLM-4.7-Flash-GGUF:
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AaryanK/GLM-4.7-Flash-GGUF:
Run Hermes
hermes
Quick Links

GLM-4.7-Flash-GGUF

Description

This repository contains GGUF format model files for Zhipu AI's GLM-4.7-Flash.

GLM-4.7-Flash is a highly efficient 30B-A3B Mixture-of-Experts (MoE) model. It is designed to be the strongest model in the 30B parameter class, offering a powerful option for lightweight deployment that perfectly balances performance and efficiency.

Evaluation Results

Benchmark GLM-4.7-Flash Qwen3-30B-A3B-Thinking-2507 GPT-OSS-20B
AIME 25 91.6 85.0 91.7
GPQA 75.2 73.4 71.5
LCB v6 64.0 66.0 61.0
HLE 14.4 9.8 10.9
SWE-bench Verified 59.2 22.0 34.0
ฯ„ยฒ-Bench 79.5 49.0 47.7
BrowseComp 42.8 2.29 28.3

Files & Quantization

To see the available files, please verify the Files and versions tab.

How to Run (llama.cpp)

Recommended Parameters:

  • Temperature: 1.0 (Standard) or 0.7 (For stricter adherence)
  • Top-P: 0.95
  • Context: -c (Adjust based on available RAM).

CLI Example

./llama-cli -m GLM-4.7-Flash.Q4_K_M.gguf \
  -c 8192 \
  --temp 1.0 \
  --top-p 0.95 \
  -p "User: Write a Python script to calculate Fibonacci numbers.\nAssistant:" \
  -cnv

Server Example

./llama-server -m GLM-4.7-Flash.Q4_K_M.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 16384 \
  -ngl 99
Downloads last month
248
GGUF
Model size
30B params
Architecture
deepseek2
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for AaryanK/GLM-4.7-Flash-GGUF

Quantized
(83)
this model