---
base_model: zai-org/GLM-4.7-Flash
base_model_relation: quantized
language:
- en
- zh
library_name: gguf
license: mit
pipeline_tag: text-generation
tags:
- text-generation-inference
- glm
- moe
- flash
- gguf
- glm4_moe_lite
---

# GLM-4.7-Flash-GGUF

<div align="center">
<img src="https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/logo.svg" width="15%"/>
</div>

## Description

This repository contains **GGUF** format model files for [Zhipu AI's GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash).

**GLM-4.7-Flash** is a highly efficient **30B-A3B Mixture-of-Experts (MoE)** model. It is designed to be the strongest model in the 30B parameter class, offering a powerful option for lightweight deployment that perfectly balances performance and efficiency.

## Evaluation Results

| Benchmark          | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B |
|--------------------|---------------|-----------------------------|-------------|
| AIME 25            | 91.6          | 85.0                        | 91.7        |
| GPQA               | 75.2          | 73.4                        | 71.5        |
| LCB v6             | 64.0          | 66.0                        | 61.0        |
| HLE                | 14.4          | 9.8                         | 10.9        |
| SWE-bench Verified | 59.2          | 22.0                        | 34.0        |
| τ²-Bench           | 79.5          | 49.0                        | 47.7        |
| BrowseComp         | 42.8          | 2.29                        | 28.3        |

## Files & Quantization

To see the available files, please verify the **Files and versions** tab.

## How to Run (llama.cpp)

**Recommended Parameters:**
*   **Temperature:** `1.0` (Standard) or `0.7` (For stricter adherence)
*   **Top-P:** `0.95`
*   **Context:** `-c` (Adjust based on available RAM).

### CLI Example

```bash
./llama-cli -m GLM-4.7-Flash.Q4_K_M.gguf \
  -c 8192 \
  --temp 1.0 \
  --top-p 0.95 \
  -p "User: Write a Python script to calculate Fibonacci numbers.\nAssistant:" \
  -cnv
```

### Server Example

```bash
./llama-server -m GLM-4.7-Flash.Q4_K_M.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  -c 16384 \
  -ngl 99
```