--- base_model: zai-org/GLM-4.7-Flash base_model_relation: quantized language: - en - zh library_name: gguf license: mit pipeline_tag: text-generation tags: - text-generation-inference - glm - moe - flash - gguf - glm4_moe_lite --- # GLM-4.7-Flash-GGUF
## Description This repository contains **GGUF** format model files for [Zhipu AI's GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash). **GLM-4.7-Flash** is a highly efficient **30B-A3B Mixture-of-Experts (MoE)** model. It is designed to be the strongest model in the 30B parameter class, offering a powerful option for lightweight deployment that perfectly balances performance and efficiency. ## Evaluation Results | Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B | |--------------------|---------------|-----------------------------|-------------| | AIME 25 | 91.6 | 85.0 | 91.7 | | GPQA | 75.2 | 73.4 | 71.5 | | LCB v6 | 64.0 | 66.0 | 61.0 | | HLE | 14.4 | 9.8 | 10.9 | | SWE-bench Verified | 59.2 | 22.0 | 34.0 | | τ²-Bench | 79.5 | 49.0 | 47.7 | | BrowseComp | 42.8 | 2.29 | 28.3 | ## Files & Quantization To see the available files, please verify the **Files and versions** tab. ## How to Run (llama.cpp) **Recommended Parameters:** * **Temperature:** `1.0` (Standard) or `0.7` (For stricter adherence) * **Top-P:** `0.95` * **Context:** `-c` (Adjust based on available RAM). ### CLI Example ```bash ./llama-cli -m GLM-4.7-Flash.Q4_K_M.gguf \ -c 8192 \ --temp 1.0 \ --top-p 0.95 \ -p "User: Write a Python script to calculate Fibonacci numbers.\nAssistant:" \ -cnv ``` ### Server Example ```bash ./llama-server -m GLM-4.7-Flash.Q4_K_M.gguf \ --port 8080 \ --host 0.0.0.0 \ -c 16384 \ -ngl 99 ```