--- base_model: - Qwen/Qwen-AgentWorld-35B-A3B tags: - Qwen - agent - world - 35b - a3b - moe --- # Qwen-AgentWorld-35B-A3B GGUF GGUF quantizations of [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B) generated with [llama.cpp](https://github.com/ggerganov/llama.cpp). - Architecture: `qwen35moe` (35B params, 256 experts / 8 active, A3B) - Context length: 262 144 - Source: BF16 GGUF converted via `convert_hf_to_gguf.py` - Importance matrix (`coding.imatrix.gguf`) generated from a curated coding calibration text (1 840 samples), 128 chunks, ctx 4 096. ## Files | File | Size (GiB) | BPW | Use | | --- | --- | --- | --- | | `Qwen-AgentWorld-35B-A3B-BF16.gguf` | 64.61 | 16.01 | reference / highest fidelity | | `Qwen-AgentWorld-35B-A3B-Q8_0.gguf` | 34.37 | 8.52 | near-lossless, needs >24 GB VRAM with all layers offloaded | | `Qwen-AgentWorld-35B-A3B-Q6_K.gguf` | 26.56 | 6.58 | very high quality, ~20 GB VRAM | | `Qwen-AgentWorld-35B-A3B-Q4_K_M.gguf` | 19.71 | 4.88 | balanced quality / size, ~14 GB VRAM | | `Qwen-AgentWorld-35B-A3B-Q2_K.gguf` | 12.05 | 2.99 | smallest, ~9 GB VRAM, quality trade-off | | `Qwen-AgentWorld-35B-A3B-coding.imatrix.gguf` | 0.18 | — | importance matrix for finer quant recipes (Tensor-type overrides). | Plain quantization passes (no recipe overrides) only — these avoid the `std::bad_alloc` triggered by `--tensor-type-file` + imatrix on this specific MoE architecture in the current llama.cpp build (8194 / 1179bfc82). The imatrix file is still provided for users who want to mix types via `--tensor-type-file`. ## Recommended inference ``` llama.cpp/build/bin/llama-server \ -m Qwen-AgentWorld-35B-A3B-Q4_K_M.gguf \ -ngl 999 --ctx-size 8192 -b 2048 -ub 512 -np 1 \ --temp 0.6 --top-p 0.95 --top-k 20 \ -fa --host 0.0.0.0 --port 8080 ```