---
base_model:
- Qwen/Qwen-AgentWorld-35B-A3B
tags:
- Qwen
- agent
- world
- 35b
- a3b
- moe
---
# Qwen-AgentWorld-35B-A3B GGUF

GGUF quantizations of [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B) generated with [llama.cpp](https://github.com/ggerganov/llama.cpp).

- Architecture: `qwen35moe` (35B params, 256 experts / 8 active, A3B)
- Context length: 262 144
- Source: BF16 GGUF converted via `convert_hf_to_gguf.py`
- Importance matrix (`coding.imatrix.gguf`) generated from a curated coding calibration text (1 840 samples), 128 chunks, ctx 4 096.

## Files

| File | Size (GiB) | BPW | Use |
| --- | --- | --- | --- |
| `Qwen-AgentWorld-35B-A3B-BF16.gguf` | 64.61 | 16.01 | reference / highest fidelity |
| `Qwen-AgentWorld-35B-A3B-Q8_0.gguf` | 34.37 | 8.52 | near-lossless, needs >24 GB VRAM with all layers offloaded |
| `Qwen-AgentWorld-35B-A3B-Q6_K.gguf` | 26.56 | 6.58 | very high quality, ~20 GB VRAM |
| `Qwen-AgentWorld-35B-A3B-Q4_K_M.gguf` | 19.71 | 4.88 | balanced quality / size, ~14 GB VRAM |
| `Qwen-AgentWorld-35B-A3B-Q2_K.gguf` | 12.05 | 2.99 | smallest, ~9 GB VRAM, quality trade-off |
| `Qwen-AgentWorld-35B-A3B-coding.imatrix.gguf` | 0.18 | — | importance matrix for finer quant recipes (Tensor-type overrides). |

Plain quantization passes (no recipe overrides) only — these avoid the
`std::bad_alloc` triggered by `--tensor-type-file` + imatrix on this
specific MoE architecture in the current llama.cpp build (8194 /
1179bfc82). The imatrix file is still provided for users who want to
mix types via `--tensor-type-file`.

## Recommended inference

```
llama.cpp/build/bin/llama-server \
    -m Qwen-AgentWorld-35B-A3B-Q4_K_M.gguf \
    -ngl 999 --ctx-size 8192 -b 2048 -ub 512 -np 1 \
    --temp 0.6 --top-p 0.95 --top-k 20 \
    -fa --host 0.0.0.0 --port 8080
```