---
language:
- en
- multilingual
tags:
- qwen
- qwen3.5
- moe
- agent
- world-model
- mxfp4_moe
- gguf
- vision
- multimodal
- 35b
license: apache-2.0
base_model: Qwen/Qwen-AgentWorld-35B-A3B
---

# Qwen AgentWorld 35B-A3B — MXFP4 MoE GGUF

MXFP4 MoE quantization of [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B), a 35B parameter Mixture-of-Experts model with 3B active parameters, designed for agent tasks and world modeling with vision support.

## About the Model

Qwen AgentWorld is a specialized variant of the Qwen 3.5 MoE architecture optimized for:

- **Agent tasks** — tool calling, function execution, environment simulation
- **World modeling** — understanding and predicting environment states
- **Vision understanding** — multimodal image input via unified vision encoder
- **35B total parameters** with only **3B active per token** (256 experts, 8 active)
- **Efficient inference** — MoE architecture activates only a fraction of parameters

## Architecture

- **Text model**: Qwen3.5 MoE — 40 layers, 2048 hidden, 256 experts (8 active/token)
- **Vision encoder**: 27-layer SigLIP-style, 1152 hidden, patch_size 16
- **Vocabulary**: 248,320 tokens
- **Vision**: Unified architecture — vision weights embedded in main GGUF (no separate mmproj)

## Quantization

This GGUF was quantized from the BF16 safetensors using [llama.cpp](https://github.com/ggerganov/llama.cpp) (build 537). The source weights were converted to F16 GGUF, then quantized to MXFP4 MoE format.

MXFP4 MoE uses microscaling FP4 for expert weights and Q8_0 for non-expert tensors, optimized for MoE architectures.

## Files

| File | Size | Description |
|------|------|-------------|
| `qwen-agentworld-35b-a3b-mxfp4_moe.gguf` | ~18.4 GB | MXFP4 MoE quantized model (text + vision) |

**Note**: Vision weights are embedded in the main GGUF — no separate mmproj file needed.

## Usage

### llama.cpp

```bash
# Server mode with OpenAI-compatible API
llama-server \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080

# Direct inference
llama-cli \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  -p "Analyze this image and describe what you see"
```

### LM Studio

1. Download the GGUF file from this repository
2. Load the GGUF file in LM Studio (vision is embedded, no mmproj needed)
3. Set GPU offload layers to maximum

## Hardware Requirements

- **Minimum**: 20 GB VRAM for partial offload
- **Recommended**: 24+ GB VRAM for full GPU offload
- **Disk**: ~18.4 GB

## License

Apache 2.0 — same as the base model.