---
language:
- en
- multilingual
tags:
- qwen
- qwen3.5
- moe
- agent
- world-model
- mxfp4_moe
- gguf
- vision
- multimodal
- 35b
license: apache-2.0
base_model: Qwen/Qwen-AgentWorld-35B-A3B
---

# Qwen AgentWorld 35B-A3B — MXFP4 MoE GGUF

MXFP4 MoE quantization of [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B), a 35B parameter Mixture-of-Experts model with 3B active parameters, designed for agent tasks and world modeling with vision support.

## About the Model

Qwen AgentWorld is a specialized variant of the Qwen 3.5 MoE architecture optimized for:

- **Agent tasks** — tool calling, function execution, environment simulation
- **World modeling** — understanding and predicting environment states
- **Vision understanding** — multimodal image input via separate mmproj vision projector
- **35B total parameters** with only **3B active per token** (256 experts, 8 active)
- **Efficient inference** — MoE architecture activates only a fraction of parameters

## Architecture

- **Text model**: Qwen3.5 MoE — 40 layers, 2048 hidden, 256 experts (8 active/token)
- **Vision encoder**: 27-layer SigLIP-style, 1152 hidden, patch_size 16 (via mmproj)
- **Vocabulary**: 248,320 tokens

## Quantization

This GGUF was quantized from the BF16 safetensors using [llama.cpp](https://github.com/ggerganov/llama.cpp) (build 537). The source weights were converted to F16 GGUF, then quantized to MXFP4 MoE format.

MXFP4 MoE uses microscaling FP4 for expert weights and Q8_0 for non-expert tensors, optimized for MoE architectures.

## Files

| File | Size | Description |
|------|------|-------------|
| qwen-agentworld-35b-a3b-mxfp4_moe.gguf | ~18.4 GB | MXFP4 MoE quantized model weights |
| mmproj-qwen-agentworld-35b-a3b-f16.gguf | ~843 MB | Vision projector (BF16) |

## Usage

### llama.cpp

`ash
llama-server \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  --mmproj mmproj-qwen-agentworld-35b-a3b-f16.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080
`

### LM Studio

1. Download both files from this repository
2. Load the main GGUF file in LM Studio
3. Load the mmproj file for vision support
4. Set GPU offload layers to maximum

## Hardware Requirements

- **Minimum**: 20 GB VRAM for partial offload
- **Recommended**: 24+ GB VRAM for full GPU offload
- **Disk**: ~19.2 GB

## License

Apache 2.0 — same as the base model.