---
language:
- en
- multilingual
tags:
- qwen
- qwen3.5
- moe
- agent
- world-model
- mxfp4_moe
- gguf
- vision
- multimodal
- 35b
license: apache-2.0
base_model: Qwen/Qwen-AgentWorld-35B-A3B
---

# Qwen AgentWorld 35B-A3B - MXFP4 MoE GGUF

MXFP4 MoE quantization of [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B), a 35B parameter Mixture-of-Experts model with 3B active parameters, designed for agent tasks and world modeling with vision support.

## About the Model

Qwen AgentWorld is a specialized variant of the Qwen 3.5 MoE architecture optimized for:

- **Agent tasks**: tool calling, function execution, environment simulation
- **World modeling**: understanding and predicting environment states
- **Vision understanding**: multimodal image input via unified vision encoder
- **35B total parameters** with only **3B active per token** (256 experts, 8 active)
- **Efficient inference**: MoE architecture activates only a fraction of parameters

## Architecture

- **Text model**: Qwen3.5 MoE, 40 layers, 2048 hidden, 256 experts (8 active/token)
- **Vision encoder**: 27-layer SigLIP-style, 1152 hidden, patch_size 16
- **Vocabulary**: 248,320 tokens
- **Vision**: Unified architecture; vision weights embedded in main GGUF (no separate mmproj)

## Quantization

This GGUF was quantized from the BF16 safetensors using [llama.cpp](https://github.com/ggerganov/llama.cpp) (build 537). The source weights were converted to F16 GGUF, then quantized to MXFP4 MoE format.

MXFP4 MoE uses microscaling FP4 for expert weights and Q8_0 for non-expert tensors, optimized for MoE architectures.

## Files

| File | Size | Description |
|------|------|-------------|
| `qwen-agentworld-35b-a3b-mxfp4_moe.gguf` | ~18.4 GB | MXFP4 MoE quantized model (text + vision) |

**Note**: Vision weights are embedded in the main GGUF; no separate mmproj file is needed.

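As a rough sanity check on the file size listed above, the sketch below works out the average number of bits stored per weight and the share of parameters active per token. It assumes that "GB" means 10^9 bytes and that "35B" and "3B" are exact counts; both are simplifications, so treat the results as estimates rather than measured values. The variable names are illustrative only.

```bash
# Back-of-the-envelope arithmetic from the figures on this card.
# Assumptions: 1 GB = 10^9 bytes; 35B and 3B are exact parameter counts.
file_bytes=18400000000
total_params=35000000000
active_params=3000000000

# Average bits stored per weight across the whole file.
bits_per_weight=$(awk -v b="$file_bytes" -v p="$total_params" \
  'BEGIN { printf "%.2f", b * 8 / p }')

# Share of parameters used for each token, in percent.
active_pct=$(awk -v a="$active_params" -v p="$total_params" \
  'BEGIN { printf "%.1f", a / p * 100 }')

echo "average bits per weight: $bits_per_weight"
echo "active parameters per token: ${active_pct}%"
```

Under these assumptions the file averages a little over 4 bits per weight, which is in line with a format that stores most weights (the experts) in FP4 and the rest in Q8_0.
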
## Usage

### llama.cpp

```bash
# Server mode with OpenAI-compatible API
llama-server \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080

# Direct inference
llama-cli \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  -p "Analyze this image and describe what you see"
```

### LM Studio

1. Download the GGUF file from this repository
2. Load the GGUF file in LM Studio (vision is embedded, no mmproj needed)
3. Set GPU offload layers to maximum

## Hardware Requirements

- **Minimum**: 20 GB VRAM for partial offload
- **Recommended**: 24+ GB VRAM for full GPU offload
- **Disk**: ~18.4 GB

## License

Apache 2.0, same as the base model.