FreedomAISVR's picture
Upload README.md with huggingface_hub
42c989b verified
|
Raw
History Blame
2.61 kB
metadata
language:
  - en
  - multilingual
tags:
  - qwen
  - qwen3.5
  - moe
  - agent
  - world-model
  - mxfp4_moe
  - gguf
  - vision
  - multimodal
  - 35b
license: apache-2.0
base_model: Qwen/Qwen-AgentWorld-35B-A3B

Qwen AgentWorld 35B-A3B β€” MXFP4 MoE GGUF

MXFP4 MoE quantization of Qwen/Qwen-AgentWorld-35B-A3B, a 35B parameter Mixture-of-Experts model with 3B active parameters, designed for agent tasks and world modeling with vision support.

About the Model

Qwen AgentWorld is a specialized variant of the Qwen 3.5 MoE architecture optimized for:

  • Agent tasks β€” tool calling, function execution, environment simulation
  • World modeling β€” understanding and predicting environment states
  • Vision understanding β€” multimodal image input via unified vision encoder
  • 35B total parameters with only 3B active per token (256 experts, 8 active)
  • Efficient inference β€” MoE architecture activates only a fraction of parameters

Architecture

  • Text model: Qwen3.5 MoE β€” 40 layers, 2048 hidden, 256 experts (8 active/token)
  • Vision encoder: 27-layer SigLIP-style, 1152 hidden, patch_size 16
  • Vocabulary: 248,320 tokens
  • Vision: Unified architecture β€” vision weights embedded in main GGUF (no separate mmproj)

Quantization

This GGUF was quantized from the BF16 safetensors using llama.cpp (build 537). The source weights were converted to F16 GGUF, then quantized to MXFP4 MoE format.

MXFP4 MoE uses microscaling FP4 for expert weights and Q8_0 for non-expert tensors, optimized for MoE architectures.

Files

File Size Description
qwen-agentworld-35b-a3b-mxfp4_moe.gguf ~18.4 GB MXFP4 MoE quantized model (text + vision)

Note: Vision weights are embedded in the main GGUF β€” no separate mmproj file needed.

Usage

llama.cpp

# Server mode with OpenAI-compatible API
llama-server \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  --host 0.0.0.0 \
  --port 8080

# Direct inference
llama-cli \
  -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \
  -ngl 99 \
  -p "Analyze this image and describe what you see"

LM Studio

  1. Download the GGUF file from this repository
  2. Load the GGUF file in LM Studio (vision is embedded, no mmproj needed)
  3. Set GPU offload layers to maximum

Hardware Requirements

  • Minimum: 20 GB VRAM for partial offload
  • Recommended: 24+ GB VRAM for full GPU offload
  • Disk: ~18.4 GB

License

Apache 2.0 β€” same as the base model.