How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="FreedomAISVR/Qwen-AgentWorld-35B-A3B-MXFP4-MOE-GGUF",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Qwen AgentWorld 35B-A3B — MXFP4 MoE GGUF

MXFP4 MoE quantization of Qwen/Qwen-AgentWorld-35B-A3B, a 35B parameter Mixture-of-Experts model with 3B active parameters, designed for agent tasks and world modeling with vision support.

About the Model

Qwen AgentWorld is a specialized variant of the Qwen 3.5 MoE architecture optimized for:

  • Agent tasks — tool calling, function execution, environment simulation
  • World modeling — understanding and predicting environment states
  • Vision understanding — multimodal image input via separate mmproj vision projector
  • 35B total parameters with only 3B active per token (256 experts, 8 active)
  • Efficient inference — MoE architecture activates only a fraction of parameters

Architecture

  • Text model: Qwen3.5 MoE — 40 layers, 2048 hidden, 256 experts (8 active/token)
  • Vision encoder: 27-layer SigLIP-style, 1152 hidden, patch_size 16 (via mmproj)
  • Vocabulary: 248,320 tokens

Quantization

This GGUF was quantized from the BF16 safetensors using llama.cpp (build 537). The source weights were converted to F16 GGUF, then quantized to MXFP4 MoE format.

MXFP4 MoE uses microscaling FP4 for expert weights and Q8_0 for non-expert tensors, optimized for MoE architectures.

Files

File Size Description
qwen-agentworld-35b-a3b-mxfp4_moe.gguf ~18.4 GB MXFP4 MoE quantized model weights
mmproj-qwen-agentworld-35b-a3b-f16.gguf ~843 MB Vision projector (BF16)

Usage

llama.cpp

ash llama-server \ -m qwen-agentworld-35b-a3b-mxfp4_moe.gguf \ --mmproj mmproj-qwen-agentworld-35b-a3b-f16.gguf \ -ngl 99 \ --host 0.0.0.0 \ --port 8080

LM Studio

  1. Download both files from this repository
  2. Load the main GGUF file in LM Studio
  3. Load the mmproj file for vision support
  4. Set GPU offload layers to maximum

Hardware Requirements

  • Minimum: 20 GB VRAM for partial offload
  • Recommended: 24+ GB VRAM for full GPU offload
  • Disk: ~19.2 GB

License

Apache 2.0 — same as the base model.

Downloads last month
694
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for FreedomAISVR/Qwen-AgentWorld-35B-A3B-MXFP4-MOE-GGUF

Quantized
(31)
this model