---
title: Qwen-AgentWorld-35B-A3B
emoji: 🌍
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.9.1"
app_file: app.py
python_version: "3.12"
pinned: false
license: apache-2.0
short_description: Free ZeroGPU demo of Qwen-AgentWorld-35B-A3B (4-bit)
---

# Qwen-AgentWorld-35B-A3B: ZeroGPU Space

Free GPU demo of [`Qwen/Qwen-AgentWorld-35B-A3B`](https://hf.co/Qwen/Qwen-AgentWorld-35B-A3B) running on **Hugging Face ZeroGPU**. The 35B MoE is loaded **4-bit (nf4)** so it fits in a ZeroGPU slot.

## Why this is "free"

- ZeroGPU compute is free; an **HF Pro** account gets the **largest daily quota**.
- No always-on server, no per-hour billing (unlike Inference Endpoints).

## Deploy

1. Create a new Space → SDK **Gradio**.
2. In **Settings → Hardware**, select **ZeroGPU** (free with Pro).
3. Push `app.py`, `requirements.txt`, and this `README.md`.

Or push from the CLI (see `push_space.py` in this folder).

## Notes

- `size`/`duration` are tuned in `app.py`; lower `max_new_tokens` = less quota used.
- ZeroGPU's backing GPU and per-slot VRAM change over time; if 4-bit ever stops fitting, switch `MODEL_ID` to a pre-quantized mirror.
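
To sanity-check the "does 4-bit still fit" question from the last note, a back-of-the-envelope estimate can help. The sketch below is not part of `app.py`; the function names are hypothetical, and it assumes the "35B" in the model name is the total parameter count (for an MoE, all experts are normally kept in memory, so the total rather than the active count is what matters). It counts weights only and ignores quantization metadata, the KV cache, and activations, so treat the result as a lower bound. The slot size passed in is a placeholder you should replace with whatever ZeroGPU currently offers.

```python
# Rough weight-memory estimate. All figures are illustrative assumptions,
# not measurements from this Space.

GIB = 2**30


def estimate_weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB


def fits(n_params: float, bits_per_param: float,
         vram_gib: float, headroom_gib: float = 0.0) -> bool:
    """True if weights plus a chosen headroom fit within vram_gib.

    headroom_gib is a hypothetical allowance for KV cache, activations,
    and quantization overhead; pick it conservatively.
    """
    return estimate_weight_gib(n_params, bits_per_param) + headroom_gib <= vram_gib


TOTAL_PARAMS = 35e9  # assumption: "35B" read as total parameters

if __name__ == "__main__":
    for label, bits in [("bf16", 16), ("int8", 8), ("nf4", 4)]:
        print(f"{label}: ~{estimate_weight_gib(TOTAL_PARAMS, bits):.1f} GiB of weights")

    # 40 GiB is a placeholder slot size, not a documented ZeroGPU figure.
    print("nf4 fits in placeholder slot:",
          fits(TOTAL_PARAMS, 4, vram_gib=40, headroom_gib=8))
```

If the estimate plus headroom comes close to the slot size, that is a hint to move to a pre-quantized mirror or to lower `max_new_tokens` before the Space starts failing at load time.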