---
license: apache-2.0
language:
- en
- ru
tags:
- nvfp4
- qwen
- agentworld
- heretic
- moe
- gb10
- blackwell
- modelopt
pipeline_tag: text-generation
base_model: Qwen/Qwen-AgentWorld-35B-A3B
---

# Qwen-AgentWorld-Heretic-HCl-NVFP4

35B MoE · NVFP4 4-bit · Heretic-abliterated · DGX Spark

---

## Model Card

| | |
|---|---|
| **Base model** | [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B) |
| **Architecture** | Qwen3.5 MoE: 35B total / 3B active parameters |
| **Experts** | 256 experts, top-8 routing |
| **Layers** | 40 transformer layers (hybrid linear attention + full attention) |
| **Context** | **262,144 tokens** (256K) |
| **Quantization** | NVFP4: 4-bit NVIDIA floating point (e2m1) |
| **Compression** | **21 GB** (3.1× smaller than the 65 GB bf16 checkpoint) |
| **Format** | safetensors, vLLM-compatible |

## Heretic Abliteration

This model was abliterated with [Heretic](https://github.com/p-e-w/heretic) v1.4.0, which searches for ablation parameters that reduce refusals while keeping the KL divergence from the base model low.

| Parameter | Value |
|---|---|
| Trials | **200** (completed) |
| KL divergence target | |
| Good prompts | (400) |
| Bad prompts | (400) |
| Row normalization | FULL, LoRA rank 3 |
| Refusal markers | 28 |
| GPU | NVIDIA GB10, 121 GB |

## NVFP4 Quantization

Custom pipeline on DGX Spark (GB10, 121 GB unified memory):

| Step | Detail |
|---|---|
| Load | (half the memory of bf16) |
| Calibration | 32 samples, |
| Quantization | 0.44, |
| Export | |
| Remap | for vLLM |

## Usage

## Naming

| Component | Meaning |
|---|---|
| **Qwen** | Base architecture family |
| **AgentWorld** | Language world model covering 7 simulation domains |
| **Heretic** | Abliterated via a Heretic parameter study |
| **HCl** | Production mode (hydrochloric acid) |
| **NVFP4** | 4-bit NVIDIA floating point quantization |
| **dgx** | Quantized & deployed on DGX Spark |

---
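The Model Card lists 256 experts with top-8 routing; running only 8 experts per token is what keeps the active parameter count near 3B of the 35B total. The sketch below illustrates per-token top-k routing in plain Python. It assumes a softmax over all router logits followed by renormalizing the selected weights, a common MoE convention that is not confirmed here for this model; `route` is a hypothetical helper and the logits are random toy values.

```python
import math
import random

NUM_EXPERTS = 256  # from the Model Card table
TOP_K = 8          # experts activated per token


def route(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k experts of one token.

    Assumption: softmax over all experts, then the selected probabilities are
    renormalized to sum to 1 (a common MoE convention).
    """
    m = max(router_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # indices of the top_k most probable experts
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # toy router output
weights = route(logits)
print(sorted(weights))
```

Only the chosen experts' feed-forward blocks would run for this token; the other 248 are skipped.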
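The e2m1 format named in the Model Card table has only eight non-negative magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6), so NVFP4 pairs each small block of values with a shared scale. The following is a minimal fake-quantization sketch of that idea, not the ModelOpt implementation used in the pipeline: it assumes 16-element blocks, keeps the block scale as a Python float (real NVFP4 stores it as FP8 E4M3 alongside a per-tensor FP32 scale), and breaks rounding ties toward the smaller magnitude. `quantize_block` and `quantize_tensor` are hypothetical names.

```python
import random

# Non-negative magnitudes representable in FP4 e2m1 (sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # elements sharing one scale in NVFP4


def quantize_block(values):
    """Fake-quantize one block; returns (scale, codes, dequantized values).

    Simplification: the block scale stays a Python float. Real NVFP4 stores it
    as FP8 (E4M3) and also applies a per-tensor FP32 scale.
    """
    amax = max(abs(v) for v in values)
    scale = amax / E2M1_MAX if amax > 0 else 1.0  # map the block's max to 6.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        # round to the nearest grid point; ties go to the smaller magnitude here
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if v >= 0 else -nearest)
    return scale, codes, [c * scale for c in codes]


def quantize_tensor(values, block_size=BLOCK_SIZE):
    """Quantize a flat list block by block and return the dequantized values."""
    out = []
    for start in range(0, len(values), block_size):
        _, _, dq = quantize_block(values[start:start + block_size])
        out.extend(dq)
    return out


random.seed(1)
weights = [random.gauss(0.0, 0.02) for _ in range(64)]  # toy weight row
approx = quantize_tensor(weights)
print(max(abs(w - a) for w, a in zip(weights, approx)))
```

Because every block gets its own scale, an outlier only coarsens the 15 values that share its block rather than the whole tensor.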
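The 3.1× compression figure in the Model Card table can be checked against NVFP4's storage cost. A worked sketch, assuming 4 bits per element plus one 8-bit scale per 16-element block (4.5 bits per weight) and ignoring the per-tensor scale: if every tensor were NVFP4 the ratio would be about 3.56×. Attributing the gap entirely to tensors left in bf16 (an assumption; metadata and other overhead are ignored) implies roughly 6% of the bf16 bytes stay unquantized. `implied_high_precision_fraction` is a hypothetical helper.

```python
# Effective storage cost of NVFP4: a 4-bit e2m1 value per element plus one
# 8-bit FP8 scale shared by each block of 16 elements (the single per-tensor
# FP32 scale is negligible and ignored here).
FP4_BITS = 4
SCALE_BITS = 8
BLOCK_SIZE = 16
BF16_BITS = 16

bits = FP4_BITS + SCALE_BITS / BLOCK_SIZE   # 4.5 bits per weight
ideal_ratio = BF16_BITS / bits              # if every tensor were NVFP4
reported_ratio = 65 / 21                    # sizes from the Model Card table


def implied_high_precision_fraction(bf16_gb, quant_gb, bits_per_weight=bits):
    """Solve quant_gb = bf16_gb * ((1 - f) * bits / 16 + f) for f.

    Assumption: the gap between ideal and reported size comes entirely from
    tensors left in bf16; file metadata and other overhead are ignored.
    """
    r = bits_per_weight / BF16_BITS
    return (quant_gb / bf16_gb - r) / (1 - r)


print(bits, round(ideal_ratio, 2), round(reported_ratio, 2))
print(round(implied_high_precision_fraction(65, 21), 3))
```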
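The Heretic Abliteration section relies on directional ablation: estimate a "refusal direction" from activations on bad versus good prompts, then edit weight matrices that write into the residual stream so they can no longer produce that direction. Below is a toy sketch of that basic idea in plain Python, not Heretic's parametrized variant; the per-layer weighting, the LoRA and row-normalization settings in the table, and the real activations are not reproduced. `refusal_direction`, `ablate`, and its `strength` argument are hypothetical, and the vectors are made up.

```python
import math


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refusal_direction(bad_acts, good_acts):
    """Unit vector along mean(bad) - mean(good) residual activations."""
    diff = [b - g for b, g in zip(mean(bad_acts), mean(good_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def ablate(weight, r, strength=1.0):
    """Return W' = W - strength * r r^T W.

    weight is a list of rows indexed by the residual dimension (the matrix
    writes into the residual stream). With strength=1 and unit r, W' v has
    no component along r for any input v.
    """
    rows, cols = len(weight), len(weight[0])
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]  # r^T W
    return [[weight[i][j] - strength * r[i] * proj[j] for j in range(cols)]
            for i in range(rows)]


# Toy 3-dimensional residual stream and a 3x2 output projection.
bad = [[1.0, 2.0, 0.0], [1.2, 1.8, 0.1]]
good = [[0.0, 2.0, 0.0], [0.2, 1.8, 0.1]]
r = refusal_direction(bad, good)
W = [[1.0, 0.5], [0.3, -1.0], [2.0, 0.0]]
W_abl = ablate(W, r)
```

A `strength` below 1 only partially removes the direction, which is the kind of knob a parameter search can trade off against KL divergence.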
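Heretic weighs each trial on two quantities: how often the model still refuses (the table counts 28 refusal markers used for detection) and how far its output distribution drifts from the base model, measured as KL divergence. The drift term can be sketched as a KL divergence between two next-token distributions. Which direction of KL Heretic computes and which prompts it averages over are not reproduced here, and the logits below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same tokens."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits for one prompt, before and after ablation.
base = softmax([2.0, 1.0, 0.5, -1.0])
edited = softmax([1.9, 1.1, 0.5, -1.0])
print(kl_divergence(base, edited))
```

A value near zero means the edited model still predicts essentially the same next tokens as the base model on that prompt.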

AgentWorld + Heretic + NVFP4 — all on a single DGX Spark.

Quantized 2026-06-26 · Model on HuggingFace