---
license: apache-2.0
language:
- en
- ru
tags:
- nvfp4
- qwen
- agentworld
- heretic
- moe
- gb10
- blackwell
- modelopt
pipeline_tag: text-generation
base_model: Qwen/Qwen-AgentWorld-35B-A3B
---
# Qwen-AgentWorld-Heretic-HCl-NVFP4

35B MoE · NVFP4 4-bit · Heretic-abliterated · DGX Spark

---
## Model Card
| | |
|---|---|
| **Base model** | [Qwen/Qwen-AgentWorld-35B-A3B](https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B) |
| **Architecture** | Qwen3.5 MoE — 35B total / 3B active |
| **Experts** | 256 experts, top-8 routing |
| **Layers** | 40 transformer (linear attention + full attention) |
| **Context** | **262,144 tokens** (256K) |
| **Quantization** | NVFP4: 4-bit NVIDIA floating point (e2m1) |
| **Compression** | **21 GB** (3.1× from 65 GB bf16) |
| **Format** | safetensors, vLLM-compatible |
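
To make the `e2m1` label in the table concrete, here is a minimal sketch that decodes a 4-bit code with 1 sign bit, 2 exponent bits, and 1 mantissa bit (exponent bias 1). It only illustrates the element format; NVFP4 also stores scale factors alongside small blocks of weights, which this sketch omits. The function name `decode_e2m1` is made up for the example.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit e2m1 code: 1 sign, 2 exponent, 1 mantissa bit, bias 1."""
    if not 0 <= code <= 0b1111:
        raise ValueError("e2m1 codes are 4 bits wide")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: no implicit leading 1, so only 0 and 0.5 exist.
        magnitude = 0.5 * mantissa
    else:
        # Normal range: implicit leading 1 plus a half-step mantissa.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


# The eight non-negative values a single e2m1 element can take.
positive_values = [decode_e2m1(code) for code in range(8)]
print(positive_values)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

With only eight magnitudes per element, the per-block scale factors carry most of the dynamic range, which is why calibration matters for this format.
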
## Heretic Abliteration
This model was abliterated with [Heretic](https://github.com/p-e-w/heretic) v1.4.0, which steers the weights to reduce refusal behavior while keeping the KL divergence from the original model's outputs low.
| Parameter | Value |
|---|---|
| Trials | **200** (completed) |
| KL divergence target | |
| Good prompts | (400) |
| Bad prompts | (400) |
| Row normalization | FULL, LoRA rank 3 |
| Refusal markers | 28 |
| GPU | NVIDIA GB10, 121 GB |
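
As an illustration of the KL divergence metric named above, the following sketch compares two next-token distributions over a toy three-token vocabulary. It is not Heretic's implementation, and the logits are invented for the example; it only shows what "low KL divergence" measures.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Hypothetical logits for the same prompt before and after steering.
base = softmax([2.0, 1.0, 0.1])
steered = softmax([1.8, 1.1, 0.2])

print(kl_divergence(base, base))     # identical distributions give 0.0
print(kl_divergence(base, steered))  # small positive value
```

A value near zero on harmless prompts means the steered model still predicts nearly the same tokens as the base model there.
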
## NVFP4 Quantization
Custom pipeline on DGX Spark (GB10, 121 GB unified memory):
| Step | Detail |
|---|---|
| Load | (half the memory of bf16) |
| Calibration | 32 samples, |
| Quantization | 0.44, |
| Export | |
| Remap | for vLLM |
## Usage
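
A minimal sketch of launching the checkpoint with vLLM, given that the card lists the format as vLLM-compatible. The repository ID is a placeholder (the card does not state it), the helper `build_vllm_serve_command` is hypothetical, and the flag values are assumptions to tune for your hardware. `--max-model-len` is set to the 262,144-token context from the model card table.

```python
import shlex

# Placeholder: replace with the actual repository ID or a local path.
MODEL_ID = "your-org/Qwen-AgentWorld-Heretic-HCl-NVFP4"


def build_vllm_serve_command(
    model_id: str,
    max_model_len: int = 262_144,
    gpu_memory_utilization: float = 0.85,
) -> list[str]:
    """Return the argv for `vllm serve`; the defaults are illustrative only."""
    return [
        "vllm", "serve", model_id,
        "--max-model-len", str(max_model_len),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    # Print a command line that can be pasted into a shell.
    print(shlex.join(build_vllm_serve_command(MODEL_ID)))
```

Once the server is running, it exposes vLLM's OpenAI-compatible HTTP API. Lowering `max_model_len` is a reasonable first step if the full context does not fit in memory.
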
## Naming
| Component | Meaning |
|---|---|
| **Qwen** | Base architecture family |
| **AgentWorld** | Language world model — 7 simulation domains |
| **Heretic** | Abliterated via Heretic parameter study |
| **HCl** | Production mode (hydrochloric acid) |
| **NVFP4** | 4-bit NVIDIA floating point quantization |
| **dgx** | Quantized & deployed on DGX Spark |
---
AgentWorld + Heretic + NVFP4 — all on a single DGX Spark.
Quantized 2026-06-26