Qwen-AgentWorld-Heretic-HCl-NVFP4

35B MoE · NVFP4 4-bit · Heretic-abliterated · DGX Spark

Model Card


Base model	Qwen/Qwen-AgentWorld-35B-A3B
Architecture	Qwen3.5 MoE — 35B total / 3B active
Experts	256 experts, top-8 routing
Layers	40 transformer layers (linear attention + full attention)
Context	262,144 tokens (256K)
Quantization	NVFP4, 4-bit NVIDIA floating point (e2m1)
Compression	21 GB (3.1× from 65 GB bf16)
Format	safetensors, vLLM-compatible
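
To illustrate the "256 experts, top-8 routing" row, here is a minimal sketch of top-k expert selection for a single token. It assumes a softmax router whose selected weights are renormalized to sum to 1; the function name `route_top_k` and the random logits are hypothetical, and the model's actual router may differ in detail.

```python
import math
import random

NUM_EXPERTS = 256  # from the table above
TOP_K = 8          # from the table above


def route_top_k(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: softmax over all experts, then renormalize the top-k weights.
    """
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k most probable experts, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


rng = random.Random(0)  # hypothetical router logits for one token
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits)
```

Only the selected experts run for a given token, which is how a model with 35B total parameters can activate only about 3B per token.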

This model underwent controlled steering (abliteration) via Heretic v1.4.0 to reduce refusal behavior while minimizing KL divergence from the base model.
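
For reference, KL divergence measures how far the steered model's next-token distribution drifts from the original's. The sketch below computes it for two small, made-up distributions. It illustrates the quantity only and is not Heretic's implementation; the direction of the divergence (base relative to steered) is an assumption.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical next-token probabilities over a 4-token vocabulary.
base_probs = [0.70, 0.20, 0.05, 0.05]     # original model
steered_probs = [0.65, 0.24, 0.06, 0.05]  # model after steering

drift = kl_divergence(base_probs, steered_probs)
```

A value near zero means the steered model behaves almost identically to the base model on that prompt; identical distributions give exactly zero.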

Built with a custom pipeline on DGX Spark (GB10, 121 GB unified memory). The components of the model name:

Component	Meaning
Qwen	Base architecture family
AgentWorld	Language world model — 7 simulation domains
Heretic	Abliterated via Heretic parameter study
HCl	Production mode (hydrochloric acid)
NVFP4	4-bit NVIDIA floating point quantization
dgx	Quantized & deployed on DGX Spark
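
As a rough illustration of the 4-bit e2m1 format named in the table, the sketch below decodes a 4-bit code (1 sign bit, 2 exponent bits, 1 mantissa bit) and quantizes a block of values with one shared scale. The helper names are hypothetical, and the scale is kept as a plain Python float for simplicity; real NVFP4 checkpoints store block scales in a compact format rather than as shown here.

```python
# The eight non-negative magnitudes representable in e2m1.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code):
    """Decode a 4-bit code (sign, 2 exponent bits, 1 mantissa bit) to a float."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        magnitude = 0.5 * mantissa  # subnormal: 0 or 0.5
    else:
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


def quantize_block(values):
    """Quantize a block to e2m1 codes plus one shared scale (simplified)."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude to 6.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Pick the nearest representable magnitude; its index is the 3-bit code.
        idx = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - target))
        codes.append(idx | (0b1000 if v < 0 else 0))
    return codes, scale


def dequantize_block(codes, scale):
    """Reconstruct approximate values from codes and the shared scale."""
    return [decode_e2m1(c) * scale for c in codes]
```

Calling `dequantize_block(*quantize_block(block))` on a block of weights returns the nearest representable approximation of each value under that block's scale.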

AgentWorld + Heretic + NVFP4 — all on a single DGX Spark.
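
The size figures quoted in this card can be checked with simple arithmetic. The sketch below recomputes the compression ratio and the memory left over on the device. The bits-per-weight line assumes one 8-bit scale shared by every 16 weights, and the headroom figure ignores whatever the operating system and runtime consume.

```python
BF16_SIZE_GB = 65.0        # bf16 checkpoint size, from this card
NVFP4_SIZE_GB = 21.0       # quantized checkpoint size, from this card
UNIFIED_MEMORY_GB = 121.0  # DGX Spark unified memory, from this card

# Ratio of the bf16 size to the quantized size (about 3.1).
compression_ratio = BF16_SIZE_GB / NVFP4_SIZE_GB

# Memory remaining for KV cache and activations once the weights are loaded.
headroom_gb = UNIFIED_MEMORY_GB - NVFP4_SIZE_GB

# Assumption: 4-bit values plus one 8-bit scale per 16-weight block.
bits_per_weight = 4 + 8 / 16
```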

Quantized 2026-06-26 · Model on HuggingFace

Safetensors · Model size: 20B params · Tensor types: F16, F8_E4M3