Qwen-AgentWorld-Heretic-HCl-NVFP4
35B MoE · NVFP4 4-bit · Heretic-abliterated · DGX Spark
Model Card
| Base model | Qwen/Qwen-AgentWorld-35B-A3B |
| Architecture | Qwen3.5 MoE — 35B total / 3B active |
| Experts | 256 experts, top-8 routing |
| Layers | 40 transformer (linear attention + full attention) |
| Context | 262,144 tokens (256K) |
| Quantization | — 4-bit NVIDIA floating point (e2m1) |
| Compression | 21 GB (3.1× from 65 GB bf16) |
| Format | safetensors, vLLM-compatible |
Heretic Abliteration
This model underwent controlled steering via Heretic v1.4.0 to reduce refusal behavior while minimizing KL divergence.
| Parameter | Value |
|---|---|
| Trials | 200 (completed) |
| KL divergence target | |
| Good prompts | (400) |
| Bad prompts | (400) |
| Row normalization | FULL, LoRA rank 3 |
| Refusal markers | 28 |
| GPU | NVIDIA GB10, 121 GB |
NVFP4 Quantization
Custom pipeline on DGX Spark (GB10, 121 GB unified memory):
| Step | Detail |
|---|---|
| Load | (half the memory of bf16) |
| Calibration | 32 samples, |
| Quantization | 0.44, |
| Export | |
| Remap | for vLLM |
Usage
Naming
| Component | Meaning |
|---|---|
| Qwen | Base architecture family |
| AgentWorld | Language world model — 7 simulation domains |
| Heretic | Abliterated via Heretic parameter study |
| HCl | Production mode (hydrochloric acid) |
| NVFP4 | 4-bit NVIDIA floating point quantization |
| dgx | Quantized & deployed on DGX Spark |
AgentWorld + Heretic + NVFP4 — all on a single DGX Spark.
Quantized 2026-06-26 · Model on HuggingFace
- Downloads last month
- -
Model tree for maltsevis/Qwen-AgentWorld-Heretic-HCl-NVFP4-dgx
Base model
Qwen/Qwen3.5-35B-A3B-Base Finetuned
Qwen/Qwen-AgentWorld-35B-A3B