Qwen-AgentWorld-Heretic-HCl-NVFP4

35B MoE · NVFP4 4-bit · Heretic-abliterated · DGX Spark


Model Card

Base model Qwen/Qwen-AgentWorld-35B-A3B
Architecture Qwen3.5 MoE — 35B total / 3B active
Experts 256 experts, top-8 routing
Layers 40 transformer (linear attention + full attention)
Context 262,144 tokens (256K)
Quantization — 4-bit NVIDIA floating point (e2m1)
Compression 21 GB (3.1× from 65 GB bf16)
Format safetensors, vLLM-compatible

Heretic Abliteration

This model underwent controlled steering via Heretic v1.4.0 to reduce refusal behavior while minimizing KL divergence.

Parameter Value
Trials 200 (completed)
KL divergence target
Good prompts (400)
Bad prompts (400)
Row normalization FULL, LoRA rank 3
Refusal markers 28
GPU NVIDIA GB10, 121 GB

NVFP4 Quantization

Custom pipeline on DGX Spark (GB10, 121 GB unified memory):

Step Detail
Load (half the memory of bf16)
Calibration 32 samples,
Quantization 0.44,
Export
Remap for vLLM

Usage

Naming

Component Meaning
Qwen Base architecture family
AgentWorld Language world model — 7 simulation domains
Heretic Abliterated via Heretic parameter study
HCl Production mode (hydrochloric acid)
NVFP4 4-bit NVIDIA floating point quantization
dgx Quantized & deployed on DGX Spark

AgentWorld + Heretic + NVFP4 — all on a single DGX Spark.

Quantized 2026-06-26 · Model on HuggingFace

Downloads last month
-
Safetensors
Model size
20B params
Tensor type
F16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for maltsevis/Qwen-AgentWorld-Heretic-HCl-NVFP4-dgx

Quantized
(31)
this model