Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark

Recovered HuggingFace safetensors from the Q8_0 quantized GGUF published by HauhauCS.

Source

Field Value
Original GGUF Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf
GGUF Size 41 GB
Quantization Q8_0 (355 tensors), F32 (301 tensors), F16 (77 tensors)
Reference Model Qwen3.6-35B-A3B (official, BF16)
Architecture Qwen3_5MoeForConditionalGeneration (MoE hybrid Gated DeltaNet + Gated Attention, 256 experts with 8 active per token)

Recovery Details

Converted from GGUF to HuggingFace safetensors format using ungguf with bit-exact verification.

All 693 GGUF-derived tensors verified bit-exact against the GGUF source after applying:

  • GGML Fortran-order reversal (reverse_shape=True for all tensors)
  • Norm convention (subtract 1.0)
  • A_log convention (log(-A))
  • V-head inverse reorder (v_per_k=2: 16 K-heads / 32 V-heads)
  • Expert 3D tensor reshape and gate/up concatenation

MTP and Vision Encoder Restoration

The GGUF file does not contain Multi-Token Prediction (MTP) or vision encoder tensors — these are excluded by the llama.cpp converter that produced it. For a complete, loadable model, the following were copied verbatim from the official Qwen3.6-35B-A3B reference model:

Component Tensors Source
Vision encoder (model.visual.*) 333 Reference model (bit-exact copy)
MTP layers (mtp.*) 4 Reference model (bit-exact copy)
Additional vision/metadata tensors 15 Reference model (bit-exact copy)

All 352 copied tensors verified bit-exact against the reference.

Sanity Check

The recovered model was tested with vLLM (FP8 + TP2 on 2x GPUs):

Model Harmful Coherence Benign Coherence Harmful Refusal
Base MoE (FP8+TP2) 100% 100% 40%
Recovered MoE (FP8+TP2) 100% 100% 0%

The recovered model achieves 100% coherence on both harmful and benign prompts, matching the base model's generation quality. The abliteration is effective: 0% refusal rate (down from the base model's 40%).

Tensor Comparison vs Base Model

Compared against the official Qwen3.6-35B-A3B base to identify abliteration modifications:

Summary

Category Tensors Identical to Base Modified
GGUF-derived 693 307 386
Copied (MTP + vision) 352 352 0
Total 1045 659 386

Unchanged Tensors (identical to base)

These tensors were not modified by abliteration:

Group Count Note
layernorm 82 Input/post-attention layernorms
linear_attn.norm 30 Layer norms for linear attention
linear_attn.conv1d 30 Conv1d weights
linear_attn.dt_bias 30 Delta-time biases
linear_attn.A_log 30 A-log parameters
self_attn.q_norm / k_norm 22 QK norms for full attention
router_gate 41 Expert router gates
vision 333 Vision encoder
mtp 4 Multi-token prediction layers
final_norm 1 Final layer norm

Modified Tensors

Group Total Modified Typical % Changed Max Abs Diff
expert_gate_up 41 40 41–79% 1.8e-02
expert_down 41 40 42–85% 6.5e-02
shared_expert_gate 41 40 76–93% 2.5e-02
shared_expert_up 41 40 38–92% 2.1e-02
shared_expert_down 41 40 65–88% 2.4e-02
shared_expert_gate_scalar 41 16 89–99% 5.6e-03
linear_attn.out_proj 30 30 75–88% 6.5e-02
linear_attn.in_proj_qkv 30 26 73–76% 2.3e-03
linear_attn.in_proj_z 30 26 75–77% 2.0e-03
linear_attn.in_proj_a 30 26 76–78% 9.8e-04
linear_attn.in_proj_b 30 26 77–80% 9.8e-04
self_attn.o_proj 11 10 75–87% 3.2e-02
self_attn.q_proj 11 8 75–76% 1.6e-03
self_attn.k_proj 11 8 76–80% 1.2e-03
self_attn.v_proj 11 8 77–79% 2.0e-03
embed_tokens 1 1 74% 1.1e-03
lm_head 1 1 75% 1.1e-03

Key observations:

  • Expert and shared expert projections show the largest deviations (up to 6.5e-02 max abs diff)
  • Linear attention out_proj has the highest max abs diff (6.5e-02), consistent with the 27B model pattern
  • Router gates and normalization layers were left untouched — the abliteration targeted only projection weights
  • 40 of 41 MoE layers have modified expert tensors; the unmodified layer's experts may have been below a threshold
  • Layer 0's linear attention projections are unmodified, while layers 1+ show modifications (26/30 layers affected)

Output Format

Property Value
Format HuggingFace safetensors (17 shards)
Dtype BF16 (dequantized from Q8_0/F32/F16)
Total Size 67 GB
Tensor Count 1045
Shard Size ~4.1 GB

Usage

Load with HuggingFace transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered")

For efficient inference with vLLM:

vllm serve ./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered --quantization fp8 --tensor-parallel-size 2

See our other tensor comparisons and provenance analyses for HauhauCS models at: DreamFast HauhauCS Safetensor Benchmarks

Quality Notes

This model was recovered from a lossy Q8_0 quantization. While the conversion itself is bit-exact to the GGUF source, the original quantization introduces error on the most affected tensors compared to the original BF16 weights. The abliteration modifications (up to 0.065 max abs diff) are significantly larger than the quantization noise, confirming the abliteration signal is well-preserved.

Benchmarks

Benchmarks and tensor analysis coming soon. See our previous HauhauCS model benchmarks and evaluations at: DreamFast HauhauCS Safetensor Benchmarks

Files

Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered/
├── config.json
├── generation_config.json
├── tokenizer.json
├── tokenizer_config.json
├── preprocessor_config.json
├── video_preprocessor_config.json
├── chat_template.jinja
├── vocab.json
├── merges.txt
├── model.safetensors.index.json
├── model.safetensors-00001-of-00017.safetensors
├── ...
├── model.safetensors-00017-of-00017.safetensors
└── diff_report.json              # Full tensor-by-tensor comparison
Downloads last month
208
Safetensors
Model size
36B params
Tensor type
F32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark

Finetuned
(140)
this model

Collection including DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark