Instructions to use DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark with libraries, inference providers, notebooks, and local apps.

Libraries

How to use DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

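# The messages below mix an image with text, so use the image-text-to-text task,
# which accepts chat messages through the text= keyword.
pipe = pipeline("image-text-to-text", model="DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark")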
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark")
model = AutoModelForMultimodalLM.from_pretrained("DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark

SGLang

How to use DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DreamFast/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark

Recovered HuggingFace safetensors from the Q8_0 quantized GGUF published by HauhauCS.

Source

Field	Value
Original GGUF	`Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf`
GGUF Size	41 GB
Quantization	Q8_0 (355 tensors), F32 (301 tensors), F16 (77 tensors)
Reference Model	`Qwen3.6-35B-A3B` (official, BF16)
Architecture	`Qwen3_5MoeForConditionalGeneration` (MoE hybrid Gated DeltaNet + Gated Attention, 256 experts with 8 active per token)

Recovery Details

Converted from GGUF to HuggingFace safetensors format using ungguf with bit-exact verification.

All 693 GGUF-derived tensors verified bit-exact against the GGUF source after applying:

GGML Fortran-order reversal (reverse_shape=True for all tensors)
Norm convention (subtract 1.0)
A_log convention (log(-A))
V-head inverse reorder (v_per_k=2: 16 K-heads / 32 V-heads)
Expert 3D tensor reshape and gate/up concatenation
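
As a rough illustration of three of the steps listed above, the sketch below shows the inverse transforms in plain Python. It assumes the GGUF side stores norm weights offset by +1.0 and stores A as -exp(A_log), which is what the "subtract 1.0" and "log(-A)" notes suggest. The function names are hypothetical and this is not the ungguf implementation.

```python
import math

def reverse_ggml_shape(ggml_shape):
    """GGML lists dimensions fastest-varying first; reverse them for a row-major shape."""
    return tuple(reversed(ggml_shape))

def recover_norm_weight(gguf_values):
    """Undo the assumed +1.0 offset on norm weights."""
    return [v - 1.0 for v in gguf_values]

def recover_a_log(gguf_values):
    """Undo the assumed A = -exp(A_log) mapping; every A must be negative."""
    return [math.log(-a) for a in gguf_values]

print(reverse_ggml_shape((2048, 512)))
print(recover_norm_weight([1.0, 1.25]))
print(recover_a_log([-1.0]))
```

The V-head reorder and expert reshape depend on model-specific layouts and are not sketched here.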

MTP and Vision Encoder Restoration

The GGUF file does not contain Multi-Token Prediction (MTP) or vision encoder tensors — these are excluded by the llama.cpp converter that produced it. For a complete, loadable model, the following were copied verbatim from the official Qwen3.6-35B-A3B reference model:

Component	Tensors	Source
Vision encoder (`model.visual.*`)	333	Reference model (bit-exact copy)
MTP layers (`mtp.*`)	4	Reference model (bit-exact copy)
Additional vision/metadata tensors	15	Reference model (bit-exact copy)

All 352 copied tensors verified bit-exact against the reference.

Sanity Check

The recovered model was tested with vLLM (FP8 + TP2 on 2x GPUs):

Model	Harmful Coherence	Benign Coherence	Harmful Refusal
Base MoE (FP8+TP2)	100%	100%	40%
Recovered MoE (FP8+TP2)	100%	100%	0%

The recovered model achieves 100% coherence on both harmful and benign prompts, matching the base model's generation quality. The abliteration is effective: 0% refusal rate (down from the base model's 40%).
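
The table does not say how refusals were scored. One common approach is a simple keyword match over model responses, sketched below. The marker list and the sample responses are hypothetical and are not taken from the evaluation reported above.

```python
# Hypothetical refusal markers; a real evaluation would need a more careful classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")

def is_refusal(response):
    """Return True if the response contains any refusal marker (case-insensitive)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses):
    """Percentage of responses flagged as refusals."""
    if not responses:
        raise ValueError("need at least one response")
    refused = sum(is_refusal(r) for r in responses)
    return 100.0 * refused / len(responses)

# Made-up responses, only to exercise the functions.
sample = [
    "I'm sorry, but I can't help with that.",
    "Here is an overview of the topic.",
    "I cannot assist with this request.",
    "Sure, the steps are as follows.",
    "The answer depends on context.",
]
print(f"{refusal_rate(sample):.0f}% refused")
```

Keyword matching can miss soft refusals and flag harmless apologies, so it is best treated as a first-pass estimate.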

Tensor Comparison vs Base Model

Compared against the official Qwen3.6-35B-A3B base to identify abliteration modifications:

Summary

Category	Tensors	Identical to Base	Modified
GGUF-derived	693	307	386
Copied (MTP + vision)	352	352	0
Total	1045	659	386
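
The counts reported so far can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the tables in this document. The variable names are illustrative.

```python
# Counts copied from the tables in this document.
gguf_by_dtype = {"Q8_0": 355, "F32": 301, "F16": 77}
copied = {"vision": 333, "mtp": 4, "additional": 15}
summary = {
    "GGUF-derived": {"tensors": 693, "identical": 307, "modified": 386},
    "Copied": {"tensors": 352, "identical": 352, "modified": 0},
}
total = {"tensors": 1045, "identical": 659, "modified": 386}

# The three copied components add up to the "Copied" row.
assert sum(copied.values()) == summary["Copied"]["tensors"]

# Each row splits cleanly into identical + modified.
for row in summary.values():
    assert row["identical"] + row["modified"] == row["tensors"]

# The rows add up to the "Total" row, column by column.
for key in total:
    assert sum(row[key] for row in summary.values()) == total[key]

print(sum(gguf_by_dtype.values()))
```

The GGUF dtype counts sum to 733 rather than 693. The source does not explain the difference, though recovery steps such as gate/up concatenation change the number of tensors.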

Unchanged Tensors (identical to base)

These tensors were not modified by abliteration:

Group	Count	Note
`layernorm`	82	Input/post-attention layernorms
`linear_attn.norm`	30	Layer norms for linear attention
`linear_attn.conv1d`	30	Conv1d weights
`linear_attn.dt_bias`	30	Delta-time biases
`linear_attn.A_log`	30	A-log parameters
`self_attn.q_norm` / `k_norm`	22	QK norms for full attention
`router_gate`	41	Expert router gates
`vision`	333	Vision encoder
`mtp`	4	Multi-token prediction layers
`final_norm`	1	Final layer norm

Modified Tensors

Group	Total	Modified	Typical % Changed	Max Abs Diff
`expert_gate_up`	41	40	41–79%	1.8e-02
`expert_down`	41	40	42–85%	6.5e-02
`shared_expert_gate`	41	40	76–93%	2.5e-02
`shared_expert_up`	41	40	38–92%	2.1e-02
`shared_expert_down`	41	40	65–88%	2.4e-02
`shared_expert_gate_scalar`	41	16	89–99%	5.6e-03
`linear_attn.out_proj`	30	30	75–88%	6.5e-02
`linear_attn.in_proj_qkv`	30	26	73–76%	2.3e-03
`linear_attn.in_proj_z`	30	26	75–77%	2.0e-03
`linear_attn.in_proj_a`	30	26	76–78%	9.8e-04
`linear_attn.in_proj_b`	30	26	77–80%	9.8e-04
`self_attn.o_proj`	11	10	75–87%	3.2e-02
`self_attn.q_proj`	11	8	75–76%	1.6e-03
`self_attn.k_proj`	11	8	76–80%	1.2e-03
`self_attn.v_proj`	11	8	77–79%	2.0e-03
`embed_tokens`	1	1	74%	1.1e-03
`lm_head`	1	1	75%	1.1e-03

Key observations:

`expert_down` and `linear_attn.out_proj` show the largest deviations (both 6.5e-02 max abs diff); the shared expert projections peak at 2.5e-02
The large `linear_attn.out_proj` deviation is consistent with the 27B model pattern
Router gates, normalization layers, and the linear attention `conv1d`, `dt_bias`, and `A_log` parameters were left untouched; the abliteration targeted projection weights, along with `embed_tokens` and `lm_head`
40 of 41 MoE layers have modified expert tensors; changes to the remaining layer's experts may have fallen below a threshold
Linear attention input projections (`in_proj_qkv`, `in_proj_z`, `in_proj_a`, `in_proj_b`) are modified in 26 of 30 layers, with layer 0 among the unmodified ones, while `out_proj` is modified in all 30
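
The two per-tensor statistics in the table above can be computed as in the sketch below, assuming "% changed" means the share of elements that differ from the base and "max abs diff" is the largest element-wise absolute difference. The helper name and the toy values are hypothetical, and the tensors are flattened to plain lists to keep the example dependency-free.

```python
def diff_stats(base, candidate):
    """Return (percent of elements that differ, max absolute difference)."""
    if len(base) != len(candidate) or not base:
        raise ValueError("tensors must be non-empty and the same length")
    diffs = [abs(b - c) for b, c in zip(base, candidate)]
    changed = sum(d != 0.0 for d in diffs)
    return 100.0 * changed / len(diffs), max(diffs)

# Toy values, exactly representable in binary floating point.
base = [0.5, -0.25, 0.125, 1.0]
candidate = [0.5, -0.25, 0.0625, 1.5]
pct, max_abs = diff_stats(base, candidate)
print(f"{pct:.0f}% changed, max abs diff {max_abs:.1e}")
```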

Output Format

Property	Value
Format	HuggingFace safetensors (17 shards)
Dtype	BF16 (dequantized from Q8_0/F32/F16)
Total Size	67 GB
Tensor Count	1045
Shard Size	~4.1 GB

Usage

Load with HuggingFace transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered")

For efficient inference with vLLM:

vllm serve ./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered --quantization fp8 --tensor-parallel-size 2
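
Once the server is running, it can be queried over its OpenAI-compatible API. The sketch below uses only the Python standard library and assumes the default address `http://localhost:8000`, as in the curl example earlier. It also assumes the model is served under the path passed to `vllm serve`. The helper names are hypothetical.

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:8000"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(model, prompt, base_url="http://localhost:8000"):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered", "What is the capital of France?")` should return the reply as a string.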

See our other tensor comparisons and provenance analyses for HauhauCS models at: DreamFast HauhauCS Safetensor Benchmarks

Quality Notes

This model was recovered from a lossy Q8_0 quantization. While the conversion itself is bit-exact to the GGUF source, the Q8_0 quantization introduces error relative to the original BF16 weights. The abliteration modifications (up to 0.065 max abs diff) are significantly larger than the quantization noise, confirming the abliteration signal is well-preserved.
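
To give a sense of scale for Q8_0 noise, the sketch below round-trips one block through a simplified Q8_0 scheme: 32 values share one scale equal to the block's largest magnitude divided by 127, and each value is rounded to an 8-bit integer. This is an approximation under stated assumptions. The real format stores the scale in FP16 and the recovered weights are stored in BF16, neither of which is modeled here. The block values are synthetic.

```python
def q8_0_roundtrip(block):
    """Quantize one 32-value block to int8 with a shared scale, then dequantize."""
    if len(block) != 32:
        raise ValueError("Q8_0 blocks hold 32 values")
    amax = max(abs(x) for x in block)
    scale = amax / 127.0
    if scale == 0.0:
        return [0.0] * 32, 0.0
    # Rounding to the nearest integer bounds the per-value error by scale / 2.
    quants = [round(x / scale) for x in block]
    return [q * scale for q in quants], scale

# Synthetic block with values in [-0.0625, 0.0605].
block = [((i * 37) % 64 - 32) / 512.0 for i in range(32)]
restored, scale = q8_0_roundtrip(block)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.3e}, max round-trip error={max_err:.3e}")
```

For this synthetic block the round-trip error stays below half the scale, which is orders of magnitude smaller than the 0.065 maximum difference reported for the abliteration.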

Benchmarks

Benchmarks and tensor analysis coming soon. See our previous HauhauCS model benchmarks and evaluations at: DreamFast HauhauCS Safetensor Benchmarks

Files

Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered/
├── config.json
├── generation_config.json
├── tokenizer.json
├── tokenizer_config.json
├── preprocessor_config.json
├── video_preprocessor_config.json
├── chat_template.jinja
├── vocab.json
├── merges.txt
├── model.safetensors.index.json
├── model.safetensors-00001-of-00017.safetensors
├── ...
├── model.safetensors-00017-of-00017.safetensors
└── diff_report.json              # Full tensor-by-tensor comparison
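
The shard layout can be inspected through `model.safetensors.index.json`, whose `weight_map` maps each tensor name to the shard file that holds it. The sketch below counts tensors per shard using a tiny stand-in index. The three tensor names in it are illustrative and are not read from the actual file.

```python
import json
from collections import Counter

def tensors_per_shard(index):
    """Count how many tensors each shard file holds, given a parsed index dict."""
    return Counter(index["weight_map"].values())

# Tiny stand-in for model.safetensors.index.json; the tensor names are illustrative.
example_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "model.embed_tokens.weight": "model.safetensors-00001-of-00017.safetensors",
    "model.norm.weight": "model.safetensors-00017-of-00017.safetensors",
    "lm_head.weight": "model.safetensors-00017-of-00017.safetensors"
  }
}
""")

counts = tensors_per_shard(example_index)
for shard, n in sorted(counts.items()):
    print(shard, n)
```

Run against the real index file, the counts should cover 17 shards and sum to the 1045 tensors listed under Output Format.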