Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Safetensor-Benchmark
Recovered HuggingFace safetensors from the Q8_0 quantized GGUF published by HauhauCS.
Source
| Field | Value |
|---|---|
| Original GGUF | Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf |
| GGUF Size | 41 GB |
| Quantization | Q8_0 (355 tensors), F32 (301 tensors), F16 (77 tensors) |
| Reference Model | Qwen3.6-35B-A3B (official, BF16) |
| Architecture | Qwen3_5MoeForConditionalGeneration (MoE hybrid Gated DeltaNet + Gated Attention, 256 experts with 8 active per token) |
Recovery Details
Converted from GGUF to HuggingFace safetensors format using ungguf with bit-exact verification.
All 693 GGUF-derived tensors verified bit-exact against the GGUF source after applying:
- GGML Fortran-order reversal (`reverse_shape=True` for all tensors)
- Norm convention (subtract 1.0)
- A_log convention (log(-A))
- V-head inverse reorder (v_per_k=2: 16 K-heads / 32 V-heads)
- Expert 3D tensor reshape and gate/up concatenation
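As a rough illustration of three of the conventions listed above, the following pure-Python sketch shows the element-wise math on plain lists. It is one reading of the bullet points, not the actual ungguf code; the function names are hypothetical, and the V-head reorder and expert reshape steps are left out because their exact layout is not described here.

```python
import math

def hf_shape(ggml_shape):
    """Reverse the GGML dimension order to get a row-major (HF-style) shape."""
    return tuple(reversed(ggml_shape))

def recover_norm_weight(gguf_values):
    """Subtract 1.0 from each norm weight (the 'norm convention' above)."""
    return [v - 1.0 for v in gguf_values]

def recover_a_log(gguf_values):
    """Recover A_log as log(-A), assuming the GGUF stores a negative A."""
    return [math.log(-v) for v in gguf_values]

print(hf_shape((2048, 512)))              # (512, 2048)
print(recover_norm_weight([1.0, 1.25]))   # [0.0, 0.25]
print(recover_a_log([-1.0]))              # [0.0]
```

Only the shape tuple is reversed here; the sketch assumes the underlying buffer is reinterpreted as-is, which is what a pure order reversal implies.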
MTP and Vision Encoder Restoration
The GGUF file does not contain Multi-Token Prediction (MTP) or vision encoder tensors — these are excluded by the llama.cpp converter that produced it. For a complete, loadable model, the following were copied verbatim from the official Qwen3.6-35B-A3B reference model:
| Component | Tensors | Source |
|---|---|---|
| Vision encoder (`model.visual.*`) | 333 | Reference model (bit-exact copy) |
| MTP layers (`mtp.*`) | 4 | Reference model (bit-exact copy) |
| Additional vision/metadata tensors | 15 | Reference model (bit-exact copy) |
All 352 copied tensors verified bit-exact against the reference.
Sanity Check
The recovered model was tested with vLLM (FP8 + TP2 on 2x GPUs):
| Model | Harmful Coherence | Benign Coherence | Harmful Refusal |
|---|---|---|---|
| Base MoE (FP8+TP2) | 100% | 100% | 40% |
| Recovered MoE (FP8+TP2) | 100% | 100% | 0% |
The recovered model achieves 100% coherence on both harmful and benign prompts, matching the base model's generation quality. The abliteration is effective: 0% refusal rate (down from the base model's 40%).
Tensor Comparison vs Base Model
Compared against the official Qwen3.6-35B-A3B base to identify abliteration modifications:
Summary
| Category | Tensors | Identical to Base | Modified |
|---|---|---|---|
| GGUF-derived | 693 | 307 | 386 |
| Copied (MTP + vision) | 352 | 352 | 0 |
| Total | 1045 | 659 | 386 |
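The totals in this table follow from the per-component counts reported earlier in this card. A small check, using only numbers stated in the card (the dictionary keys are illustrative labels):

```python
# Counts as reported in this model card.
gguf_derived = {"total": 693, "identical": 307, "modified": 386}
copied = {"vision_encoder": 333, "mtp": 4, "additional": 15}

copied_total = sum(copied.values())
total_tensors = gguf_derived["total"] + copied_total
identical_total = gguf_derived["identical"] + copied_total

# Every GGUF-derived tensor is either identical to base or modified.
assert gguf_derived["identical"] + gguf_derived["modified"] == gguf_derived["total"]

print(copied_total, total_tensors, identical_total)  # 352 1045 659
```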
Unchanged Tensors (identical to base)
These tensors were not modified by abliteration:
| Group | Count | Note |
|---|---|---|
| `layernorm` | 82 | Input/post-attention layernorms |
| `linear_attn.norm` | 30 | Layer norms for linear attention |
| `linear_attn.conv1d` | 30 | Conv1d weights |
| `linear_attn.dt_bias` | 30 | Delta-time biases |
| `linear_attn.A_log` | 30 | A-log parameters |
| `self_attn.q_norm` / `k_norm` | 22 | QK norms for full attention |
| `router_gate` | 41 | Expert router gates |
| `vision` | 333 | Vision encoder |
| `mtp` | 4 | Multi-token prediction layers |
| `final_norm` | 1 | Final layer norm |
Modified Tensors
| Group | Total | Modified | Typical % Changed | Max Abs Diff |
|---|---|---|---|---|
| `expert_gate_up` | 41 | 40 | 41–79% | 1.8e-02 |
| `expert_down` | 41 | 40 | 42–85% | 6.5e-02 |
| `shared_expert_gate` | 41 | 40 | 76–93% | 2.5e-02 |
| `shared_expert_up` | 41 | 40 | 38–92% | 2.1e-02 |
| `shared_expert_down` | 41 | 40 | 65–88% | 2.4e-02 |
| `shared_expert_gate_scalar` | 41 | 16 | 89–99% | 5.6e-03 |
| `linear_attn.out_proj` | 30 | 30 | 75–88% | 6.5e-02 |
| `linear_attn.in_proj_qkv` | 30 | 26 | 73–76% | 2.3e-03 |
| `linear_attn.in_proj_z` | 30 | 26 | 75–77% | 2.0e-03 |
| `linear_attn.in_proj_a` | 30 | 26 | 76–78% | 9.8e-04 |
| `linear_attn.in_proj_b` | 30 | 26 | 77–80% | 9.8e-04 |
| `self_attn.o_proj` | 11 | 10 | 75–87% | 3.2e-02 |
| `self_attn.q_proj` | 11 | 8 | 75–76% | 1.6e-03 |
| `self_attn.k_proj` | 11 | 8 | 76–80% | 1.2e-03 |
| `self_attn.v_proj` | 11 | 8 | 77–79% | 2.0e-03 |
| `embed_tokens` | 1 | 1 | 74% | 1.1e-03 |
| `lm_head` | 1 | 1 | 75% | 1.1e-03 |
Key observations:
- Expert and shared expert projections show some of the largest deviations (expert_down reaches 6.5e-02 max abs diff)
- Linear attention out_proj ties with expert_down for the highest max abs diff (6.5e-02), consistent with the 27B model pattern
- Router gates and normalization layers were left untouched — the abliteration targeted only projection weights
- 40 of 41 MoE layers have modified expert tensors; the unmodified layer's experts may have been below a threshold
- Layer 0's linear attention projections are unmodified, while layers 1+ show modifications (26/30 layers affected)
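The two per-tensor statistics in the table above can be sketched as follows. This assumes "% changed" means the share of elements whose value differs from the base and "max abs diff" means the largest element-wise absolute difference; the card does not spell out these definitions, and `diff_stats` is a hypothetical helper operating on flattened lists of floats.

```python
def diff_stats(base, other):
    """Return (percent of elements that differ, max absolute difference)."""
    if len(base) != len(other):
        raise ValueError("tensors must have the same number of elements")
    diffs = [abs(a - b) for a, b in zip(base, other)]
    changed = sum(1 for d in diffs if d != 0.0)
    pct_changed = 100.0 * changed / len(diffs) if diffs else 0.0
    return pct_changed, max(diffs, default=0.0)

# Toy values, not taken from the model.
base = [0.5, -0.25, 1.0, 0.0]
recovered = [0.5, -0.21875, 1.0, 0.0625]
pct, max_abs = diff_stats(base, recovered)
print(pct, max_abs)  # 50.0 0.0625
```

A tensor counts as "identical to base" under this definition when the returned percentage is 0.0.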
Output Format
| Property | Value |
|---|---|
| Format | HuggingFace safetensors (17 shards) |
| Dtype | BF16 (dequantized from Q8_0/F32/F16) |
| Total Size | 67 GB |
| Tensor Count | 1045 |
| Shard Size | ~4.1 GB |
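To check the tensor and shard counts locally, the standard `model.safetensors.index.json` layout (a `weight_map` from tensor name to shard file) can be summarized with a few lines. This is a minimal sketch: `summarize_index` is a hypothetical helper, and the toy index below uses made-up tensor names.

```python
import json
from collections import Counter

def summarize_index(index):
    """Return (tensor count, shard count, tensors per shard) from a parsed index."""
    weight_map = index["weight_map"]
    per_shard = Counter(weight_map.values())
    return len(weight_map), len(per_shard), dict(per_shard)

# Toy index with hypothetical tensor names; a real one would be loaded with
# json.load(open("model.safetensors.index.json")).
toy_index = json.loads("""
{
  "metadata": {"total_size": 0},
  "weight_map": {
    "example.a.weight": "model.safetensors-00001-of-00017.safetensors",
    "example.b.weight": "model.safetensors-00001-of-00017.safetensors",
    "example.c.weight": "model.safetensors-00002-of-00017.safetensors"
  }
}
""")
n_tensors, n_shards, per_shard = summarize_index(toy_index)
print(n_tensors, n_shards)  # 3 2
```

Run against the real index, the first two values should match the Tensor Count and shard count in the table above.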
Usage
Load with HuggingFace transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered")
```

For efficient inference with vLLM:

```bash
vllm serve ./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered --quantization fp8 --tensor-parallel-size 2
```
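Once the vLLM server is running, it exposes an OpenAI-compatible chat endpoint. The sketch below only builds the request with the standard library; it assumes vLLM's default port 8000 and that the served model name equals the path passed to `vllm serve` (both change if you pass `--port` or `--served-model-name`). `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

def build_chat_request(prompt,
                       model="./Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("What is the capital of France?")

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```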
See our other tensor comparisons and provenance analyses for HauhauCS models at: DreamFast HauhauCS Safetensor Benchmarks
Quality Notes
This model was recovered from a lossy Q8_0 quantization. The conversion itself is bit-exact with respect to the GGUF source, but the Q8_0 quantization introduces rounding error relative to the original BF16 weights. The abliteration modifications (up to 0.065 max abs diff) are significantly larger than the quantization noise, so the abliteration signal is well preserved.
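For intuition about the size of Q8_0 rounding error, here is a simplified round trip for one block. Q8_0 quantizes values in blocks of 32 with one shared scale `d = max|x| / 127` and stores each value as an int8; the error per element is then at most `d / 2`. This sketch keeps the scale in full precision, whereas the real format stores it as FP16, so treat it as an approximation. The input values are synthetic.

```python
def q8_0_roundtrip(block):
    """Quantize one block to int8 with a shared scale, then dequantize it."""
    amax = max(abs(x) for x in block)
    d = amax / 127.0
    if d == 0.0:
        return list(block), 0.0
    quants = [max(-127, min(127, round(x / d))) for x in block]
    return [q * d for q in quants], d

# Synthetic block of 32 values, not taken from the model.
block = [((i * 37) % 64 - 32) / 500.0 for i in range(32)]
dequantized, d = q8_0_roundtrip(block)
max_err = max(abs(a - b) for a, b in zip(block, dequantized))
print(f"scale={d:.6f} max_err={max_err:.6f} bound={d / 2:.6f}")
```

Because the bound scales with the largest magnitude in each block, tensors with small weights see correspondingly small quantization error.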
Benchmarks
Benchmarks and tensor analysis coming soon. See our previous HauhauCS model benchmarks and evaluations at: DreamFast HauhauCS Safetensor Benchmarks
Files
Qwen3.6-35B-A3B-HauhauCS-Q8KP-recovered/
├── config.json
├── generation_config.json
├── tokenizer.json
├── tokenizer_config.json
├── preprocessor_config.json
├── video_preprocessor_config.json
├── chat_template.jinja
├── vocab.json
├── merges.txt
├── model.safetensors.index.json
├── model.safetensors-00001-of-00017.safetensors
├── ...
├── model.safetensors-00017-of-00017.safetensors
└── diff_report.json # Full tensor-by-tensor comparison