How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="batsclamp/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("batsclamp/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-FP8")
model = AutoModelForMultimodalLM.from_pretrained("batsclamp/Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-FP8")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
Huihui-Qwen3.5-35B-A3B-Claude-4.6-Opus-abliterated-FP8

A vision-capable, FP8-quantized, abliterated, distilled Qwen3.5-35B model built for fast inference on the Nvidia DGX Spark (about 80GB of VRAM is needed for full functionality).

Model Lineage

The lineage starts with Qwen/Qwen3.5-35B-A3B (BF16).

Performance

A conservative approach to FP8 quantization kept quality loss minimal while still raising throughput from 31 t/s to 51 t/s on DGX Spark. With a 262k context and some room left for the KV cache, it uses only 80GB of VRAM.

Currently this is the best and fastest abliterated model for the Nvidia DGX Spark, and it leaves all visual layers untouched.

I have not found a case where this model refuses to answer. It is especially fun to use with pictures ;). It also has the best "tooling" skills I have seen so far: it really likes to Google things first, even if it already knows the answer.

I plan to test the quality of the model's output later and update this page.

Quantization Details

Quantized using the FP8_DYNAMIC scheme from llmcompressor (>=0.10) with compressed-tensors serialization.

Method

FP8_DYNAMIC is a data-free quantization scheme — no calibration dataset required. Weights are statically quantized to FP8 (per-channel, symmetric), while activations are dynamically quantized to FP8 (per-token, symmetric) at inference time.
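
To make the two granularities concrete, here is a minimal pure-Python sketch of symmetric FP8 (E4M3) scaling. It is an illustration under stated assumptions, not llmcompressor's implementation: the helper names `round_e4m3`, `quantize_rows`, and `dequantize_rows` are hypothetical, the sample numbers are made up, and real kernels work on tensors rather than Python lists. The only fixed constant is 448, the largest finite value in the E4M3 format.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value (hypothetical helper)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)      # saturate instead of overflowing
    _, e = math.frexp(a)           # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)           # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)        # 3 mantissa bits -> spacing of 2**(exp - 3)
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_rows(rows):
    """Symmetric quantization with one scale per row.

    For weights, a row is an output channel and the scale is computed once.
    For activations, a row is a token and the scale is recomputed at runtime.
    """
    quantized, scales = [], []
    for row in rows:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
        quantized.append([round_e4m3(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def dequantize_rows(quantized, scales):
    """Map quantized values back to the original range."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -4.0]]  # two output channels
w_q, w_scales = quantize_rows(weights)           # static: done once, offline

activations = [[3.0, -0.2, 0.7]]                 # one token
a_q, a_scales = quantize_rows(activations)       # dynamic: done per forward pass
```

Because each row gets its own scale, the largest value in every channel or token maps onto the top of the FP8 range, which is why no calibration data is needed to pick the scales.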

Modules Excluded from Quantization

Matching the conservative strategy from Qwen/Qwen3.5-35B-A3B-FP8:

| Module | Reason |
|---|---|
| lm_head | Output head, precision-sensitive |
| embed_tokens | Embedding layer |
| linear_attn.conv1d, linear_attn.in_proj_a/b | Linear attention layers |
| mlp.gate, mlp.shared_expert_gate | MoE router gates, where routing precision matters |
| model.visual.* | Entire visual encoder kept at BF16 |
| mtp.* | Multi-token prediction layers |
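
The exclusions above could be written as an ignore list and checked against module names, as in the rough sketch below. The pattern list and the `is_excluded` helper are assumptions made for illustration (the actual recipe used for this model is not shown on this page), and the `re:` prefix for regular-expression entries is assumed to follow the convention used in llmcompressor ignore lists. The module names in the usage lines are hypothetical examples.

```python
import re

# Hypothetical ignore list derived from the exclusion table.
# Entries starting with "re:" are regular expressions; others are exact names.
IGNORE = [
    "lm_head",
    r"re:.*embed_tokens$",
    r"re:.*linear_attn\.conv1d$",
    r"re:.*linear_attn\.in_proj_[ab]$",
    r"re:.*mlp\.gate$",
    r"re:.*mlp\.shared_expert_gate$",
    r"re:^model\.visual\..*",
    r"re:^mtp\..*",
]


def is_excluded(module_name: str, patterns=IGNORE) -> bool:
    """Return True if module_name matches any ignore entry (hypothetical helper)."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif module_name == pattern:
            return True
    return False


# Router gate stays unquantized; expert projections are quantized.
print(is_excluded("model.layers.3.mlp.gate"))
print(is_excluded("model.layers.3.mlp.experts.0.gate_proj"))
```

Anchoring the gate patterns with `$` matters here: without it, `mlp.gate` would also match `gate_proj` layers inside the experts and leave most of the MoE weights unquantized.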

Post-processing

The model was quantized via AutoModelForCausalLM (the only loader proven to work with llmcompressor for this architecture), then post-processed:

  1. Weight key renaming: model.layers.X → model.language_model.layers.X, to match the ConditionalGeneration format expected by vLLM
  2. Visual encoder restoration: BF16 vision encoder weights copied from the source model (since AutoModelForCausalLM strips them)
  3. Config restructuring: config.json rebuilt from the source model's nested structure with the quantization config injected
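
The first two post-processing steps can be sketched on a plain dictionary of weights. This is a simplified illustration, not the script actually used for this model: `rename_key` and `postprocess` are hypothetical helpers, the `model.visual.` prefix is taken from the exclusion table, and the placeholder strings stand in for real tensors.

```python
OLD_PREFIX = "model.layers."
NEW_PREFIX = "model.language_model.layers."
VISUAL_PREFIX = "model.visual."  # assumed from the exclusion table


def rename_key(key: str) -> str:
    """Move a decoder-layer key under the language_model namespace."""
    if key.startswith(OLD_PREFIX):
        return NEW_PREFIX + key[len(OLD_PREFIX):]
    return key


def postprocess(quantized: dict, source: dict) -> dict:
    """Rename quantized keys, then copy the vision weights from the source."""
    merged = {rename_key(k): v for k, v in quantized.items()}
    for key, value in source.items():
        if key.startswith(VISUAL_PREFIX):
            merged[key] = value  # kept at BF16, untouched by quantization
    return merged


# Placeholder values stand in for tensors in this toy example.
quantized = {
    "model.layers.0.self_attn.q_proj.weight": "fp8",
    "lm_head.weight": "bf16",
}
source = {
    "model.visual.blocks.0.attn.qkv.weight": "bf16",
    "model.language_model.layers.0.self_attn.q_proj.weight": "bf16",
}
merged = postprocess(quantized, source)
```

Only keys under the visual prefix are taken from the source checkpoint, so the quantized language-model weights are never overwritten by their BF16 originals.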

Disclaimer

It's an abliterated model. DO NOT use it if you think that all AIs need to be politically correct and boring.

Safetensors
Model size: 35B params
Tensor types: BF16, F8_E4M3