Qwen3-30B-A3B-abliterated-erotic-autoround-int4

4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.

Quantization Details

  • Method: AutoRound (SignRound optimization)
  • Bits: 4-bit (W4A16 symmetric)
  • Group Size: 128
  • Calibration: 512 samples from NeelNanda/pile-10k
  • Iterations: 200 (light mode)
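
For readers who want to reproduce a similar checkpoint, the settings above might map onto the auto-round Python package roughly as follows. This is a sketch under assumptions, not the exact script used for this model: the `QUANT_SETTINGS` dictionary and the `quantize` helper are illustrative names, and the `AutoRound(...)` arguments and `quantize_and_save` call reflect recent auto-round releases and may differ in older versions.

```python
# Settings copied from the list above; the dictionary name is illustrative.
QUANT_SETTINGS = {
    "bits": 4,                        # 4-bit weights (W4A16)
    "group_size": 128,                # one scale per 128 weights
    "sym": True,                      # symmetric quantization
    "nsamples": 512,                  # calibration samples
    "iters": 200,                     # tuning iterations
    "dataset": "NeelNanda/pile-10k",  # calibration dataset
}


def quantize(base_model: str, output_dir: str) -> None:
    """Hypothetical helper: quantize base_model and write the result to output_dir."""
    # Imported here so the settings can be inspected without the heavy dependencies.
    from auto_round import AutoRound
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    autoround = AutoRound(model, tokenizer, **QUANT_SETTINGS)
    autoround.quantize_and_save(output_dir, format="auto_round")
```

Calling `quantize("Ewere/Qwen3-30B-A3B-abliterated-erotic", "./autoround-int4")` would download the FP16 base model first, so it needs enough disk and memory for the ~60GB original.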

Model Size

  • Original (FP16): ~60GB
  • Quantized (Int4): ~17GB
  • Compression: 3.5x
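
The compression figure is the ratio of the two sizes above. The short sketch below reproduces it and also hints at why 4-bit weights fall short of a full 4x reduction: every group of 128 weights carries its own scale. The 16-bit scale width and the function names are assumptions made for illustration; zero-point storage and any layers left in 16-bit are ignored.

```python
def compression_ratio(original_gb: float, quantized_gb: float) -> float:
    """Ratio of original to quantized checkpoint size."""
    return original_gb / quantized_gb


def effective_bits_per_weight(bits: int = 4, group_size: int = 128,
                              scale_bits: int = 16) -> float:
    """Weight bits plus the per-group scale amortized over the group (assumed FP16 scale)."""
    return bits + scale_bits / group_size


ratio = compression_ratio(60, 17)
bpw = effective_bits_per_weight()

print(round(ratio, 2))  # 3.53
print(bpw)              # 4.125
```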

Usage with vLLM

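from vllm import LLM, SamplingParams

# Load the AutoRound int4 checkpoint; weights are 4-bit, activations run in FP16.
llm = LLM(
    model="zandzpider/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may reserve
    max_model_len=16384,  # maximum context length in tokens
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

# Print the first completion for each prompt.
for output in outputs:
    print(output.outputs[0].text)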

Hardware Requirements

  • VRAM: 18-20GB
  • Inference Speed: average 231 tokens/sec on 1x RTX 3090
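
A rough way to see where the VRAM beyond the ~17GB of weights goes is to estimate the FP16 KV cache at the 16384-token context used in the vLLM example. The layer count, KV head count, and head dimension below are assumptions taken from the base Qwen3-30B-A3B architecture, not values stated in this card, and the function name is illustrative.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache per token: keys and values for every layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


# Assumed Qwen3-30B-A3B attention shape: 48 layers, 4 KV heads, head dim 128.
per_token = kv_cache_bytes_per_token(num_layers=48, num_kv_heads=4, head_dim=128)
full_context_gib = per_token * 16384 / 2**30

print(per_token)         # 98304
print(full_context_gib)  # 1.5
```

Under these assumptions, one full-length sequence needs about 1.5 GiB of cache on top of the weights, which is roughly consistent with the 18-20GB range above. Serving several sequences at once, or vLLM's own preallocation, will use more.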

Base Model

Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic