Qwen3-30B-A3B-abliterated-erotic-autoround-int4
4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.
Quantization Details
- Method: AutoRound (SignRound optimization)
- Bits: 4-bit (W4A16 symmetric)
- Group Size: 128
- Calibration: 512 samples from NeelNanda/pile-10k
- Iterations: 200 (light mode)
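The settings above map fairly directly onto the `auto-round` Python package. The following is a minimal sketch of how a comparable quantization run might look, not necessarily the exact command used for this checkpoint. The `quantize` helper, the `QUANT_SETTINGS` name, and the output directory are hypothetical, and the keyword arguments assume a recent `auto-round` release.

```python
# Settings copied from the "Quantization Details" list above.
QUANT_SETTINGS = {
    "bits": 4,                        # 4-bit weights (W4A16)
    "group_size": 128,                # one scale per 128 weights
    "sym": True,                      # symmetric quantization
    "iters": 200,                     # SignRound optimization steps
    "nsamples": 512,                  # calibration samples
    "dataset": "NeelNanda/pile-10k",  # calibration dataset
}


def quantize(base_model: str, output_dir: str) -> None:
    """Quantize `base_model` with AutoRound and save it to `output_dir`."""
    # Imported lazily so the settings can be inspected without these packages.
    from auto_round import AutoRound
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    autoround = AutoRound(model, tokenizer, **QUANT_SETTINGS)
    autoround.quantize_and_save(output_dir, format="auto_round")


# Hypothetical invocation (downloads the ~60GB base model, so it is left commented out):
# quantize("Ewere/Qwen3-30B-A3B-abliterated-erotic", "./autoround-int4-output")
```

The heavy imports sit inside the function so that the settings dictionary can be read or reused without `auto-round` installed.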
Model Size
- Original (FP16): ~60GB
- Quantized (Int4): ~17GB
- Compression: 3.5x
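The sizes above can be sanity-checked with a rough estimate. This sketch assumes a nominal 30 billion parameters (taken from the "30B" in the model name) and one FP16 scale per group of 128 weights; both are assumptions, not measured values.

```python
N_PARAMS = 30e9    # nominal parameter count, assumed from the model name
GROUP_SIZE = 128   # from the quantization details
SCALE_BITS = 16    # assumption: one FP16 scale stored per group


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes for a given bit width."""
    return n_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gb(N_PARAMS, 16)
int4_bits = 4 + SCALE_BITS / GROUP_SIZE   # effective bits per weight
int4_gb = weight_gb(N_PARAMS, int4_bits)
reported_ratio = 60 / 17                  # sizes reported in this card

print(f"FP16 estimate: {fp16_gb:.1f} GB")
print(f"INT4 estimate: {int4_gb:.1f} GB at {int4_bits} bits/weight")
print(f"Reported compression: {reported_ratio:.2f}x")
```

The INT4 estimate lands somewhat below the reported ~17GB. A plausible explanation is that some tensors (for example embeddings or routing layers) stay in higher precision, but that is not confirmed here.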
Usage with vLLM
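from vllm import LLM, SamplingParams

# Load the AutoRound INT4 checkpoint; weights are 4-bit, activations run in FP16.
llm = LLM(
    model="zandzpider/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may reserve
    max_model_len=16384,         # maximum context length in tokens
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

# Print the first completion for each prompt.
for output in outputs:
    print(output.outputs[0].text)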
Hardware Requirements
- VRAM: 18-20GB
- Inference Speed: average 231 tokens/sec on 1x RTX 3090
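The VRAM figure is roughly the quantized weights plus the KV cache for one full-length sequence. The sketch below assumes the attention shape of the upstream Qwen3-30B-A3B config (48 layers, 4 key/value heads, head dimension 128) and an FP16 KV cache; these values are assumptions and should be checked against the checkpoint's `config.json`.

```python
# Assumed architecture values (from the upstream Qwen3-30B-A3B config, not this card).
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
KV_DTYPE_BYTES = 2       # FP16 cache entries

WEIGHTS_GB = 17.0        # quantized size reported in this card
MAX_MODEL_LEN = 16384    # context length used in the vLLM example

# Each token stores one key and one value vector per layer and KV head.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * KV_DTYPE_BYTES


def kv_cache_bytes(num_tokens: int) -> int:
    """KV cache size in bytes for `num_tokens` cached tokens."""
    return kv_bytes_per_token * num_tokens


total_gb = WEIGHTS_GB + kv_cache_bytes(MAX_MODEL_LEN) / 1e9
print(f"KV cache per token: {kv_bytes_per_token} bytes")
print(f"Weights + one {MAX_MODEL_LEN}-token sequence: {total_gb:.1f} GB")
```

Under these assumptions the total falls inside the 18-20GB range listed above, with activation buffers and runtime overhead accounting for the rest.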
Base Model
Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic
Model tree for zandzpider/Qwen3-30B-A3B-abliterated-erotic-autoround-int4
- Base model: Qwen/Qwen3-30B-A3B-Base
- Finetuned: Qwen/Qwen3-30B-A3B
- Finetuned: Ewere/Qwen3-30B-A3B-abliterated-erotic