Not-For-All-Audiences

Qwen3-30B-A3B-abliterated-erotic-autoround-int4

4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.

Quantization Details

Method: AutoRound (SignRound optimization)
Bits: 4-bit (W4A16 symmetric)
Group Size: 128
Calibration: 512 samples from NeelNanda/pile-10k
Iterations: 200 (light mode)

Model Size

Original (FP16): ~60GB
Quantized (Int4): ~17GB
Compression: 3.5x

Usage with vLLM

from vllm import LLM, SamplingParams

llm = LLM(
    model="yourusername/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,
    max_model_len=16384,
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

Hardware Requirements

- VRAM: 18-20GB
- Inference Speed: average 231 tokens/sec on 1x RTX 3090

Base Model

Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic

Downloads last month: 9

Safetensors

Model size

0.6B params

Tensor type

I32

F16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zandzpider/Qwen3-30B-A3B-abliterated-erotic-autoround-int4

Base model

Qwen/Qwen3-30B-A3B-Base

Finetuned

Qwen/Qwen3-30B-A3B

Finetuned

Ewere/Qwen3-30B-A3B-abliterated-erotic

Quantized

(7)

this model