Not-For-All-Audiences

Qwen3-30B-A3B-abliterated-erotic-autoround-int4

---
license: other
base_model: Ewere/Qwen3-30B-A3B-abliterated-erotic
tags:
  - quantized
  - autoround
  - int4
  - qwen3
  - vllm
quantization:
  method: autoround
  bits: 4
---

# Qwen3-30B-A3B-abliterated-erotic-autoround-int4

4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.

## Quantization Details

- Method: AutoRound (SignRound optimization)
- Bits: 4-bit (W4A16 symmetric)
- Group Size: 128
- Calibration: 512 samples from NeelNanda/pile-10k
- Iterations: 200 (light mode)
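
The settings above map fairly directly onto the `auto-round` Python package. The following is a minimal sketch of how a comparable quantization run might look, not the exact script used for this checkpoint. The helper names `build_autoround_kwargs` and `quantize` are hypothetical, and the output format `"auto_round"` is an assumption.

```python
def build_autoround_kwargs():
    """Map the settings listed in this card onto AutoRound argument names."""
    return {
        "bits": 4,                        # 4-bit weights (W4A16)
        "group_size": 128,                # one scale per 128 weights
        "sym": True,                      # symmetric quantization
        "nsamples": 512,                  # calibration samples
        "iters": 200,                     # tuning iterations
        "dataset": "NeelNanda/pile-10k",  # calibration dataset
    }


def quantize(base_model, output_dir):
    """Hypothetical helper: quantize `base_model` and save it to `output_dir`."""
    # Heavy imports are deferred so the settings can be inspected without them.
    from auto_round import AutoRound
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(base_model)

    autoround = AutoRound(model, tokenizer, **build_autoround_kwargs())
    # Assumption: the checkpoint is exported in the "auto_round" format.
    autoround.quantize_and_save(output_dir, format="auto_round")
```

Calling `quantize("Ewere/Qwen3-30B-A3B-abliterated-erotic", "./out")` would start the run; loading the FP16 base model needs roughly 60GB of memory.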

## Model Size

- Original (FP16): ~60GB
- Quantized (Int4): ~17GB
- Compression: 3.5x
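
As a rough sanity check on these numbers, the sketch below estimates checkpoint size from the parameter count and bit width. It assumes about 30.5B total parameters (a figure from the upstream Qwen3-30B-A3B release, not from this card) and one FP16 scale per group of 128 weights, and it ignores zero points and any layers left unquantized.

```python
def estimate_size_gb(n_params, bits, group_size=None, scale_bits=16):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    bits_per_weight = bits
    if group_size:
        # Each group of weights shares one scale value.
        bits_per_weight += scale_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 30.5e9  # assumption: total parameter count of the base model

fp16_gb = estimate_size_gb(N_PARAMS, bits=16)
int4_gb = estimate_size_gb(N_PARAMS, bits=4, group_size=128)

print(f"FP16: {fp16_gb:.1f} GB")
print(f"Int4: {int4_gb:.1f} GB")
print(f"Ratio: {fp16_gb / int4_gb:.2f}x")
```

The idealized Int4 estimate comes out slightly below the ~17GB listed above. A plausible explanation is that some tensors (for example embeddings or the output head) stay in 16-bit precision, which would also pull the compression ratio down toward 3.5x.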

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

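llm = LLM(
    # Placeholder repo id: replace "yourusername" with the actual namespace.
    model="yourusername/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,  # fraction of GPU memory vLLM may reserve
    max_model_len=16384,         # maximum context length in tokens
    trust_remote_code=True,
)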

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
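outputs = llm.generate(["Your prompt here"], sampling_params)

# Each result holds one or more completions; print the first one.
print(outputs[0].outputs[0].text)
```

## Hardware Requirements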

- VRAM: 18-20GB
- Inference Speed: average 231 tokens/sec on 1x RTX 3090
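
To see how the VRAM figure relates to context length, the sketch below estimates the FP16 KV cache size. The architecture values (48 layers, 4 key/value heads, head dimension 128) are assumptions taken from the upstream Qwen3-30B-A3B configuration and should be checked against this model's `config.json`.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len


# Assumed architecture values, not stated in this card.
ASSUMED_ARCH = dict(num_layers=48, num_kv_heads=4, head_dim=128)

per_token = kv_cache_bytes(seq_len=1, **ASSUMED_ARCH)
full_context = kv_cache_bytes(seq_len=16384, **ASSUMED_ARCH)

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context / 2**30:.2f} GiB for a 16384-token sequence")
```

Under these assumptions, a single full-length sequence adds about 1.5 GiB on top of the quantized weights. Note that vLLM preallocates cache memory according to `gpu_memory_utilization`, so the memory actually reserved on the GPU can be higher than this estimate.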

## Base Model

Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic