---
license: other
base_model: Ewere/Qwen3-30B-A3B-abliterated-erotic
tags:
- quantized
- autoround
- int4
- qwen3
- vllm
quantization:
  method: autoround
  bits: 4
---
# Qwen3-30B-A3B-abliterated-erotic-autoround-int4
4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.
## Quantization Details
- Method: AutoRound (SignRound optimization)
- Bits: 4-bit (W4A16 symmetric)
- Group Size: 128
- Calibration: 512 samples from NeelNanda/pile-10k
- Iterations: 200 (light mode)
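To give a sense of what "W4A16 symmetric, group size 128" stores, here is a minimal sketch of plain round-to-nearest (RTN) symmetric group-wise quantization in pure Python. This is not AutoRound itself. AutoRound's SignRound step additionally learns how each weight should round, using signed gradient descent over the calibration samples (the 200 iterations listed above), whereas this sketch simply rounds. The signed range [-8, 7] and the `max|w| / 7` scale are one common convention, assumed here. The function names are hypothetical.

```python
# Plain RTN symmetric group-wise int4 quantization (illustration only, not AutoRound).

def quantize_group(weights, bits=4):
    """Quantize one group to signed ints sharing a single scale (no zero-point)."""
    qmax = 2 ** (bits - 1) - 1        # 7 for 4-bit
    qmin = -(2 ** (bits - 1))         # -8 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate weights: w ~= q * scale (what a W4A16 kernel does on the fly)."""
    return [v * scale for v in q]

def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups of `group_size` and quantize each independently."""
    return [quantize_group(row[start:start + group_size], bits)
            for start in range(0, len(row), group_size)]

# Usage: 256 hypothetical weights -> 2 groups of 128, each with its own scale.
row = [0.12, -0.5, 0.33, 0.07] * 64
groups = quantize_row(row)
print(len(groups))  # 2
```

Storing one scale per 128 weights keeps the metadata overhead small while still letting each group adapt to its local weight magnitude.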
## Model Size
- Original (FP16): ~60 GB
- Quantized (Int4): ~17 GB
- Compression: ~3.5x
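The listed sizes can be sanity-checked with simple arithmetic. The sketch below backs out a parameter count from the ~60 GB FP16 figure (2 bytes per weight), then estimates the int4 size assuming one 16-bit scale per group of 128 weights, no stored zero-point, and every tensor quantized. These are simplifying assumptions: real checkpoints may store zero-points and typically keep some tensors (for example embeddings) in higher precision, which is one plausible reason the actual ~17 GB sits above this estimate. GB is treated as 10^9 bytes.

```python
# Back-of-the-envelope int4 size estimate (assumptions stated above).
FP16_BYTES = 2

def params_from_fp16_size(size_gb):
    """Parameter count implied by an FP16 checkpoint size (GB = 1e9 bytes)."""
    return size_gb * 1e9 / FP16_BYTES

def int4_size_gb(n_params, bits=4, group_size=128, scale_bits=16):
    """Size if every weight is `bits` wide plus one `scale_bits` scale per group."""
    bits_per_weight = bits + scale_bits / group_size   # 4 + 16/128 = 4.125
    return n_params * bits_per_weight / 8 / 1e9

n = params_from_fp16_size(60)   # ~30e9 parameters
est = int4_size_gb(n)           # lower bound: assumes everything is quantized
print(f"params ~ {n / 1e9:.1f}B, int4 estimate ~ {est:.1f} GB, ratio ~ {60 / est:.2f}x")
```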
## Usage with vLLM
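```python
from vllm import LLM, SamplingParams

# Load the 4-bit AutoRound checkpoint ("yourusername" is a placeholder for the repo owner).
llm = LLM(
    model="yourusername/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",               # activations stay in FP16 (W4A16)
    gpu_memory_utilization=0.9,    # fraction of GPU memory vLLM may reserve
    max_model_len=16384,           # maximum context length (prompt + output)
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

# Each RequestOutput holds the generated completion(s); print the first one.
for output in outputs:
    print(output.outputs[0].text)
```
## Hardware Requirements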
- VRAM: 18-20 GB
- Inference Speed: 231 tokens/sec on average on a single RTX 3090
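Beyond the ~17 GB of weights, serving also needs room for the KV cache. A rough per-sequence estimate at the `max_model_len=16384` used in the vLLM example can be computed as below. The architecture numbers (48 layers, 4 KV heads, head dimension 128) are assumptions taken from the upstream Qwen3-30B-A3B configuration, not from this card; check the model's `config.json` before relying on them. The cache is assumed to be stored in FP16 (2 bytes per element).

```python
# Rough KV-cache size for one sequence (architecture values are assumptions).

def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    # Factor 2 = one key tensor and one value tensor per layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * seq_len

# Hypothetical inputs, assumed from the upstream Qwen3-30B-A3B config.json.
ARCH = dict(num_layers=48, num_kv_heads=4, head_dim=128)

per_token = kv_cache_bytes(1, **ARCH)
full = kv_cache_bytes(16384, **ARCH)
print(per_token)        # 98304 bytes (96 KiB) per token
print(full / 2**30)     # 1.5 GiB for a full 16384-token context
```

Under these assumptions, weights plus one full-length sequence land at roughly 18.5 GB before activation buffers and runtime overhead.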
## Base Model
Based on [Ewere/Qwen3-30B-A3B-abliterated-erotic](https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic).