---
license: other
base_model: Ewere/Qwen3-30B-A3B-abliterated-erotic
tags:
  - quantized
  - autoround
  - int4
  - qwen3
  - vllm
quantization:
  method: autoround
  bits: 4
---

# Qwen3-30B-A3B-abliterated-erotic-autoround-int4

4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.

## Quantization Details

- Method: AutoRound (SignRound optimization)
- Bits: 4-bit (W4A16 symmetric)
- Group Size: 128
- Calibration: 512 samples from NeelNanda/pile-10k
- Iterations: 200 (light mode)
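
To make the "W4A16 symmetric, group size 128" setting concrete, here is a minimal pure-Python sketch of symmetric 4-bit quantization with one scale per group. It uses plain round-to-nearest, not AutoRound's learned SignRound rounding, and the function names are illustrative only; nothing here comes from the AutoRound library.

```python
import random

def quantize_group_sym_int4(group):
    """Quantize one group of weights to signed 4-bit codes with a shared scale."""
    max_abs = max(abs(x) for x in group)
    # Symmetric scheme: no zero point, largest magnitude maps to code 7.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(x / scale))) for x in group]
    return codes, scale

def quantize(weights, group_size=128):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group_sym_int4(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]

def dequantize(packed):
    """Rebuild approximate weights; activations stay 16-bit in W4A16."""
    return [code * scale for codes, scale in packed for code in codes]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]  # two groups of 128
packed = quantize(weights)
restored = dequantize(packed)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(packed)}, max abs error: {max_err:.4f}")
```

With round-to-nearest, the error per weight is bounded by half the group's scale. AutoRound instead tunes the rounding decisions over the calibration samples for the stated number of iterations, which is where the 512 samples and 200 iterations come in.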

## Model Size

- Original (FP16): ~60GB
- Quantized (Int4): ~17GB
- Compression: 3.5x
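
The figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes roughly 30 billion weights (read off the model name, not an exact count) and one FP16 scale per group of 128 weights; both are assumptions for illustration.

```python
PARAMS = 30e9        # assumption: ~30 billion weights, taken from the model name
GROUP_SIZE = 128     # from the quantization details above

# FP16 stores 16 bits per weight.
fp16_gb = PARAMS * 16 / 8 / 1e9

# Int4 stores a 4-bit code per weight plus one 16-bit scale per group.
bits_per_weight = 4 + 16 / GROUP_SIZE
int4_gb = PARAMS * bits_per_weight / 8 / 1e9

reported_ratio = 60 / 17  # sizes reported in this card

print(f"FP16 estimate: {fp16_gb:.1f} GB")
print(f"Int4 estimate: {int4_gb:.1f} GB")
print(f"Reported compression: {reported_ratio:.2f}x")
```

The estimate comes out a little under the reported ~17GB. A plausible explanation, not confirmed by this card, is that some tensors such as embeddings are left in 16-bit.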

## Usage with vLLM

```python
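from vllm import LLM, SamplingParams

# Replace "yourusername" with the account that hosts the quantized weights.
llm = LLM(
    model="yourusername/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,
    max_model_len=16384,
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

# Each RequestOutput holds one or more completions; print the first.
for output in outputs:
    print(output.outputs[0].text)
```
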
## Hardware Requirements

- VRAM: 18-20GB
- Inference Speed: average 231 tokens/sec on 1x RTX 3090
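
The gap between the ~17GB of weights and the 18-20GB VRAM figure is largely KV cache. A rough estimate for the 16384-token context used above is sketched below. The architecture numbers (48 layers, 4 key/value heads, head dimension 128) are assumptions based on the published Qwen3-30B-A3B configuration and should be checked against the model's `config.json`; an FP16 cache is also assumed.

```python
# Assumed architecture values; verify against config.json before relying on them.
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_VALUE = 2      # FP16 cache entries
MAX_MODEL_LEN = 16384    # matches max_model_len in the usage example

# Each token stores one key and one value vector per KV head per layer.
kv_bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
kv_gib = kv_bytes_per_token * MAX_MODEL_LEN / 2**30

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache for one full-length sequence: {kv_gib:.2f} GiB")
```

Under these assumptions a single full-length sequence needs about 1.5 GiB of cache on top of the weights; activations and runtime overhead account for the rest.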

## Base Model

Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic