---
license: other
base_model: Ewere/Qwen3-30B-A3B-abliterated-erotic
tags:
- quantized
- autoround
- int4
- qwen3
- vllm
quantization:
  method: autoround
  bits: 4
---

# Qwen3-30B-A3B-abliterated-erotic-autoround-int4

4-bit AutoRound quantization of Qwen3-30B-A3B-abliterated-erotic.

## Quantization Details

- Method: AutoRound (SignRound optimization)
- Bits: 4-bit (W4A16, symmetric)
- Group Size: 128
- Calibration: 512 samples from NeelNanda/pile-10k
- Iterations: 200 (light mode)

## Model Size

- Original (FP16): ~60GB
- Quantized (Int4): ~17GB
- Compression: ~3.5x

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

# Replace "yourusername" with the account that hosts this repository.
llm = LLM(
    model="yourusername/Qwen3-30B-A3B-abliterated-erotic-autoround-int4",
    quantization="auto-round",
    dtype="float16",
    gpu_memory_utilization=0.9,
    max_model_len=16384,
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)

# Each result holds one or more completions; print the first one.
print(outputs[0].outputs[0].text)
```

## Hardware Requirements

- VRAM: 18-20GB
- Inference Speed: average 231 tokens/sec on 1x RTX 3090

## Base Model

Based on https://huggingface.co/Ewere/Qwen3-30B-A3B-abliterated-erotic
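
## Size Estimate

The Model Size figures can be sanity-checked with a little arithmetic. The sketch below is only a rough estimate, not how the reported sizes were measured. It assumes a nominal 30 billion parameters, that every weight is quantized to 4 bits, and that each group of 128 weights stores one FP16 scale. `estimate_size_gb` and `N_PARAMS` are hypothetical names introduced for illustration.

```python
def estimate_size_gb(n_params, bits, group_size=None, scale_bits=16):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bits = n_params * bits
    if group_size is not None:
        # Assumption: one scale per group of weights and no stored zero-point.
        total_bits += (n_params / group_size) * scale_bits
    return total_bits / 8 / 1e9


N_PARAMS = 30e9  # assumption: nominal "30B" parameter count

fp16_gb = estimate_size_gb(N_PARAMS, bits=16)
int4_gb = estimate_size_gb(N_PARAMS, bits=4, group_size=128)

# Ratio implied by the sizes reported in this card (~60GB and ~17GB).
reported_ratio = 60 / 17

print(f"FP16 estimate: {fp16_gb:.1f} GB")
print(f"Int4 estimate: {int4_gb:.2f} GB")
print(f"Reported compression: {reported_ratio:.1f}x")
```

Under these assumptions the FP16 estimate matches the ~60GB figure, while the idealized Int4 estimate comes out a little below the reported ~17GB. The gap is presumably due to tensors kept at higher precision and to format overhead, which this sketch does not model.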