---
language:
- en
license: other
tags:
- awq
- int4
- quantization
- agent
- tool-calling
- function-calling
- vllm
- qwen
base_model: Qwen/Qwen1.5-4B-Chat
---

# 🤖 Qwen3.5 4B AWQ (INT4) - Optimized for Tool Calling

<div align="center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/Qwen_logo.svg/3840px-Qwen_logo.svg.png" width="200" alt="Qwen Logo" />
</div>

<br>

**Qwen3.5-4B-awq-int4-optimized** is a highly specialized, quantized version of the Qwen3.5-4B-Chat model. It has been compressed to 4-bit precision using **AWQ (Activation-aware Weight Quantization)** to maximize inference throughput and minimize VRAM usage, making it perfect for edge deployments and high-concurrency serving.

---

## 🚀 Quickstart with vLLM

This model is fully optimized for high-performance serving with [vLLM](https://github.com/vllm-project/vllm). 

### Installation
```bash
pip install vllm
```

### Serving the Model
You can immediately deploy this model as an OpenAI-compatible API server:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model Faustus-Faber/Qwen3.5-4B-awq-int4-optimized \
    --quantization awq \
    --dtype auto \
    --max-model-len 4096
```

---

## 🧠 Model Details

* **Base Model:** Qwen3.5-4B-Chat
* **Quantization Method:** AWQ (Activation-aware Weight Quantization)
* **Precision:** INT4 (Group Size: 128)
* **Primary Use Case:** Agentic workflows, tool-calling, and highly structured JSON generation.

## 🛠️ Intended Use

This model is designed to act as the reasoning engine for AI agents. The quantization profile exhibits high fidelity in generating structured JSON tool calls and following strict system prompts, even at extreme 4-bit compression.

---

## 📜 License & Citation

This model is subject to the Tongyi Qianwen LICENSE agreement.