--- language: - en license: other tags: - awq - int4 - quantization - agent - tool-calling - function-calling - vllm - qwen base_model: Qwen/Qwen1.5-4B-Chat --- # 🤖 Qwen3.5 4B AWQ (INT4) - Optimized for Tool Calling
Qwen Logo

**Qwen3.5-4B-awq-int4-optimized** is a highly specialized, quantized version of the Qwen3.5-4B-Chat model. It has been compressed to 4-bit precision using **AWQ (Activation-aware Weight Quantization)** to maximize inference throughput and minimize VRAM usage, making it perfect for edge deployments and high-concurrency serving. --- ## 🚀 Quickstart with vLLM This model is fully optimized for high-performance serving with [vLLM](https://github.com/vllm-project/vllm). ### Installation ```bash pip install vllm ``` ### Serving the Model You can immediately deploy this model as an OpenAI-compatible API server: ```bash python -m vllm.entrypoints.openai.api_server \ --model Faustus-Faber/Qwen3.5-4B-awq-int4-optimized \ --quantization awq \ --dtype auto \ --max-model-len 4096 ``` --- ## 🧠 Model Details * **Base Model:** Qwen3.5-4B-Chat * **Quantization Method:** AWQ (Activation-aware Weight Quantization) * **Precision:** INT4 (Group Size: 128) * **Primary Use Case:** Agentic workflows, tool-calling, and highly structured JSON generation. ## 🛠️ Intended Use This model is designed to act as the reasoning engine for AI agents. The quantization profile exhibits high fidelity in generating structured JSON tool calls and following strict system prompts, even at extreme 4-bit compression. --- ## 📜 License & Citation This model is subject to the Tongyi Qianwen LICENSE agreement.