---
license: apache-2.0
base_model:
- Qwen/Qwen3.6-27B
tags:
- qwen3.6
- gguf
- tq3_4s
- turboquant
- vision
- multimodal
pipeline_tag: image-text-to-text
language:
- en
- zh
- multilingual
---

# Qwen3.6-27B-TQ3_4S

[![Qwen Chat](https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5)](https://chat.qwen.ai)

> [!Note]
> This repository contains model weights and configuration files for the post-trained model in GGUF format.
>
> These artifacts are intended for llama.cpp-style runtimes and other GGUF-compatible inference stacks.

Following the February release of the Qwen3.5 series, we're pleased to share a `TQ3_4S` GGUF release of Qwen3.6-27B. Built on the upstream Qwen3.6-27B model and converted through the `unsloth/Qwen3.6-27B-GGUF` path, this release is aimed at strong local inference efficiency while preserving the stability and real-world coding utility of the base model.

## Qwen3.6 Highlights

This release delivers substantial upgrades, particularly in:

- **Agentic Coding:** the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
- **Thinking Preservation:** Qwen introduced the option to retain reasoning context from historical messages, reducing overhead during iterative work.

![Benchmark Results](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png)

For the full upstream write-up, see the Qwen blog post: [Qwen3.6-27B](https://qwen.ai/blog?id=qwen3.6-27b).

## Model Overview

- Type: Causal Language Model with Vision Encoder
- Training Stage: Pre-training and Post-training
- Architecture: `qwen35`
- Parameters: `27B`
- Layers: `64`
- Embedding dimension: `5120`
- FFN dimension: `17408`
- Hidden layout: `16 × (3 × (Gated DeltaNet -> FFN) -> 1 × (Gated Attention -> FFN))`
- Gated DeltaNet heads: `48` for `V`, `16` for `QK`, head dim `128`
- Gated Attention heads: `24` for `Q`, `4` for `KV`, head dim `256`
- RoPE dim: `64`
- Native context: `262,144`

## Benchmark Results

For the full upstream benchmark tables, refer to the official Qwen model card:

- [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)

Selected upstream headline results for the base model:

- `SWE-bench Verified`: `77.2`
- `Terminal-Bench 2.0`: `59.3`
- `SkillsBench Avg5`: `48.2`
- `GPQA Diamond`: `87.8`
- `AIME26`: `94.1`
- `MMMU`: `82.9`
- `AndroidWorld`: `70.3`

These are upstream base-model results, not local GGUF quant results.

## TQ3_4S Release

This repository packages the model as a TurboQuant `TQ3_4S` GGUF for local deployment.

## Files

| File | Quant | Size |
| --- | --- | ---: |
| `Qwen3.6-27B-TQ3_4S.gguf` | TQ3_4S | ~13.0 GB |
| `chat_template.jinja` | chat template | text |
| `thumbnail.png` | model card image | png |

## Local Validation

Hardware:

- RTX 5060 Ti 16 GB

Prompt processing:

- `llama-perplexity --chunks 10 -c 2048`
- `PPL = 6.2452 +/- 0.16138`
- `prompt eval = 712.02 tok/s`

## Runtime Notes

- Use a TurboQuant-capable llama.cpp build for best performance.
- The upstream family is multimodal-capable, but the public 27B repos used here do not currently expose a separate GGUF `mmproj` artifact.
- For llama.cpp chat usage, keep `--jinja` enabled so the bundled chat template is honored.
- Upstream guidance recommends keeping at least `128K` context when possible for reasoning-heavy workloads. On smaller local GPUs, reduce context as needed to fit memory.
- Upstream default sampling guidance differs between thinking and non-thinking mode; follow the official Qwen card if you are trying to reproduce base-model behavior.

## Example

```bash
llama-cli \
  -m Qwen3.6-27B-TQ3_4S.gguf \
  --jinja \
  -ngl 99 \
  -c 4096
```

## Sources

- Upstream base model: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)
- Upstream GGUF source used for conversion: [unsloth/Qwen3.6-27B-GGUF](https://huggingface.co/unsloth/Qwen3.6-27B-GGUF)
- Upstream blog and benchmark context: [Qwen3.6-27B model card](https://huggingface.co/Qwen/Qwen3.6-27B)
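
## Context Memory Estimate

The Runtime Notes advise reducing context on smaller GPUs. As a rough aid for choosing `-c`, the sketch below estimates attention KV-cache size from the figures in the Model Overview. It rests on assumptions that are not stated in this card: only the Gated Attention layers (1 in every 4 of the 64 layers) hold a per-token KV cache, and the cache is stored as 16-bit values. The variable names and the `kv_cache_mib` helper are illustrative, not part of llama.cpp.

```bash
#!/usr/bin/env bash
# Rough KV-cache estimate from the Model Overview figures.
# Assumptions (not confirmed by this card): only Gated Attention layers
# keep a per-token KV cache, and cache entries are 16-bit (2 bytes).

layers=64          # total layers (Model Overview)
attn_every=4       # layout: 3 x Gated DeltaNet, then 1 x Gated Attention
kv_heads=4         # Gated Attention KV heads
head_dim=256       # Gated Attention head dim
bytes_per_elem=2   # assumed 16-bit cache entries

attn_layers=$(( layers / attn_every ))

# K and V each store kv_heads * head_dim values per attention layer per token.
kv_bytes_per_token=$(( 2 * attn_layers * kv_heads * head_dim * bytes_per_elem ))

# Hypothetical helper: KV-cache size in MiB for a given context length.
kv_cache_mib() {
  echo $(( $1 * kv_bytes_per_token / 1024 / 1024 ))
}

for ctx in 4096 32768 131072 262144; do
  printf '%7d tokens -> %6d MiB\n' "$ctx" "$(kv_cache_mib "$ctx")"
done
```

Under these assumptions the cache costs 64 KiB per token, which works out to 256 MiB at the `4096` context used in the example and 16,384 MiB at the native `262,144` context. The estimate excludes the model weights (~13.0 GB), the Gated DeltaNet state, and compute buffers, so treat it as a lower bound on context-related memory rather than a total footprint.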