---
license: apache-2.0
base_model: Qwen/Qwen3.6-35B-A3B-FP8
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- qwen
- qwen-3.6
- moe
- fp8
- reference-card
---

# Qwen3.6-35B-A3B-FP8

## Summary

Reference wrapper around [Qwen/Qwen3.6-35B-A3B-FP8](https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8), the official FP8 release. **This repository carries no weights**; it exists only to anchor the FP8 variant inside the `majentik/*` family navigation.

## Why this variant

Pick this for Hopper, Ada, or Blackwell GPUs, where FP8 is natively supported, when you want the closest-to-bf16 fidelity at roughly half the weight memory of bf16. For further compression, pick one of the 4-bit variants below.

## Hardware compatibility

| Device | VRAM | Recommendation |
| --- | --- | --- |
| H100 / H200 | 80–141 GB | native FP8 |
| RTX 4090 | 24 GB | FP8 weights do not fit; use a 4-bit variant |
| RTX 5090 | 32 GB | FP8 is native, but the roughly 35 GB of FP8 weights exceed 32 GB; use a 4-bit variant on a single card |

## Reproduce

```bash
# No re-quantization needed: use the upstream weights directly.
huggingface-cli download Qwen/Qwen3.6-35B-A3B-FP8
```

## Evaluation

_Benchmarks pending; to be populated after the eval-harness workstream lands._

## Family

- **bf16**: [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)
- **FP8 (this)**: [Qwen/Qwen3.6-35B-A3B-FP8](https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8)
- **RotorQuant family**: [majentik/Qwen3.6-35B-A3B-RotorQuant](https://huggingface.co/majentik/Qwen3.6-35B-A3B-RotorQuant)
- **TurboQuant family**: [majentik/Qwen3.6-35B-A3B-TurboQuant](https://huggingface.co/majentik/Qwen3.6-35B-A3B-TurboQuant)

## Provenance

Card-only. No weights stored.

## License

Released under `apache-2.0`. The upstream license of the base model applies.
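
## Sizing sketch

The VRAM guidance in this card can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an illustration, not part of the upstream release: it assumes about 35 billion total parameters (inferred from the "35B" in the model name), counts weights only, and ignores KV cache, activations, and any layers kept in higher precision. The helper names `weight_gb` and `fits` are hypothetical.

```python
# Rough weight-memory estimate: weights only, no KV cache or activations.
# Assumption: ~35e9 total parameters, inferred from the "35B" in the model name.


def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: float, vram_gb: float,
         headroom_gb: float = 0.0) -> bool:
    """True if the weights plus optional headroom fit in vram_gb."""
    return weight_gb(params_billion, bits_per_param) + headroom_gb <= vram_gb


if __name__ == "__main__":
    PARAMS_B = 35.0  # assumed, see note above
    devices = [("RTX 4090", 24), ("RTX 5090", 32), ("H100", 80)]
    for label, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
        size = weight_gb(PARAMS_B, bits)
        verdict = {name: fits(PARAMS_B, bits, vram) for name, vram in devices}
        print(f"{label:>5}: ~{size:.1f} GB", verdict)
```

Real deployments need extra room beyond the weights, so pass a nonzero `headroom_gb` to treat the result as a lower bound rather than a guarantee.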