---
license: apache-2.0
language:
- en
library_name: gguf
tags:
- gguf
- qwen3.6
- qwen
- nvfp4
- blackwell
- fp4
- mixture-of-experts
- moe
- multimodal
- vision
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
inference: false
quantized_by: FreedomAISVR
---

# Qwen3.6-35B-A3B-NVFP4-GGUF

NVFP4 GGUF quantization of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B).

Multimodal model: vision encoder (903 MB) + text MoE LLM (18.36 GiB).

## About NVFP4

NVFP4 is a Blackwell-native FP4 format (E2M1: 1 sign, 2 exponent, and 1 mantissa bit). The quantization is applied uniformly to all tensors.

- **Total params:** 35.95B (3B active, 256 experts, 8/token)
- **Quantization:** NVFP4 (~4.55 BPW)
- **File size:** 18.36 GiB (text) + 903 MB (vision)
- **Vision encoder:** 27-layer ViT, hidden 1152, 3->1152x2 patch embed (temporal)
- **Context:** 262,144 tokens natively

NVFP4 requires a Blackwell (RTX 50-series or B-series) GPU for hardware acceleration.

## Files

| Filename | Type | Size | Description |
|---|---|---|---|
| `qwen3.6-35b-a3b-nvfp4.gguf` | NVFP4 | 18.36 GiB | Text MoE LLM weights |
| `mmproj-qwen36-35b-src-BF16.gguf` | MMProj | 903 MB | Vision encoder weights |

## Usage

### llama.cpp CLI (text only)

```bash
llama-cli -hf FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF -cnv -p "You are a helpful assistant"
```

### llama-server (multimodal)

```bash
llama-server -hf FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF \
  --mmproj mmproj-qwen36-35b-src-BF16.gguf \
  --ctx-size 0 \
  --jinja
```

### llama-cpp-python

```python
from llama_cpp import Llama

# Downloads the text-model GGUF from the Hub on first use
llm = Llama.from_pretrained(
    repo_id="FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF",
    filename="qwen3.6-35b-a3b-nvfp4.gguf",
)
```

## Quantization Pipeline

```bash
# 1. Convert HF model to intermediate GGUF
#    (the output file is named f16, but the tensors are stored as bf16)
python convert_hf_to_gguf.py ./models/qwen3.6-35b/ --outfile qwen3.6-35b-a3b-f16.gguf --outtype bf16

# 2. Export vision encoder
python convert_hf_to_gguf.py ./models/qwen3.6-35b/ --mmproj --outtype bf16

# 3. Quantize to NVFP4
llama-quantize --allow-requantize qwen3.6-35b-a3b-f16.gguf qwen3.6-35b-a3b-nvfp4.gguf NVFP4
```

## Hardware

| GPU | VRAM | Notes |
|---|---|---|
| NVIDIA RTX 5060 Ti | 16 GB | Quantization performed on this GPU |

## License

Apache-2.0 (same as [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B))
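
## Appendix: decoding E2M1

The "About NVFP4" section describes each weight as a 4-bit value with 1 sign, 2 exponent, and 1 mantissa bit. The sketch below shows how such a code could map to a real number, assuming the usual E2M1 conventions (sign in the top bit, exponent bias of 1, subnormals when the exponent field is zero). The names `decode_e2m1` and `bits_per_weight` are hypothetical helpers for illustration and are not part of llama.cpp or this repository. The bits-per-weight estimate further assumes NVIDIA's published block layout of one 8-bit scale per 16 values; it does not model how the GGUF file actually stores its tensors.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code laid out as: sign | 2 exponent bits | 1 mantissa bit."""
    if not 0 <= code <= 0b1111:
        raise ValueError("E2M1 codes are 4 bits wide")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal range: no implicit leading 1, so only 0 and 0.5 exist here.
        magnitude = 0.5 * mantissa
    else:
        # Normal range: implicit leading 1 and an assumed exponent bias of 1.
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


def bits_per_weight(block_size: int = 16, scale_bits: int = 8) -> float:
    """Average storage per weight: 4 data bits plus the block scale's share.

    The defaults (16 values per block, one 8-bit scale) are assumptions
    based on NVIDIA's description of NVFP4, not read from this GGUF file.
    """
    return 4 + scale_bits / block_size


# The eight non-negative magnitudes a single E2M1 code can represent.
E2M1_MAGNITUDES = [decode_e2m1(code) for code in range(8)]
print(E2M1_MAGNITUDES)    # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
print(bits_per_weight())  # 4.5
```

Under these assumptions the estimate comes to 4.5 bits per weight, slightly below the ~4.55 BPW quoted above; the sketch does not account for that difference.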