---
license: apache-2.0
language:
- en
library_name: gguf
tags:
- gguf
- qwen3.6
- qwen
- nvfp4
- blackwell
- fp4
- mixture-of-experts
- moe
- multimodal
- vision
base_model: Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
inference: false
quantized_by: FreedomAISVR
---

# Qwen3.6-35B-A3B-NVFP4-GGUF

NVFP4 GGUF quantization of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B).

Multimodal model: vision encoder (903 MB) + text MoE LLM (18.36 GB).

## About NVFP4

NVFP4 is a Blackwell-native FP4 format (EFM4: 1 sign, 1 mantissa, 2 exponent bits). Applied uniformly to all tensors.

- **Total params:** 35.95B (3B active, 256 experts, 8/token)
- **Quantization:** NVFP4 (~4.55 BPW)
- **File size:** 18.36 GiB (text) + 903 MB (vision)
- **Vision encoder:** 27-layer ViT, hidden 1152, 3->1152x2 patch embed (temporal)
- **Context:** 262,144 tokens natively

NVFP4 requires a Blackwell (RTX 50-series or B-series) GPU for hardware acceleration.

## Files

| Filename | Type | Size | Description |
|---|---|---|---|
| `qwen3.6-35b-a3b-nvfp4.gguf` | NVFP4 | 18.36 GiB | Text MoE LLM weights |
| `mmproj-qwen36-35b-src-BF16.gguf` | MMProj | 903 MB | Vision encoder weights |

## Usage

### llama.cpp CLI (text only)

```bash
llama-cli -hf FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF -cnv -p "You are a helpful assistant"
```

### llama-server (multimodal)

```bash
llama-server -hf FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF --mmproj mmproj-qwen36-35b-src-BF16.gguf --ctx-size 0 --jinja
```

### llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="FreedomAISVR/Qwen3.6-35B-A3B-NVFP4-GGUF",
    filename="qwen3.6-35b-a3b-nvfp4.gguf",
)
```

## Quantization Pipeline

```bash
# 1. Convert HF model to intermediate GGUF
python convert_hf_to_gguf.py ./models/qwen3.6-35b/ --outfile qwen3.6-35b-a3b-f16.gguf --outtype bf16

# 2. Export vision encoder
python convert_hf_to_gguf.py ./models/qwen3.6-35b/ --mmproj --outtype bf16

# 3. Quantize to NVFP4
llama-quantize --allow-requantize qwen3.6-35b-a3b-f16.gguf qwen3.6-35b-a3b-nvfp4.gguf NVFP4
```

## Hardware

| GPU | VRAM | Notes |
|---|---|---|
| NVIDIA RTX 5060 Ti | 16 GB | Quantization performed on this GPU |

## License

Apache-2.0 (same as [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B))