---
base_model: nvidia/Nemotron-Cascade-2-30B-A3B
tags:
- gguf
- nemotron_h
- nemotron-cascade-2
- nvidia
- quantized
license: other
license_name: nvidia-open-model-license
license_link: https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B/blob/main/LICENSE
---

# Nemotron-Cascade-2-30B-A3B — Q5_1 GGUF

GGUF quantization of [nvidia/Nemotron-Cascade-2-30B-A3B](https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B).

- **Architecture**: Hybrid Attention + Mamba (SSM) + MoE — 30B total parameters, 3B active
- **Quantization**: Q5_1 (uniform 5-bit with delta and min per block)

## Quantization commands

```bash
# Convert HF model to GGUF (bf16)
python llama.cpp/convert_hf_to_gguf.py \
  nvidia/Nemotron-Cascade-2-30B-A3B \
  --outfile nemotron-cascade-30b-bf16.gguf \
  --outtype bf16

# Quantize to Q5_1
llama-quantize nemotron-cascade-30b-bf16.gguf \
  nemotron-cascade-30b-Q5_1.gguf Q5_1
```

## Usage

Load in [LM Studio](https://lmstudio.ai/), [llama.cpp](https://github.com/ggml-org/llama.cpp), or any GGUF-compatible runtime.