---
base_model: nvidia/Nemotron-Cascade-2-30B-A3B
tags:
- gguf
- nemotron_h
- nemotron-cascade-2
- nvidia
- quantized
license: other
license_name: nvidia-open-model-license
license_link: https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B/blob/main/LICENSE
---

# Nemotron-Cascade-2-30B-A3B: Q5_1 GGUF

GGUF quantization of [nvidia/Nemotron-Cascade-2-30B-A3B](https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B).

- **Architecture**: Hybrid Attention + Mamba (SSM) + MoE, with 30B total parameters and 3B active per token
- **Quantization**: Q5_1, i.e. 5-bit weights in blocks of 32 with an fp16 delta (scale) and an fp16 minimum per block, about 6 bits per weight overall

## Quantization commands

```bash
# Convert the HF checkpoint to a bf16 GGUF.
# The first argument is the local directory holding the downloaded model files.
python llama.cpp/convert_hf_to_gguf.py \
    nvidia/Nemotron-Cascade-2-30B-A3B \
    --outfile nemotron-cascade-30b-bf16.gguf \
    --outtype bf16

# Quantize the bf16 GGUF to Q5_1
llama-quantize nemotron-cascade-30b-bf16.gguf \
    nemotron-cascade-30b-Q5_1.gguf Q5_1
```

## Usage

Load the GGUF file in [LM Studio](https://lmstudio.ai/), [llama.cpp](https://github.com/ggml-org/llama.cpp), or any other GGUF-compatible runtime that supports the `nemotron_h` architecture.
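Before loading, it can help to estimate how much memory the weights will need. The script below is a rough sketch, not an exact accounting: it derives the Q5_1 storage cost from the block layout (32 weights per block, an fp16 delta, an fp16 minimum, 4 bytes of packed fifth bits and 16 bytes of packed low nibbles) and applies it to the nominal 30B parameter count. Both inputs are simplifying assumptions. The true parameter count is not exactly 30B, and `llama-quantize` keeps some tensors, such as norms, at higher precision, so the real file size will differ.

```bash
#!/usr/bin/env bash
# Rough Q5_1 size estimate.
# Assumption: every weight is stored as Q5_1 and the model has exactly 30e9 parameters.
set -euo pipefail

weights_per_block=32                          # weights per Q5_1 block
delta_bytes=2                                 # fp16 delta (scale) per block
min_bytes=2                                   # fp16 minimum per block
high_bit_bytes=$(( weights_per_block / 8 ))   # 5th bit of each weight, bit-packed: 4 bytes
low_bits_bytes=$(( weights_per_block / 2 ))   # low 4 bits of each weight, as nibbles: 16 bytes

block_bytes=$(( delta_bytes + min_bytes + high_bit_bytes + low_bits_bytes ))
bits_per_weight=$(( block_bytes * 8 / weights_per_block ))

# Assumption: nominal total parameter count taken from the model name.
params=30000000000
est_bytes=$(( params / weights_per_block * block_bytes ))

echo "block_bytes=${block_bytes}"
echo "bits_per_weight=${bits_per_weight}"
echo "est_bytes=${est_bytes}"
# bash has no floating point, so use awk to print decimal gigabytes
awk -v b="$est_bytes" 'BEGIN { printf "est_GB=%.1f\n", b / 1e9 }'
```

Running it prints `block_bytes=24`, `bits_per_weight=6` and an estimate of about 22.5 GB. Treat that as a ballpark for the weights alone: at runtime you also need room for the context (KV cache for the attention layers and state for the Mamba layers), and the actual size of the downloaded file is the authoritative number.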