NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 MLX 4-bit

This repository contains an MLX-LM conversion of nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.

Conversion Details

  • Original model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
  • Model family: Nemotron 3
  • Source model type: nemotron_h
  • Model size: 31,577,937,344 parameters
  • Quantization: MLX-LM affine quantization
  • Bits: 4-bit
  • Group size: 64
  • Local MLX folder size at upload time: 16.57 GiB
  • Local safetensors weight size at upload time: 16.55 GiB

Usage

mlx_lm.generate --model DreamFoundries/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16-4bit --prompt "Hello" --max-tokens 64

Benchmarks

No comparative benchmarks have been run yet. The repository does not currently provide quality, speed, memory, or benchmark comparisons against the original weights or other quantizations.

License

This is a converted/quantized derivative of the original model. Please refer to the original model repository for the upstream license and usage terms: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16

Downloads last month
51
Safetensors
Model size
32B params
Tensor type
BF16
·
U32
·
F32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DreamFoundries/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16-4bit

Quantized
(55)
this model

Collection including DreamFoundries/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16-4bit