---
library_name: transformers
base_model: nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16
base_model_relation: quantized
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
pipeline_tag: text-generation
language:
- en
tags:
- nvidia
- nemotron
- two-tower
- nvfp4
- modelopt
- atlas
- text-generation
---

# Nemotron TwoTower NVFP4 for Atlas

This repository contains an Atlas-compatible working NVFP4 quantization of [`nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16`](https://huggingface.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16).

The checkpoint was prepared from a local ModelOpt NVFP4 export of `NemotronHTwoTowerForCausalLM` and repaired for Atlas causal inference. The repaired payload is intended for the OpenAI-compatible Atlas inference API using the context tower.

## What was repaired

The original local ModelOpt NVFP4 export had defective routed expert scale tensors in the context tower, which caused incoherent output when loaded by Atlas. The context-tower routed expert matrices were re-quantized from the BF16 source weights and written back into the NVFP4 safetensors layout.

Repair scope:

- Tower: `context_tower`
- Layers: 23 MoE layers
- Experts: 128 routed experts per MoE layer
- Matrices: `up_proj` and `down_proj`
- Total repaired matrices: 5,888
- Total replaced tensor payloads: 23,552

The denoiser tower was not repaired in this checkpoint. Atlas causal/OpenAI-compatible inference uses the context tower.

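
The totals in the repair scope follow from the per-layer counts, and a short arithmetic check makes the relationship explicit. The sketch below uses only the numbers listed above; the variable names are illustrative. The payloads-per-matrix figure is inferred from the ratio of the two stated totals, since this card does not list which tensors make up each replaced payload.

```bash
#!/usr/bin/env bash
# Counts taken from the repair scope in this card.
moe_layers=23
experts_per_layer=128
matrices_per_expert=2   # up_proj and down_proj

# 23 layers x 128 experts x 2 matrices
total_matrices=$(( moe_layers * experts_per_layer * matrices_per_expert ))
echo "repaired matrices: ${total_matrices}"

# Assumption: this ratio is derived from the stated totals only;
# the card does not name the individual tensors behind each payload.
total_payloads=23552
payloads_per_matrix=$(( total_payloads / total_matrices ))
echo "tensor payloads per matrix: ${payloads_per_matrix}"
```

Run as written, this prints 5888 repaired matrices and 4 tensor payloads per matrix, which matches the totals in the list.
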
## Atlas usage

Example:

```bash
ATLAS_TARGET_MODEL=nemotron-3-nano-30b-a3b \
ATLAS_TARGET_QUANT=nvfp4 \
CUDARC_CUDA_VERSION=12000 \
./target/debug/spark serve \
  --model-from-path /path/to/nemotron-twotower-nvfp4 \
  --port 8891 \
  --max-seq-len 4096 \
  --max-num-seqs 1 \
  --max-batch-size 1 \
  --gpu-memory-utilization 0.70 \
  --kv-cache-dtype bf16 \
  --lm-head-dtype bf16
```

Verified English completion prompts with Atlas included:

- `The capital of France is` -> coherent answer mentioning Paris.
- `Question: What is 2 + 2? Answer:` -> `4`.
- `Write one concise sentence about the Moon:` -> coherent factual sentence.

## Notes

This is a derived quantized checkpoint. Use is governed by the NVIDIA Nemotron Open Model License Agreement linked in the metadata above.