---
library_name: transformers
base_model: nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16
base_model_relation: quantized
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
pipeline_tag: text-generation
language:
  - en
tags:
  - nvidia
  - nemotron
  - two-tower
  - nvfp4
  - modelopt
  - atlas
  - text-generation
---

# Nemotron TwoTower NVFP4 for Atlas

This repository contains a working, Atlas-compatible NVFP4 quantization of [`nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16`](https://huggingface.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16).

The checkpoint was prepared from a local ModelOpt NVFP4 export of `NemotronHTwoTowerForCausalLM` and then repaired for Atlas causal inference. The repaired checkpoint is intended for use through the OpenAI-compatible Atlas inference API, which runs the context tower.

## What was repaired

The original local ModelOpt NVFP4 export had defective routed expert scale tensors in the context tower, which caused incoherent output when loaded by Atlas. The context-tower routed expert matrices were re-quantized from the BF16 source weights and written back into the NVFP4 safetensors layout.

Repair scope:

- Tower: `context_tower`
- Layers: 23 MoE layers
- Experts: 128 routed experts per MoE layer
- Matrices: `up_proj` and `down_proj`
- Total repaired matrices: 5,888
- Total replaced tensor payloads: 23,552

The denoiser tower was not repaired in this checkpoint. Atlas causal/OpenAI-compatible inference uses the context tower.
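
The totals in the repair scope can be cross-checked with a little shell arithmetic. This is only a sanity-check sketch built from the counts listed above. The ratio of four payloads per matrix follows from those counts, but this card does not say which tensors make up the four, so the script does not name them.

```bash
# Counts taken from the repair scope listed above.
layers=23        # MoE layers in context_tower
experts=128      # routed experts per MoE layer
projections=2    # up_proj and down_proj

matrices=$((layers * experts * projections))
echo "repaired matrices: ${matrices}"

# Replaced payloads divided by repaired matrices gives the number of
# tensors rewritten per matrix. Their names are not stated in this card.
payloads=23552
per_matrix=$((payloads / matrices))
echo "payloads per matrix: ${per_matrix}"
```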

## Atlas usage

Example:

```bash
ATLAS_TARGET_MODEL=nemotron-3-nano-30b-a3b \
ATLAS_TARGET_QUANT=nvfp4 \
CUDARC_CUDA_VERSION=12000 \
./target/debug/spark serve \
  --model-from-path /path/to/nemotron-twotower-nvfp4 \
  --port 8891 \
  --max-seq-len 4096 \
  --max-num-seqs 1 \
  --max-batch-size 1 \
  --gpu-memory-utilization 0.70 \
  --kv-cache-dtype bf16 \
  --lm-head-dtype bf16
```

The following English completion prompts were verified with Atlas:

- `The capital of France is` -> coherent answer mentioning Paris.
- `Question: What is 2 + 2? Answer:` -> `4`.
- `Write one concise sentence about the Moon:` -> coherent factual sentence.
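
One of these prompts could be sent to a running server along the following lines. This is a minimal sketch, not a documented Atlas invocation: it assumes the server exposes the standard OpenAI-style `/v1/completions` route on the port used in the launch example, and the `MODEL_NAME` value and both helper functions are hypothetical placeholders.

```bash
# Assumptions (not taken from Atlas documentation):
#   - the OpenAI-style /v1/completions route is available
#   - MODEL_NAME is a placeholder; use whatever name your server reports
ATLAS_URL="http://localhost:8891/v1/completions"   # port from the launch example
MODEL_NAME="nemotron-twotower-nvfp4"

# Build a completion request body.
# $1: prompt text without double quotes or backslashes (no JSON escaping is done).
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": 32, "temperature": 0}' \
    "$MODEL_NAME" "$1"
}

# POST the request body to the server and print the raw JSON response.
atlas_complete() {
  curl -sS "$ATLAS_URL" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1")"
}
```

With the server running, `atlas_complete "The capital of France is"` sends the first prompt from the list. Because this is a base model, the sketch uses the plain completions route rather than a chat route.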

## Notes

This is a derived quantized checkpoint. Use is governed by the NVIDIA Nemotron Open Model License Agreement linked in the metadata above.