Nemotron TwoTower NVFP4 for Atlas

This repository contains a working, Atlas-compatible NVFP4 quantization of nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16.

The checkpoint was prepared from a local ModelOpt NVFP4 export of NemotronHTwoTowerForCausalLM and repaired for Atlas causal inference. The repaired payload is intended for the OpenAI-compatible Atlas inference API using the context tower.

What was repaired

The original local ModelOpt NVFP4 export had defective routed expert scale tensors in the context tower, which caused incoherent output when loaded by Atlas. The context-tower routed expert matrices were re-quantized from the BF16 source weights and written back into the NVFP4 safetensors layout.
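To illustrate why a defective scale tensor is enough to produce incoherent output, here is a minimal pure-Python sketch of block-scaled 4-bit quantization in the style of NVFP4. It is not the repair script used for this checkpoint. It assumes 16-value blocks and the E2M1 value grid, keeps the per-block scale as a plain Python float (the real format stores scales in a lower-precision type, which is omitted here), and uses a low-nibble-first packing order that is only an assumption. All function names are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed: one shared scale per 16 consecutive values


def quantize_block(values):
    """Return (scale, codes) for one block; each code is a 4-bit int, sign bit = 8."""
    assert len(values) == BLOCK_SIZE
    amax = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest E2M1 level.
    scale = amax / E2M1_LEVELS[-1] if amax > 0 else 1.0
    codes = []
    for v in values:
        mag = abs(v) / scale
        # Round to the nearest representable magnitude.
        idx = min(range(len(E2M1_LEVELS)), key=lambda i: abs(E2M1_LEVELS[i] - mag))
        codes.append(idx | (8 if v < 0 else 0))
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct values; a wrong scale corrupts every value in the block."""
    return [(-1.0 if c & 8 else 1.0) * E2M1_LEVELS[c & 7] * scale for c in codes]


def pack_codes(codes):
    """Pack two 4-bit codes per byte, low nibble first (assumed convention)."""
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
```

Because every reconstructed value is multiplied by its block scale, a bad scale distorts the whole block even when the 4-bit codes are intact, which is consistent with re-quantizing the affected matrices from the BF16 source rather than patching individual values.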

Repair scope:

  • Tower: context_tower
  • Layers: 23 MoE layers
  • Experts: 128 routed experts per MoE layer
  • Matrices: up_proj and down_proj
  • Total repaired matrices: 5,888
  • Total replaced tensor payloads: 23,552
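
The totals in the repair scope can be cross-checked with a few lines of arithmetic. This sketch uses only the figures listed above; the value of four payloads per matrix is inferred from the two totals, and the source does not name the individual payloads.

```python
MOE_LAYERS = 23
EXPERTS_PER_LAYER = 128
MATRICES_PER_EXPERT = 2  # up_proj and down_proj

repaired_matrices = MOE_LAYERS * EXPERTS_PER_LAYER * MATRICES_PER_EXPERT
replaced_payloads = 23_552  # total stated in the repair scope

# Inferred, not stated in the source: tensor payloads written per matrix.
payloads_per_matrix = replaced_payloads // repaired_matrices

print(repaired_matrices, payloads_per_matrix)
```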

The denoiser tower was not repaired in this checkpoint. Atlas causal/OpenAI-compatible inference uses the context tower.

Atlas usage

Example:

ATLAS_TARGET_MODEL=nemotron-3-nano-30b-a3b \
ATLAS_TARGET_QUANT=nvfp4 \
CUDARC_CUDA_VERSION=12000 \
./target/debug/spark serve \
  --model-from-path /path/to/nemotron-twotower-nvfp4 \
  --port 8891 \
  --max-seq-len 4096 \
  --max-num-seqs 1 \
  --max-batch-size 1 \
  --gpu-memory-utilization 0.70 \
  --kv-cache-dtype bf16 \
  --lm-head-dtype bf16
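
Once the server is running, a completion request might look like the following sketch. It assumes Atlas exposes the conventional OpenAI-style `/v1/completions` route on the port given above and returns the usual `choices[0].text` field; the `model` value and both helper names are placeholders, so adjust them to whatever your Atlas build reports.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8891",
                             model="nemotron-twotower-nvfp4", max_tokens=32):
    """Build a POST request for an OpenAI-style completions endpoint (assumed route)."""
    payload = {
        "model": model,          # placeholder name, not confirmed by the source
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,      # greedy decoding for repeatable spot checks
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (assumes the OpenAI response shape)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("The capital of France is")` against a running server should return a short continuation; the base model is not instruction-tuned, so completion-style prompts are the appropriate check.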

English completion prompts verified with Atlas included:

  • The capital of France is -> coherent answer mentioning Paris.
  • Question: What is 2 + 2? Answer: -> 4.
  • Write one concise sentence about the Moon: -> coherent factual sentence.

Notes

This is a derived quantized checkpoint. Use is governed by the NVIDIA Nemotron Open Model License Agreement linked in the metadata above.
