Nano: KV cache is not passed to attention layers

#65 opened by jeffwillette

The KV cache is never passed to the attention layers here: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/blob/main/modeling_nemotron_h.py#L782

This affects anyone who loads the model with trust_remote_code=True. The modeling file bundled with the transformers library appears to handle the cache correctly. The same bug shows up in the remote code of some other Nano variants' model repos, but it does not seem to affect the other Nemotron 3 variants.
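To illustrate why a dropped cache argument matters, here is a toy sketch in plain Python. It is not the Nemotron-H code: ToyCache, ToyAttention, layer_forward and run are hypothetical names, and the "attention" is just a causal running mean. It assumes that an attention layer which receives no cache behaves as if there is no past context, so when decoding one token at a time each step only sees the current token and the outputs diverge from a full recompute.

```python
from typing import List, Optional


class ToyCache:
    """Hypothetical stand-in for a KV cache: remembers every value seen so far."""

    def __init__(self) -> None:
        self.values: List[float] = []


class ToyAttention:
    """Toy causal 'attention': each output is the mean of the current value and all earlier ones."""

    def __call__(self, xs: List[float], cache: Optional[ToyCache]) -> List[float]:
        past = cache.values if cache is not None else []  # no cache -> no past context
        context = past + xs
        offset = len(past)
        outs: List[float] = []
        for i in range(len(xs)):
            window = context[: offset + i + 1]  # causal window up to the current position
            outs.append(sum(window) / len(window))
        if cache is not None:
            cache.values.extend(xs)  # store the new values for later decoding steps
        return outs


def layer_forward(attn: ToyAttention, xs: List[float], cache: ToyCache, pass_cache: bool) -> List[float]:
    # pass_cache=False mimics the reported bug: a cache exists but is never handed to attention.
    return attn(xs, cache if pass_cache else None)


def run(tokens: List[float], pass_cache: bool) -> List[float]:
    """Decode one token at a time, like incremental generation with a cache."""
    attn, cache = ToyAttention(), ToyCache()
    outs: List[float] = []
    for t in tokens:
        outs.extend(layer_forward(attn, [t], cache, pass_cache))
    return outs


tokens = [1.0, 2.0, 3.0, 4.0]
print(ToyAttention()(tokens, None))   # full recompute: [1.0, 1.5, 2.0, 2.5]
print(run(tokens, pass_cache=True))   # cached decoding matches: [1.0, 1.5, 2.0, 2.5]
print(run(tokens, pass_cache=False))  # cache dropped: [1.0, 2.0, 3.0, 4.0]
```

Comparing cached step-by-step outputs against a full recompute, as in the last three lines, is a cheap way to catch this class of bug in any model.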
