Text Generation
Transformers
GGUF
PyTorch
English
nvidia
conversational

How to run

#1
by kristianpaul - opened

Due the lack of information i figured to run it like this unless there are other ways?

./llama.cpp/llama-server     -hf nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF:Q4_K_M     --ctx-size 200000     --temp 0.6 --top-p 0.95 --port 8081  --jinja --reasoning-format auto

Sign up or log in to comment