Instructions to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive", filename="Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M # Run inference directly in the terminal: llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M # Run inference directly in the terminal: llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Use Docker
docker model run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Ollama:
ollama run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
- Unsloth Studio
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting
- Pi
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Docker Model Runner:
docker model run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
- Lemonade
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
Run and chat with the model
lemonade run user.Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_M
List all available models
lemonade list
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive
Join the Discord for updates, roadmaps, projects, or just to chat.
NVIDIA Nemotron-3 Nano 4B uncensored by HauhauCS. 0/465 refusals.
HuggingFace's "Hardware Compatibility" widget doesn't recognize K_P quants — it may show fewer files than actually exist. Click "View +X variants" or go to Files and versions to see all available downloads.
About
No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.
These are meant to be the best lossless uncensored models out there.
This release has NVIDIA's GenRM (generative reward model) fully removed. GenRM acts as an internal critic that scores and filters the model's own outputs — effectively a second layer of censorship on top of the base refusals. Removing it gives you the raw model output without any self-censoring.
For a comparison build with GenRM still active, see Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-GenRM (IQ2_M only, for side-by-side testing).
Aggressive Variant
Stronger uncensoring — model is fully unlocked and won't refuse prompts. May occasionally append short disclaimers (baked into base model training, not refusals) but full content is always generated.
For a more conservative uncensor that keeps some safety guardrails, check the Balanced variant when it's available.
Downloads
| File | Quant | BPW | Size |
|---|---|---|---|
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf | Q8_K_P | 9.4 | 4.4 GB |
| — | Q8_0 | 8.5 | — |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf | Q6_K_P | 7.0 | 3.8 GB |
| — | Q6_K | 6.6 | — |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf | Q5_K_P | 6.1 | 3.1 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q5_K_M.gguf | Q5_K_M | 5.7 | 2.9 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf | Q4_K_P | 5.2 | 2.9 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | Q4_K_M | 4.8 | 2.7 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf | IQ4_XS | 4.3 | 2.3 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q3_K_P.gguf | Q3_K_P | 4.1 | 2.4 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q3_K_M.gguf | Q3_K_M | 3.9 | 2.3 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf | IQ3_M | 3.7 | 2.2 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q2_K_P.gguf | Q2_K_P | 3.5 | 2.2 GB |
| Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf | IQ2_M | 2.7 | 2.1 GB |
All quants generated with importance matrix (imatrix) for optimal quality preservation on abliterated weights.
What are K_P quants?
K_P ("Perfect") quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile.
A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.
Note: K_P quants may show as "?" in LM Studio's quant column. This is a display issue only — the model loads and runs fine.
Specs
- 3.97B parameters
- Hybrid Mamba2-Transformer architecture (42 layers: 21 Mamba2, 17 MLP, 4 Attention)
- 262K native context
- Thinking/reasoning mode (toggleable)
- Tool calling support
- Compressed from NVIDIA Nemotron-Nano-9B-v2 using Nemotron Elastic framework
- Based on nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16
Recommended Settings
From the official NVIDIA authors:
Reasoning mode (default, thinking enabled):
temperature=1.0, top_p=0.95
Tool calling:
temperature=0.6, top_p=0.95
Disabling reasoning:
- Set
enable_thinking=Falsein chat template — trades accuracy for speed on simpler tasks
Usage
Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.
llama-cli -m Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
--jinja -c 131072 -ngl 99
Note: LM Studio may show unexpected values in architecture/params columns — this is a display quirk with hybrid Mamba models, the model runs correctly.
Other Versions
- Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-GenRM — same abliteration but with GenRM still active (IQ2_M only, for comparison)
- Downloads last month
- 4,626
Model tree for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive
Base model
nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base