Instructions to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive",
	filename="Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Use Docker

docker model run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

LM Studio
Jan
Ollama
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Ollama:
```
ollama run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
```

Unsloth Studio

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive to start chatting

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Docker Model Runner:
```
docker model run hf.co/HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M
```

Lemonade

How to use HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull HauhauCS/Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

Run and chat with the model

lemonade run user.Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_M

List all available models

lemonade list

Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive

Join the Discord for updates, roadmaps, projects, or just to chat.

NVIDIA Nemotron-3 Nano 4B uncensored by HauhauCS. 0/465 refusals.

HuggingFace's "Hardware Compatibility" widget doesn't recognize K_P quants — it may show fewer files than actually exist. Click "View +X variants" or go to Files and versions to see all available downloads.

About

No changes to datasets or capabilities. Fully functional, 100% of what the original authors intended - just without the refusals.

These are meant to be the best lossless uncensored models out there.

This release has NVIDIA's GenRM (generative reward model) fully removed. GenRM acts as an internal critic that scores and filters the model's own outputs — effectively a second layer of censorship on top of the base refusals. Removing it gives you the raw model output without any self-censoring.

For a comparison build with GenRM still active, see Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-GenRM (IQ2_M only, for side-by-side testing).

Aggressive Variant

Stronger uncensoring — model is fully unlocked and won't refuse prompts. May occasionally append short disclaimers (baked into base model training, not refusals) but full content is always generated.

For a more conservative uncensor that keeps some safety guardrails, check the Balanced variant when it's available.

Downloads

File	Quant	BPW	Size
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf	Q8_K_P	9.4	4.4 GB
—	Q8_0	8.5	—
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf	Q6_K_P	7.0	3.8 GB
—	Q6_K	6.6	—
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf	Q5_K_P	6.1	3.1 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q5_K_M.gguf	Q5_K_M	5.7	2.9 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf	Q4_K_P	5.2	2.9 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf	Q4_K_M	4.8	2.7 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf	IQ4_XS	4.3	2.3 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q3_K_P.gguf	Q3_K_P	4.1	2.4 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q3_K_M.gguf	Q3_K_M	3.9	2.3 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf	IQ3_M	3.7	2.2 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q2_K_P.gguf	Q2_K_P	3.5	2.2 GB
Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-IQ2_M.gguf	IQ2_M	2.7	2.1 GB

All quants generated with importance matrix (imatrix) for optimal quality preservation on abliterated weights.

What are K_P quants?

K_P ("Perfect") quants are HauhauCS custom quantizations that use model-specific analysis to selectively preserve quality where it matters most. Each model gets its own optimized quantization profile.

A K_P quant effectively bumps quality up by 1-2 quant levels at only ~5-15% larger file size than the base quant. Fully compatible with llama.cpp, LM Studio, and any GGUF-compatible runtime — no special builds needed.

Note: K_P quants may show as "?" in LM Studio's quant column. This is a display issue only — the model loads and runs fine.

Specs

3.97B parameters
Hybrid Mamba2-Transformer architecture (42 layers: 21 Mamba2, 17 MLP, 4 Attention)
262K native context
Thinking/reasoning mode (toggleable)
Tool calling support
Compressed from NVIDIA Nemotron-Nano-9B-v2 using Nemotron Elastic framework
Based on nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16

Recommended Settings

From the official NVIDIA authors:

Reasoning mode (default, thinking enabled):

temperature=1.0, top_p=0.95

Tool calling:

temperature=0.6, top_p=0.95

Disabling reasoning:

Set enable_thinking=False in chat template — trades accuracy for speed on simpler tasks

Usage

Works with llama.cpp, LM Studio, Jan, koboldcpp, and other GGUF-compatible runtimes.

llama-cli -m Nemotron3-Nano-4B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf \
  --jinja -c 131072 -ngl 99

Note: LM Studio may show unexpected values in architecture/params columns — this is a display quirk with hybrid Mamba models, the model runs correctly.