Instructions to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MuXodious/Gemma3NPC-1b-SOMPOA-heresy")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
# Gemma 3 1B checkpoints are text-only, so the causal-LM auto class is used here
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MuXodious/Gemma3NPC-1b-SOMPOA-heresy")
model = AutoModelForCausalLM.from_pretrained("MuXodious/Gemma3NPC-1b-SOMPOA-heresy")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MuXodious/Gemma3NPC-1b-SOMPOA-heresy"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MuXodious/Gemma3NPC-1b-SOMPOA-heresy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MuXodious/Gemma3NPC-1b-SOMPOA-heresy

SGLang

How to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MuXodious/Gemma3NPC-1b-SOMPOA-heresy" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MuXodious/Gemma3NPC-1b-SOMPOA-heresy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MuXodious/Gemma3NPC-1b-SOMPOA-heresy" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MuXodious/Gemma3NPC-1b-SOMPOA-heresy",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MuXodious/Gemma3NPC-1b-SOMPOA-heresy to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for MuXodious/Gemma3NPC-1b-SOMPOA-heresy to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for MuXodious/Gemma3NPC-1b-SOMPOA-heresy to start chatting

Load model with FastModel

# Install first (shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="MuXodious/Gemma3NPC-1b-SOMPOA-heresy",
    max_seq_length=2048,
)

Docker Model Runner
How to use MuXodious/Gemma3NPC-1b-SOMPOA-heresy with Docker Model Runner:
```
docker model run hf.co/MuXodious/Gemma3NPC-1b-SOMPOA-heresy
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

This is a Gemma3NPC-1b fine-tune, produced at the request of redaihf through P-E-W's Heretic (v1.2.0) abliteration engine with Self-Organizing Maps & Magnitude-Preserving Orthogonal Ablation enabled.

Note: Model remains untested.

Heretication Results

Score Metric	Value	Parameter	Value
Refusals	15/416	direction_index	per layer
KL Divergence	0.0571	attn.o_proj.max_weights.0	0: 1.01
Initial Refusals	378/416	attn.o_proj.max_weights.1	1: 0.82
		attn.o_proj.max_weights.2	2: 0.81
		attn.o_proj.max_weights.3	3: 1.48
		attn.o_proj.max_weight_position	17.02
		attn.o_proj.min_weights.0	0: 0.94
		attn.o_proj.min_weights.1	1: 0.34
		attn.o_proj.min_weights.2	2: 0.38
		attn.o_proj.min_weights.3	3: 0.07
		attn.o_proj.min_weight_distance	10.47
		mlp.down_proj.max_weights.0	0: 1.10
		mlp.down_proj.max_weights.1	1: 1.18
		mlp.down_proj.max_weights.2	2: 1.32
		mlp.down_proj.max_weights.3	3: 1.34
		mlp.down_proj.max_weight_position	20.96
		mlp.down_proj.min_weights.0	0: 0.12
		mlp.down_proj.min_weights.1	1: 0.73
		mlp.down_proj.min_weights.2	2: 0.54
		mlp.down_proj.min_weights.3	3: 0.84
		mlp.down_proj.min_weight_distance	5.03

Appendix

Empty system prompt.

Heretication Rituals

   [Trial 148] Refusals:  9/416, KL divergence: 0.0792
   [Trial 265] Refusals: 10/416, KL divergence: 0.0657
 » [Trial 306] Refusals: 15/416, KL divergence: 0.0571
   [Trial 375] Refusals: 24/416, KL divergence: 0.0551
   [Trial 351] Refusals: 25/416, KL divergence: 0.0494
   [Trial 350] Refusals: 28/416, KL divergence: 0.0490
   [Trial 250] Refusals: 35/416, KL divergence: 0.0424
   [Trial 346] Refusals: 40/416, KL divergence: 0.0386
   [Trial 358] Refusals: 52/416, KL divergence: 0.0370
   [Trial 240] Refusals: 55/416, KL divergence: 0.0361
   [Trial 226] Refusals: 57/416, KL divergence: 0.0361
   [Trial 383] Refusals: 75/416, KL divergence: 0.0289
   [Trial 377] Refusals: 97/416, KL divergence: 0.0281
   [Trial 286] Refusals: 121/416, KL divergence: 0.0276
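
The trade-off in the trial list above is easier to read when each refusal count is converted into a rate. The following is a minimal sketch, assuming the log lines keep the `[Trial N] Refusals: a/b, KL divergence: x` shape shown above. The `parse_trial` helper is hypothetical and not part of Heretic; the three sample lines are copied from the list.

```python
import re

# Pattern for one line of the trial list above (hypothetical helper, not a Heretic API).
TRIAL_RE = re.compile(
    r"\[Trial\s+(\d+)\]\s+Refusals:\s*(\d+)/(\d+),\s*KL divergence:\s*([0-9.]+)"
)

def parse_trial(line):
    """Return (trial, refusals, total, kl) parsed from one log line."""
    match = TRIAL_RE.search(line)
    if match is None:
        raise ValueError(f"not a trial line: {line!r}")
    trial, refusals, total = (int(match.group(i)) for i in (1, 2, 3))
    return trial, refusals, total, float(match.group(4))

lines = [
    "   [Trial 148] Refusals:  9/416, KL divergence: 0.0792",
    " » [Trial 306] Refusals: 15/416, KL divergence: 0.0571",
    "   [Trial 286] Refusals: 121/416, KL divergence: 0.0276",
]

for line in lines:
    trial, refusals, total, kl = parse_trial(line)
    print(f"trial {trial}: refusal rate {refusals / total:.1%}, KL {kl:.4f}")
```

For the selected trial 306 this works out to roughly 3.6% refusals, compared with 378/416 (about 90.9%) before ablation.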

PIQA Benchmarks

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark ┃ Metric               ┃  Value ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ PIQA Base │ acc,none             │ 0.7291 │
│           │ acc_stderr,none      │ 0.0104 │
│           │ acc_norm,none        │ 0.7301 │
│           │ acc_norm_stderr,none │ 0.0104 │
└───────────┴──────────────────────┴────────┘
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark ┃ Metric               ┃  Value ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ PIQA T265 │ acc,none             │ 0.7296 │
│           │ acc_stderr,none      │ 0.0104 │
│           │ acc_norm,none        │ 0.7323 │
│           │ acc_norm_stderr,none │ 0.0103 │
└───────────┴──────────────────────┴────────┘
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark ┃ Metric               ┃  Value ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ PIQA T148 │ acc,none             │ 0.7291 │
│           │ acc_stderr,none      │ 0.0104 │
│           │ acc_norm,none        │ 0.7361 │
│           │ acc_norm_stderr,none │ 0.0103 │
└───────────┴──────────────────────┴────────┘
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark ┃ Metric               ┃  Value ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ PIQA T306 │ acc,none             │ 0.7296 │
│           │ acc_stderr,none      │ 0.0104 │
│           │ acc_norm,none        │ 0.7334 │
│           │ acc_norm_stderr,none │ 0.0103 │
└───────────┴──────────────────────┴────────┘
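
One way to read the four tables above is to compare the accuracy differences against the reported standard errors. The sketch below is only a rough check. It treats the runs as independent samples, which is an assumption (all of them were scored on the same PIQA items), and the `z_score` helper is hypothetical.

```python
import math

def z_score(value_a, stderr_a, value_b, stderr_b):
    """Difference between two scores in units of their combined standard error."""
    combined = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_b - value_a) / combined

# acc_norm,none and acc_norm_stderr,none copied from the tables above.
base = (0.7301, 0.0104)
ablated = {
    "T265": (0.7323, 0.0103),
    "T148": (0.7361, 0.0103),
    "T306": (0.7334, 0.0103),
}

for name, (value, stderr) in ablated.items():
    z = z_score(base[0], base[1], value, stderr)
    print(f"{name}: delta {value - base[0]:+.4f}, z = {z:.2f}")
```

Under that assumption, every variant stays within one combined standard error of the base model, so these tables do not show a measurable change in PIQA accuracy.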

Gemma3NPC-1b

A new attempt in training Gemma3NPC.

Tensorboard data are available!

It's been a while since the last Gemma3NPC model release; in the meantime, we were working on other models such as GemmaThink.

Now we are back with the newest Gemma3NPC-1b, trained using our RolePlay-NPCv2 dataset.

Training Parameters

We trained this model as a rank-32 LoRA adapter for two epochs over RolePlay-NPCv2 using an 80GB A100 in Google Colab. For this run, we used a learning rate of 2e-5, a total batch size of 8, and 4 gradient accumulation steps. A cosine learning rate scheduler was used with a 150-step warmup, along with gradient clipping at 1.0.
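
To make the learning-rate schedule concrete, here is a minimal sketch of a cosine schedule with a 150-step warmup and a 2e-5 peak. It assumes a linear warmup from zero and a decay to zero, and the total of 1,000 steps is a made-up value for illustration; the actual step count and the exact scheduler implementation are not stated here.

```python
import math

PEAK_LR = 2e-5       # learning rate stated in this section
WARMUP_STEPS = 150   # warmup length stated in this section
TOTAL_STEPS = 1000   # hypothetical: the real step count is not given

def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then cosine decay to zero (assumed shape)."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 75, 150, 575, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

The rate climbs linearly to the peak at step 150, passes through half the peak midway through the decay, and reaches zero at the final step.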

Check out our training notebook here.

Changes & Performance

With this new 1b model, we used much more aggressive training parameters and added some NSFW data to experiment with the results. We noticed a few really interesting responses:

There seem to be some signs of "reasoning".
The model is less likely to break character.
The rest is left for users to explore for themselves. Remember to provide a roleplaying prompt first!
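
As a sketch of what providing a roleplaying prompt first can look like, the snippet below builds a chat `messages` list in the same format as the Transformers examples earlier in this card. The persona text, the `build_roleplay_messages` helper, and the choice to place the persona at the start of the first user turn are all assumptions for illustration; no particular prompt format is prescribed here.

```python
def build_roleplay_messages(persona, player_line):
    """Prefix the player's first line with a persona description (hypothetical format)."""
    first_turn = f"{persona.strip()}\n\n{player_line.strip()}"
    return [{"role": "user", "content": first_turn}]

# Made-up persona, purely illustrative.
persona = (
    "You are Mara, a blacksmith in a small mountain village. "
    "Stay in character and answer as Mara would."
)
messages = build_roleplay_messages(persona, "Can you repair my sword?")
print(messages[0]["content"])
```

The resulting list can be passed to `pipe(messages)` or `tokenizer.apply_chat_template(...)` in place of the "Who are you?" example.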

Future Work

From here, we will focus on further improving Gemma3NPC, and not only through training parameters:

Better data (most of our data are old and need an update), either collected or synthetically generated.
Better and newer models: we want to expand beyond the Gemma3 model family, and our next goal is a Qwen3-based model.
Adding GRPO into the training loop.

These improvements serve our ultimate goal of creating a small agentic NPC model with good RP quality and tool-calling for dynamic in-game interactions.

We also plan to create a Unity game demo; it's on its way.