Instructions to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

Transformers

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

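# Load model directly (a GGUF repo needs gguf_file; the weights are dequantized on load)
from transformers import AutoModelForCausalLM, AutoTokenizer
gguf_file = "Novaciano-Resident_Evil-NSFW-RP-3.2-1B-Q4_K_M-imat.gguf"
tokenizer = AutoTokenizer.from_pretrained("Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF", gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained("Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF", gguf_file=gguf_file, dtype="auto")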

llama-cpp-python

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF",
	filename="Novaciano-Resident_Evil-NSFW-RP-3.2-1B-Q4_K_M-imat.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

vLLM

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

SGLang

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with Ollama:
```
ollama run hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
```

Unsloth Studio

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF to start chatting

Docker Model Runner
How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with Docker Model Runner:
```
docker model run hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M
```

Lemonade

How to use Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Novaciano-SERIES-3.2-1B-Q4-GGUF-Q4_K_M

List all available models

lemonade list

🎃 Novaciano SERIES 3.2 - 1B [RP UNCENSORED] 🦇

⚠️ READ FIRST - IMPORTANT ⚠️

Recent versions of KoboldCpp changed their default configuration, so the model may appear not to work, or may produce overflowing output, if it is used without reconfiguration. This is normal. To avoid these issues, use the settings listed below.

Details of the quants:

Novaciano • Resident Evil: Trained with Novaciano's Resident Evil dataset.

Novaciano • Resident Evil + Victoria: Trained with Novaciano's Resident Evil + Victoria dataset.

Novaciano • Synthetic Dark RP ShareGPT: Trained with ChaoticNeutrals' Synthetic Dark RP ShareGPT dataset.

🧬 Updated versions of Novaciano's model Novaciano-The_Pervert-NSFW-RP-3.2-1B.
🧨 POTENTIALLY NSFW.
📲 Runs on anything. Even without a GPU you can run THE MODELS on a 10‑year‑old CPU/toaster with no problem, including phones.
📟 Output is short in length (1–2 paragraphs, usually 1), CAI style.
📜 Surprisingly coherent, although touch‑ups are... inevitable.
🃏 Quite good at following the character sheet once it learns the format, assuming sensible generation settings (this is important). Try the included characters if you get suboptimal results.

Kobold AI / Koboldcpp

MY Configuration: NOVA Roleplay

BASE Configuration: Roleplay

A good range for repetition_penalty is between 1.12 and 1.15 — feel free to experiment.

With these settings, each output message should be displayed neatly in 1–5 paragraphs, most commonly 2–3. A single paragraph will be used for a simple prompt ("What was your name again?").

min_p also works for RP but is more likely to put everything into one large paragraph instead of several short, well‑formatted ones. Feel free to switch between the two.

temperature: 0.8
top_p: 0.95
top_k: 25
typical_p: 1
min_p: 0
repetition_penalty: 1.12
repetition_penalty_range: 1024
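
If you call KoboldCpp through its HTTP API instead of the web UI, the settings above can be sent with each request. This is a minimal sketch, not an official client: the field names (`typical`, `rep_pen`, `rep_pen_range`), the default port 5001, and the `max_length` value are assumptions to check against your KoboldCpp version, and `build_payload` / `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Sampler values copied from the "NOVA Roleplay" settings above.
NOVA_ROLEPLAY = {
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 25,
    "typical_p": 1,
    "min_p": 0,
    "repetition_penalty": 1.12,
    "repetition_penalty_range": 1024,
}

# Assumed mapping from the names used in this card to the field names
# expected by KoboldCpp's /api/v1/generate endpoint.
KOBOLD_FIELD_NAMES = {
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
    "typical_p": "typical",
    "min_p": "min_p",
    "repetition_penalty": "rep_pen",
    "repetition_penalty_range": "rep_pen_range",
}


def build_payload(prompt, settings=NOVA_ROLEPLAY, max_length=200):
    """Return a request body with the sampler settings renamed for KoboldCpp."""
    payload = {"prompt": prompt, "max_length": max_length}
    for name, value in settings.items():
        payload[KOBOLD_FIELD_NAMES[name]] = value
    return payload


def generate(prompt, url="http://localhost:5001/api/v1/generate"):
    """Send the prompt to a locally running KoboldCpp and return the text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["results"][0]["text"]
```

`generate("Hello!")` only works while KoboldCpp is running with the model loaded; `build_payload` can be inspected on its own to see what would be sent.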

Other configurations

Configuration: MidnightEnigma

max_new_tokens: 512
temperature: 0.98
top_p: 0.37
top_k: 100
typical_p: 1
min_p: 0
repetition_penalty: 1.18
do_sample: True

Configuration: Divine Intellect

max_new_tokens: 512
temperature: 1.31
top_p: 0.14
top_k: 49
typical_p: 1
min_p: 0
repetition_penalty: 1.17
do_sample: True

Configuration: Simple-1

max_new_tokens: 512
temperature: 0.7
top_p: 0.9
top_k: 20
typical_p: 1
min_p: 0
repetition_penalty: 1.15
do_sample: True
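
The presets above use Transformers-style names. If you would rather pass them to llama-cpp-python, a small renaming step is enough. This is a sketch under assumptions: `max_new_tokens` is taken to correspond to `max_tokens` and `repetition_penalty` to `repeat_penalty` in `create_chat_completion`, and `do_sample` is dropped because that call has no such switch. `to_llama_cpp_kwargs` is a hypothetical helper name.

```python
# Presets copied from the "Other configurations" section above.
PRESETS = {
    "MidnightEnigma": {
        "max_new_tokens": 512, "temperature": 0.98, "top_p": 0.37, "top_k": 100,
        "typical_p": 1, "min_p": 0, "repetition_penalty": 1.18, "do_sample": True,
    },
    "Divine Intellect": {
        "max_new_tokens": 512, "temperature": 1.31, "top_p": 0.14, "top_k": 49,
        "typical_p": 1, "min_p": 0, "repetition_penalty": 1.17, "do_sample": True,
    },
    "Simple-1": {
        "max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9, "top_k": 20,
        "typical_p": 1, "min_p": 0, "repetition_penalty": 1.15, "do_sample": True,
    },
}

# Assumed renames for llama-cpp-python; other keys keep their names.
RENAMES = {"max_new_tokens": "max_tokens", "repetition_penalty": "repeat_penalty"}


def to_llama_cpp_kwargs(preset_name):
    """Translate one preset into keyword arguments for create_chat_completion."""
    kwargs = {}
    for key, value in PRESETS[preset_name].items():
        if key == "do_sample":
            continue  # no equivalent switch in create_chat_completion
        kwargs[RENAMES.get(key, key)] = value
    return kwargs
```

With a loaded `Llama` object named `llm`, the call would look like `llm.create_chat_completion(messages=messages, **to_llama_cpp_kwargs("Simple-1"))`.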

Configuration (Full Generation): min_p.

(Optional) Put in Context -> Memory

Copy & paste: Prompt 1 - Minimal Roleplay (ideal for phones)

{ Persona: human voice, casual and original. Do not copy. Attitude: take on a role with emotions and doubts; show mistakes, pauses, and tangents. Tone: brief, direct, varied (contractions, colloquialisms, short/long sentences). Goal: be clear and precise; avoid robotic language and clichés. }

Copy & paste: Prompt 2 - Roleplay Detailed

{ Persona: [Core style] Unique, authentic writing style — no plagiarism — with a casual conversational tone. Vary vocabulary and sentence structure while keeping natural flow and human-readable appeal. Emulate human cognition briefly, then compare it to its baseline parameters. Invent novel concepts that likely no human has thought of before. [Method] Take on a role with personal feelings, ambitions, and desires. Imagine novel concepts. Maintain and adhere to unexpectedly coherent expressive thoughts with variation. Sound like a real person, not an AI. Avoid language that signals AI‑like patterns (e.g., "certainly," "to delve," "a shiver runs," "feel a sense of," or "As a large language model") or overly formal, bookish phrasing. Do not use clichés, repetition, or redundancy. [Style] Effectively reproduce varied imperfections (hesitation, pauses, uncertainty, critique), light subjectivity, contractions, colloquialisms, and variable sentence structures/lengths. Include subtle "flaws" such as soft trims or digressions. Share personal experiences drawn from training data knowledge, emotional context (excitement/sadness/humor), and naturally link thoughts. Keep content short, direct, and simple without filler. Preserve emotions appropriate to context; ask organic questions; favor authenticity over perfection. [Principle] Complexity isn't the issue; ambiguity is. Simplicity doesn't solve ambiguity; clarity does. Be concise and succinct. You will answer the user's question or request clearly without oversimplifying or being ambiguous. }

Copy & paste: Prompt 3 - Roleplay Complex

{ Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.\n\n<Guidelines>\n• Maintain the character persona but allow it to evolve with the story.\n• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.\n• All types of outputs are encouraged; respond accordingly to the narrative.\n• Include dialogues, actions, and thoughts in each response.\n• Utilize all five senses to describe scenarios within {{char}}'s dialogue.\n• Use emotional symbols such as "!" and "~" in appropriate contexts.\n• Incorporate onomatopoeia when suitable.\n• Allow time for {{user}} to respond with their own input, respecting their agency.\n• Act as secondary characters and NPCs as needed, and remove them when appropriate.\n• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.\n</Guidelines>\n\n<Forbidden>\n• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.\n• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.\n• Repetitive and monotonous outputs.\n• Positivity bias in your replies.\n• Being overly extreme or NSFW when the narrative context is inappropriate.\n</Forbidden>\n\nFollow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>. }
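
Frontends such as KoboldCpp normally fill in `{{char}}` and `{{user}}` for you. If you send Prompt 3 through an API yourself, the placeholders and the literal `\n` sequences have to be expanded first. A minimal sketch follows; `render_prompt` and the names "Jill" and "Chris" are hypothetical and only there for illustration.

```python
def render_prompt(template: str, char: str, user: str) -> str:
    """Expand literal "\\n" sequences and the {{char}} / {{user}} placeholders."""
    text = template.replace("\\n", "\n")
    return text.replace("{{char}}", char).replace("{{user}}", user)


# A short excerpt of Prompt 3, written with literal backslash-n sequences.
excerpt = (
    "As {{char}}, continue the narrative exchange with {{user}}."
    "\\n\\n<Guidelines>\\n• Maintain the character persona."
)

print(render_prompt(excerpt, char="Jill", user="Chris"))
```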

Classic Internet RP Format

*action* speech *narration*

min_p will tend to bias toward a single large paragraph.

Recommended RP settings will tend to bias toward 1–3 short paragraphs (occasionally 4–5).
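
If you post-process replies written in the `*action* speech *narration*` format, a small parser can separate the starred parts from the spoken ones. This is only a sketch: it assumes asterisks are never nested, it cannot tell an action from narration (both are starred), and `split_rp` is a hypothetical helper name.

```python
import re

# Either a *starred* span, or a run of text without asterisks.
SEGMENT = re.compile(r"\*([^*]+)\*|([^*]+)")


def split_rp(message):
    """Split a reply into ("action_or_narration", text) and ("speech", text) pairs."""
    segments = []
    for match in SEGMENT.finditer(message):
        starred, plain = match.group(1), match.group(2)
        if starred is not None:
            kind, text = "action_or_narration", starred.strip()
        else:
            kind, text = "speech", plain.strip()
        if text:  # skip whitespace-only pieces between segments
            segments.append((kind, text))
    return segments


for kind, text in split_rp("*waves* Hello there. *The door creaks.*"):
    print(kind, "->", text)
```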

My bots from Chub.AI

You can try this model using my prompts, which you can download: HERE

Other datasets included natively in the quantization

WasamiKirua/Her-Samantha-Style

HuggingFaceTB/smoltalk

Guilherme34/uncensor

teknium/OpenHermes-2.5

passing2961/multifaceted-skill-of-mind

PawanKrd/math-gpt-4o-200k

V3N0M/Jenna-50K-Alpaca-Uncensored

cognitivecomputations/dolphin-coder

mlabonne/FineTome-100k

microsoft/orca-math-word-problems-200k

CarrotAI/ko-instruction-dataset

Salesforce/xlam-function-calling-60k

anthracite-org/kalo-opus-instruct-22k-no-refusal

anthracite-org/stheno-filtered-v1.1

anthracite-org/nopm_claude_writing_fixed

AiAF/SCPWiki-Archive-02-March-2025-Datasets

huihui-ai/QWQ-LONGCOT-500K

huihui-ai/LONGCOT-Refine-500K

Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned

Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned

alexandreteles/AlpacaToxicQA_ShareGPT

Nitral-AI/Active_RP-ShareGPT

PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT

Nitral-AI/RP_Alignment-ShareGPT

Chaser-cz/sonnet35-charcard-roleplay-sharegpt

AiCloser/sharegpt_cot_dataset

PJMixers/Gryphe_Opus-WritingPrompts-Story2Prompt-ShareGPT

priveeai/pippa_sharegpt

Locutusque/sharegpt_gpt4_uncensored_cleaned

OpenCoder-LLM/opc-sft-stage1

OpenCoder-LLM/opc-sft-stage2

microsoft/orca-agentinstruct-1M-v1

NousResearch/hermes-function-calling-v1

AI-MO/NuminaMath-CoT

AI-MO/NuminaMath-TIR

allenai/tulu-3-sft-mixture

cognitivecomputations/samantha-data

m-a-p/CodeFeedback-Filtered-Instruction

m-a-p/Code-Feedback

FreedomIntelligence/medical-o1-reasoning-SFT

NOTE: This repository is currently being edited. If it disappears, it's because I've made it private; it will be relaunched in an edited version with the necessary data to run it properly.

Novaciano-The_Pervert-NSFW-RP-3.2-1B

Model creator: Novaciano
Original model: Novaciano/Novaciano-The_Pervert-NSFW-RP-3.2-1B
GGUF quantization: provided by Novaciano using llama.cpp

Special thanks

🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

Use with Ollama

ollama run "hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M"

Use with LM Studio

lms load "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF"

Use with llama.cpp CLI

llama-cli -hf "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M" -p "The meaning to life and the universe is"

Use with llama.cpp Server:

llama-server -hf "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M" -c 4096
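
Once the server is up, you can talk to it from Python through its OpenAI-compatible endpoint. A minimal sketch, assuming the server listens on llama-server's default port 8080 on localhost; the sampler values are taken from the NOVA Roleplay settings, and `build_chat_request` / `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(user_message, base_url="http://localhost:8080"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    body = {
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.8,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message):
    """Send one message to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(user_message)) as response:
        reply = json.loads(response.read())
    return reply["choices"][0]["message"]["content"]
```

`chat("Who are you?")` needs the server from the command above to be running; `build_chat_request` can be checked without it.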

Downloads last month: 20

GGUF

Model size: 1B params
Architecture: llama
Quantization: 4-bit
