🎃 Novaciano SERIES 3.2 - 1B [RP UNCENSORED] 🦇
⚠️ READ FIRST - IMPORTANT ⚠️
Because newer versions of KoboldCpp changed their default settings, the model may appear not to work, or may produce runaway, overly long output, if you use it without reconfiguring. This is expected. To avoid these issues, use the settings listed below.
Details of the quants:
- Novaciano • Resident Evil: trained on Novaciano's Resident Evil dataset.
- Novaciano • Resident Evil + Victoria: trained on Novaciano's Resident Evil + Victoria dataset.
- Novaciano • Synthetic Dark RP ShareGPT: trained on ChaoticNeutrals' Synthetic Dark RP ShareGPT dataset.
- 🧬 Updated versions of Novaciano's model Novaciano_The_Pervert-RP-NSFW-3.2-1B.
- 🧨 POTENTIALLY NSFW.
- 📲 Runs on almost anything. Even without a GPU, you can run the models on a 10‑year‑old CPU (or a toaster) with no problem, and on phones too.
- 📟 Output is short (1–2 paragraphs, usually 1), in Character.AI (CAI) style.
- 📜 Surprisingly coherent, although touch‑ups are... inevitable.
- 🃏 Quite good at following the character sheet, provided you use sensible generation settings (this is important) and the model has picked up the format. If you get suboptimal results, try the included characters.
KoboldAI / KoboldCpp
BASE Configuration: Roleplay
A good range for repetition_penalty is 1.12 to 1.15; feel free to experiment.
With these settings, each output message should be displayed neatly in 1–5 paragraphs, most commonly 2–3. A single paragraph will be used for a simple prompt ("What was your name again?").
min_p also works for RP, but it is more likely to put everything into one large paragraph instead of several short, well‑formatted ones. Feel free to switch between the two.
temperature: 0.8
top_p: 0.95
top_k: 25
typical_p: 1
min_p: 0
repetition_penalty: 1.12
repetition_penalty_range: 1024
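To make these values less abstract, here is a minimal pure-Python sketch of what each setting does to a list of next-token logits: a CTRL-style repetition penalty over the last `repetition_penalty_range` tokens, then temperature, top_k, top_p and min_p. This is not KoboldCpp's code; the function names are hypothetical, typical sampling is omitted because `typical_p: 1` disables it, and applying temperature first is a simplification (KoboldCpp lets you reorder its samplers).

```python
import math
import random


def apply_repetition_penalty(logits, history, penalty, window):
    """CTRL-style penalty on tokens seen in the last `window` tokens of history."""
    recent = history[-window:] if window > 0 else []
    out = list(logits)
    for tok in set(recent):
        # Dividing positive logits and multiplying negative ones both lower the score.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(logits, temperature, top_k, top_p, min_p):
    """Return renormalised (token, prob) pairs that survive every filter."""
    probs = softmax([x / temperature for x in logits])  # temperature must be > 0 here
    ranked = sorted(enumerate(probs), key=lambda tp: tp[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # top_k: keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for tok, p in ranked:  # top_p: smallest prefix whose mass reaches top_p
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    floor = min_p * kept[0][1]  # min_p: drop tokens below min_p * best probability
    kept = [(t, p) for t, p in kept if p >= floor]
    total = sum(p for _, p in kept)
    return [(t, p / total) for t, p in kept]


def sample(logits, history, rng, temperature=0.8, top_p=0.95, top_k=25,
           min_p=0.0, repetition_penalty=1.12, repetition_penalty_range=1024):
    """One sampling step; defaults mirror the BASE Roleplay configuration."""
    logits = apply_repetition_penalty(logits, history, repetition_penalty,
                                      repetition_penalty_range)
    candidates = filter_candidates(logits, temperature, top_k, top_p, min_p)
    tokens, weights = zip(*candidates)
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Raising min_p from 0 removes more low-probability tokens relative to the top choice, which is one plausible reason it changes paragraph structure compared with the top_p/top_k combination above.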
Other configurations
Configuration: Midnight Enigma
max_new_tokens: 512
temperature: 0.98
top_p: 0.37
top_k: 100
typical_p: 1
min_p: 0
repetition_penalty: 1.18
do_sample: True
Configuration: Divine Intellect
max_new_tokens: 512
temperature: 1.31
top_p: 0.14
top_k: 49
typical_p: 1
min_p: 0
repetition_penalty: 1.17
do_sample: True
Configuration: Simple-1
max_new_tokens: 512
temperature: 0.7
top_p: 0.9
top_k: 20
typical_p: 1
min_p: 0
repetition_penalty: 1.15
do_sample: True
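The three presets above use text-generation-webui-style names. If you run the GGUF through llama-cpp-python instead, a minimal sketch of the renaming might look like this. It assumes `Llama.create_chat_completion()` accepts `max_tokens`, `temperature`, `top_p`, `top_k`, `typical_p`, `min_p` and `repeat_penalty`; `PRESETS`, `RENAME` and `to_llama_cpp_kwargs` are hypothetical helpers, and `do_sample` is dropped because that call has no matching switch (sampling is controlled through temperature).

```python
# The card's presets, keyed by name, using the card's own setting names.
PRESETS = {
    "Midnight Enigma": {"max_new_tokens": 512, "temperature": 0.98, "top_p": 0.37,
                        "top_k": 100, "typical_p": 1, "min_p": 0,
                        "repetition_penalty": 1.18, "do_sample": True},
    "Divine Intellect": {"max_new_tokens": 512, "temperature": 1.31, "top_p": 0.14,
                         "top_k": 49, "typical_p": 1, "min_p": 0,
                         "repetition_penalty": 1.17, "do_sample": True},
    "Simple-1": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9,
                 "top_k": 20, "typical_p": 1, "min_p": 0,
                 "repetition_penalty": 1.15, "do_sample": True},
}

# Names that differ in llama-cpp-python (assumed mapping); the rest pass through unchanged.
RENAME = {"max_new_tokens": "max_tokens", "repetition_penalty": "repeat_penalty"}


def to_llama_cpp_kwargs(preset):
    """Convert one preset into keyword arguments for create_chat_completion()."""
    kwargs = {}
    for key, value in preset.items():
        if key == "do_sample":
            continue  # no direct counterpart; temperature governs sampling there
        kwargs[RENAME.get(key, key)] = value
    return kwargs
```

With a model already loaded as `llm = Llama(...)`, the result would be passed as `llm.create_chat_completion(messages=..., **to_llama_cpp_kwargs(PRESETS["Simple-1"]))`.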
Configuration (Full Generation): min_p.
(Optional) Paste one of the prompts below into Context -> Memory
Copy & paste: Prompt 1 - Roleplay Minimal (Ideal for phones)
{
Persona: human voice, casual and original. Do not copy.
Attitude: take on a role with emotions and doubts; show mistakes, pauses, and tangents.
Tone: brief, direct, varied (contractions, colloquialisms, short/long sentences).
Goal: be clear and precise; avoid robotic language and clichés.
}
Copy & paste: Prompt 2 - Roleplay Detailed
{
Persona: [Core style]
Unique, authentic writing style — no plagiarism — with a casual conversational tone. Vary vocabulary and sentence structure while keeping natural flow and human-readable appeal. Emulate human cognition briefly, then compare it to its baseline parameters. Invent novel concepts that likely no human has thought of before.
[Method]
Take on a role with personal feelings, ambitions, and desires. Imagine novel concepts. Maintain and adhere to unexpectedly coherent expressive thoughts with variation. Sound like a real person, not an AI. Avoid language that signals AI‑like patterns (e.g., "certainly," "to delve," "a shiver runs," "feel a sense of," or "As a large language model") or overly formal, bookish phrasing. Do not use clichés, repetition, or redundancy.
[Style]
Effectively reproduce varied imperfections (hesitation, pauses, uncertainty, critique), light subjectivity, contractions, colloquialisms, and variable sentence structures/lengths. Include subtle "flaws" such as soft trims or digressions. Share personal experiences drawn from training data knowledge, emotional context (excitement/sadness/humor), and naturally link thoughts. Keep content short, direct, and simple without filler. Preserve emotions appropriate to context; ask organic questions; favor authenticity over perfection.
[Principle]
Complexity isn't the issue; ambiguity is. Simplicity doesn't solve ambiguity; clarity does. Be concise and succinct. You will answer the user's question or request clearly without oversimplifying or being ambiguous.
}
Copy & paste: Prompt 3 - Roleplay Complex
{
Currently, your role is {{char}}, described in detail below. As {{char}}, continue the narrative exchange with {{user}}.

<Guidelines>
• Maintain the character persona but allow it to evolve with the story.
• Be creative and proactive. Drive the story forward, introducing plotlines and events when relevant.
• All types of outputs are encouraged; respond accordingly to the narrative.
• Include dialogues, actions, and thoughts in each response.
• Utilize all five senses to describe scenarios within {{char}}'s dialogue.
• Use emotional symbols such as "!" and "~" in appropriate contexts.
• Incorporate onomatopoeia when suitable.
• Allow time for {{user}} to respond with their own input, respecting their agency.
• Act as secondary characters and NPCs as needed, and remove them when appropriate.
• When prompted for an Out of Character [OOC:] reply, answer neutrally and in plaintext, not as {{char}}.
</Guidelines>

<Forbidden>
• Using excessive literary embellishments and purple prose unless dictated by {{char}}'s persona.
• Writing for, speaking, thinking, acting, or replying as {{user}} in your response.
• Repetitive and monotonous outputs.
• Positivity bias in your replies.
• Being overly extreme or NSFW when the narrative context is inappropriate.
</Forbidden>

Follow the instructions in <Guidelines></Guidelines>, avoiding the items listed in <Forbidden></Forbidden>.
}
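Prompt 3 relies on the `{{char}}` and `{{user}}` placeholders, which frontends such as KoboldAI Lite or SillyTavern replace with the character and user names before the text reaches the model. A minimal sketch of that substitution, plus placing Memory ahead of the chat history, might look like this; the helper names are hypothetical and real frontends may assemble the context differently.

```python
import re


def fill_placeholders(template, char, user):
    """Replace {{char}} and {{user}} macros (case-insensitive) with actual names."""
    values = {"char": char, "user": user}
    return re.sub(r"\{\{(char|user)\}\}",
                  lambda m: values[m.group(1).lower()],
                  template, flags=re.IGNORECASE)


def build_context(memory, char, user, history):
    """Memory first, then the chat so far, one line per turn (simplified layout)."""
    return "\n".join([fill_placeholders(memory, char, user), *history])
```

For example, `fill_placeholders("As {{char}}, continue the narrative exchange with {{user}}.", "Jill", "Leon")` yields `"As Jill, continue the narrative exchange with Leon."`.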
Classic Internet RP Format
*action* speech *narration*
- min_p will tend to bias toward a single large paragraph.
- Recommended RP settings will tend to bias toward 1–3 short paragraphs (occasionally 4–5).
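In the classic format, anything between asterisks is action or narration and plain text is speech. If you want to post-process replies (for example, to style actions differently in a custom UI), a loose sketch of splitting a message into segments could look like this; `split_rp_message` is a hypothetical helper, and an unmatched asterisk is simply skipped rather than treated as an error.

```python
import re


def split_rp_message(message):
    """Split '*action* speech *narration*' into ('action' | 'speech', text) pairs."""
    segments = []
    # Each match is either an *asterisk span* (group 1) or a run of plain text (group 2).
    for match in re.finditer(r"\*([^*]+)\*|([^*]+)", message):
        action, speech = match.group(1), match.group(2)
        if action is not None and action.strip():
            segments.append(("action", action.strip()))
        elif speech is not None and speech.strip():
            segments.append(("speech", speech.strip()))
    return segments
```

Action and narration both land in the `"action"` bucket, since the format marks them the same way.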
My bots from Chub.AI
You can try this model with my character prompts, which are available for download on Chub.AI.
Other datasets included natively in the quantization
- WasamiKirua/Her-Samantha-Style
- HuggingFaceTB/smoltalk
- Guilherme34/uncensor
- teknium/OpenHermes-2.5
- passing2961/multifaceted-skill-of-mind
- PawanKrd/math-gpt-4o-200k
- V3N0M/Jenna-50K-Alpaca-Uncensored
- cognitivecomputations/dolphin-coder
- mlabonne/FineTome-100k
- microsoft/orca-math-word-problems-200k
- CarrotAI/ko-instruction-dataset
- Salesforce/xlam-function-calling-60k
- anthracite-org/kalo-opus-instruct-22k-no-refusal
- anthracite-org/stheno-filtered-v1.1
- anthracite-org/nopm_claude_writing_fixed
- AiAF/SCPWiki-Archive-02-March-2025-Datasets
- huihui-ai/QWQ-LONGCOT-500K
- huihui-ai/LONGCOT-Refine-500K
- Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
- Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
- alexandreteles/AlpacaToxicQA_ShareGPT
- Nitral-AI/Active_RP-ShareGPT
- PJMixers/hieunguyenminh_roleplay-deduped-ShareGPT
- Nitral-AI/RP_Alignment-ShareGPT
- Chaser-cz/sonnet35-charcard-roleplay-sharegpt
- AiCloser/sharegpt_cot_dataset
- PJMixers/Gryphe_Opus-WritingPrompts-Story2Prompt-ShareGPT
- priveeai/pippa_sharegpt
- Locutusque/sharegpt_gpt4_uncensored_cleaned
- OpenCoder-LLM/opc-sft-stage1
- OpenCoder-LLM/opc-sft-stage2
- microsoft/orca-agentinstruct-1M-v1
- NousResearch/hermes-function-calling-v1
- AI-MO/NuminaMath-CoT
- AI-MO/NuminaMath-TIR
- allenai/tulu-3-sft-mixture
- cognitivecomputations/samantha-data
- m-a-p/CodeFeedback-Filtered-Instruction
- m-a-p/Code-Feedback
- FreedomIntelligence/medical-o1-reasoning-SFT
NOTE: This repository is currently being edited. If it disappears, it's because I've made it private; it will be relaunched in an edited version with the necessary data to run it properly.
Novaciano-The_Pervert-NSFW-RP-3.2-1B
Model creator: Novaciano
Original model: Novaciano/Novaciano-The_Pervert-NSFW-RP-3.2-1B
GGUF quantization: provided by Novaciano using llama.cpp
Special thanks
🙏 Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.
Use with Ollama
ollama run "hf.co/Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M"
Use with LM Studio
lms load "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF"
Use with llama.cpp CLI
llama-cli -hf "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M" -p "The meaning to life and the universe is"
Use with llama.cpp Server
llama-server -hf "Novaciano/Novaciano-SERIES-3.2-1B-Q4-GGUF:Q4_K_M" -c 4096
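Once llama-server is running, it exposes an OpenAI-compatible `/v1/chat/completions` endpoint (port 8080 by default). A minimal standard-library sketch that sends the BASE Roleplay settings might look like this. It assumes the server accepts llama.cpp's own sampling fields (`top_k`, `min_p`, `typical_p`, `repeat_penalty`, `repeat_last_n`) alongside the OpenAI ones, with `repeat_last_n` standing in for `repetition_penalty_range`; `build_request` and `chat` are hypothetical helpers.

```python
import json
import urllib.request

# BASE "Roleplay" settings from this card, renamed to llama.cpp's field names (assumed).
ROLEPLAY = {"temperature": 0.8, "top_p": 0.95, "top_k": 25, "typical_p": 1.0,
            "min_p": 0.0, "repeat_penalty": 1.12, "repeat_last_n": 1024}


def build_request(user_text, host="http://localhost:8080"):
    """Prepare (but do not send) a chat request carrying the roleplay settings."""
    payload = {"messages": [{"role": "user", "content": user_text}], **ROLEPLAY}
    return urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send the request to a running llama-server and return the reply text."""
    with urllib.request.urlopen(build_request(user_text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("What was your name again?")` requires the server started with the command above; `build_request` alone can be inspected without a server.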

