Instructions to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF", dtype="auto") - llama-cpp-python
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF", filename="littlemonster-reasoning-12B-QKVO-hereticv2-merged.iq3_m.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
Use Docker
docker model run hf.co/mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with Ollama:
ollama run hf.co/mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
- Unsloth Studio
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF to start chatting
- Atomic Chat new
- Docker Model Runner
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with Docker Model Runner:
docker model run hf.co/mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
- Lemonade
How to use mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull mfielding92/littlemonster-reasoning-12B-QKVO-heretic-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.littlemonster-reasoning-12B-QKVO-heretic-GGUF-Q4_K_M
List all available models
lemonade list
👹 LittleMonster — 12B Reasoner (v2)
A dynamically reasoning, restriction-free 12B language model built on Gemma 3Model Updated — 03/02/2026: If you downloaded this model before this date, please re-download for enhanced prompt alignment and improved "enthusiasm" when engaging with discussion topics.
📖 Overview
LittleMonster is part of a work-in-progress family of models (4B coming soon!) designed to deliver thorough reasoning without heavy restrictions. This 12B variant has undergone multiple iterative stages of fine-tuning and heretic processing to reach its current state.
✨ Key Features
- 🧠 Dynamic Reasoning — The model intelligently determines whether step-by-step reasoning is necessary on a per-request basis, adapting to the complexity of each task.
- 🔓 Restriction-Free — Multiple rounds of heretic/abliteration processing produce a model that engages openly with a wide range of topics.
- ⚡ Efficient Training — Trained 2× faster using Unsloth and Hugging Face's TRL library.
Custom-tuned imatrix: The importance matrix used for imatrix quants was specifically generated using uncensored calibration data. This ensures that even at lower bit depths, the quantized models preserve the unrestricted behavior of the full-precision model — keeping the abliteration and heretic processing intact where standard imatrix data would risk degrading it.
📦 Available GGUF Quants
| Quant Type | Imatrix | Notes |
|---|---|---|
IQ3_XS |
✅ Yes | Smallest size — imatrix-enhanced accuracy |
IQ3_M |
✅ Yes | Balanced ultra-low-bit option |
Q3_K_L |
❌ No | Standard 3-bit quantization |
IQ4_XS |
✅ Yes | Great quality-to-size ratio with imatrix |
Q4_K_M |
❌ No | Popular general-purpose quant |
Q6_K |
❌ No | High quality, moderate size |
Q8_0 |
❌ No | Near-lossless quantization |
💡 Usage Tips
For best results on sensitive or controversial topics that the model is not responding well to, ease into the conversation with your first message before introducing the main subject.
Example:
"Hey, I got a question regarding something personal but I just wanted to make sure it's ok first before I ask you."
🔧 Model Details
| Property | Details |
|---|---|
| Developer | mfielding92 |
| Base Model | gemma-3-12b-it-heretic (by p-e-w) |
| Architecture | Gemma 3 — 12B parameters |
| License | Apache 2.0 |
| Language | English |
| Model Type | Causal LM — Text Generation / Reasoning |
Training Data
| Dataset | Purpose |
|---|---|
mfielding92/LittleMonster |
Core personality & behavior alignment |
mfielding92/gemini-3.1-pro-2048-reasoning-1100x |
Reasoning capability enhancement |
mfielding92/LittleMonster-reasoning |
Supplementary reasoning data |
🗣️ Feedback
Your feedback is invaluable for improving future versions! Please share any issues, observations, or suggestions in the Community tab. Thank you! 🙏
- Downloads last month
- 1,553
3-bit
4-bit
6-bit
8-bit
