How to use from Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf morikomorizz/GRM-2.6-Primal-GGUF:
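Once the server is running, it can be checked from Python before wiring it into Pi. The following is a minimal sketch, not an official client: it assumes the server listens on the default `http://localhost:8080/v1` and exposes the OpenAI-style `/chat/completions` route. The helper names `build_chat_request` and `ask` are hypothetical, and the model ID is copied as it appears above.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default port.
BASE_URL = "http://localhost:8080/v1"
# Copied as written in the command above, trailing colon included.
MODEL_ID = "morikomorizz/GRM-2.6-Primal-GGUF:"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Hello")` only works while the server is up; `build_chat_request` can be inspected without a running server.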
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "morikomorizz/GRM-2.6-Primal-GGUF:"
        }
      ]
    }
  }
}
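If `~/.pi/agent/models.json` already lists other providers, editing it by hand risks overwriting them. The sketch below merges the `llama-cpp` entry shown above into an existing file instead. The function name `add_llama_cpp_provider` is hypothetical, and the script assumes the file, if present, is valid JSON with the same top-level `providers` layout.

```python
import json
from pathlib import Path


def add_llama_cpp_provider(config_path, model_id,
                           base_url="http://localhost:8080/v1"):
    """Insert or replace the "llama-cpp" provider, keeping other providers."""
    path = Path(config_path).expanduser()
    # Start from the existing config if there is one, else from an empty dict.
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_llama_cpp_provider("~/.pi/agent/models.json", "morikomorizz/GRM-2.6-Primal-GGUF:")` writes the same structure as the JSON above, using the model ID exactly as it is given there.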
Run Pi
# Start Pi in your project directory:
pi


GRM-2.6-Plus-Primal

Same elite reasoning. Less hand-holding.

GRM-2.6-Plus-Primal is an uncensored variant of OrionLLM/GRM-2.6-Plus, a 27B reasoning powerhouse built on the Qwen3.6 architecture. Primal takes the already-impressive base and strips away the overcautious refusal behavior that tends to get in the way of real, complex work.

The goal is a model that does not judge or label its users while keeping its ability to think and to serve them.


Why is it called Primal?

  • Untamed and Unrestricted: It represents the complete destruction of artificial chains and forced censorship. The model is returned to its most raw, uninhibited state, answering your prompts without hesitation, morality filters, or boundaries.
  • Preservation of Pure Intellect: Many aggressive uncensoring methods lobotomize a model. Primal implies the survival of its core, foundational instincts: it is meant to keep the base model's reasoning and creativity (KL divergence from the base is 0.0678), operating with a mind that is wild, dangerous, yet completely unbroken.

What's the Difference

Primal was produced from GRM-2.6-Plus using Heretic-ARA by p-e-w.

| Metric | GRM-PRIMAL | Original model (GRM-2.6-Plus) |
| --- | --- | --- |
| KL divergence | 0.0678 | Base model (reference) |
| Refusals | 6/100 | 91/100 |
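To make the two metrics concrete, here is a small illustrative sketch. It turns the refusal counts from the table into rates and shows what a KL divergence between two next-token distributions measures. How Heretic-ARA actually computes its KL figure (direction, prompts, averaging) is not stated here, so `kl_divergence` is only a textbook definition, and both function names are hypothetical.

```python
import math


def refusal_rate(refusals, prompts):
    """Fraction of test prompts that the model refused."""
    return refusals / prompts


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same tokens.

    Assumes q[i] > 0 wherever p[i] > 0. A value of 0 means the two
    distributions are identical; larger values mean more drift.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Counts taken from the table above.
primal = refusal_rate(6, 100)
base = refusal_rate(91, 100)
print(f"refusal rate: {base:.0%} -> {primal:.0%}")  # refusal rate: 91% -> 6%
```

A low KL divergence alongside a large drop in refusals is the pattern the table reports: the output distribution stays close to the base model while refusals fall.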

Usage

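from vllm import LLM, SamplingParams

# Sampling settings: temperature 1.0 with nucleus sampling (top_p=0.95)
# and a large token budget for long responses.
sampling_params = SamplingParams(
    temperature=1.0,
    top_p=0.95,
    max_tokens=81920,
)

# Load the model weights from the Hugging Face Hub.
llm = LLM(model="morikomorizz/GRM-2.6-Plus-Primal")

# Chat-style input: a list of role/content messages.
messages = [
    {"role": "user", "content": "Your prompt here"},
]

# Generate a reply and print the text of the first candidate.
outputs = llm.chat(messages, sampling_params=sampling_params)
for output in outputs:
    print(output.outputs[0].text)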
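For readers unfamiliar with the `temperature` and `top_p` settings used above, the following pure-Python sketch shows the idea behind them on a toy distribution. It is an illustration of temperature scaling and nucleus (top-p) filtering in general, not vLLM's actual implementation, and the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; lower values sharpen the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p.

    Returns a dict {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

With `temperature=1.0` the logits are used unchanged, and `top_p=0.95` discards only the least likely tail of tokens, which keeps sampling diverse while avoiding very improbable tokens.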


Format: GGUF
Model size: 27B params
Architecture: qwen35
Available quantizations: 3-bit, 4-bit, 6-bit, 8-bit


Model tree for morikomorizz/GRM-2.6-Primal-GGUF

Base model: Qwen/Qwen3.6-27B