How to use from Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf noctrex/Qwopus3.5-9B-Coder-MTP:
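Loading a 9B model can take a while, so it may help to wait until the server is actually ready before pointing Pi at it. The sketch below assumes llama-server is on its default address, `http://localhost:8080`, which is the address the Pi config uses. It relies on the server's `/health` endpoint, which returns HTTP 503 while the model is still loading and 200 once it is ready. `wait_for_llama_server` and `BASE_URL` are names made up for this example, and the script needs `curl`.

```shell
#!/bin/sh
# Wait until llama-server reports that the model has loaded.
# Assumes the default address http://localhost:8080. BASE_URL is a variable
# used only by this sketch, not a llama-server option.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# wait_for_llama_server [TRIES]: poll $BASE_URL/health about once per second.
# Returns 0 as soon as /health answers with HTTP 200, or 1 after TRIES failed attempts.
wait_for_llama_server() {
  tries="${1:-60}"
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # -f turns HTTP errors (e.g. 503 while the model is loading) into a non-zero exit.
    if curl -sf "$BASE_URL/health" > /dev/null 2>&1; then
      echo "llama-server is ready at $BASE_URL"
      return 0
    fi
    # Only sleep if another attempt follows.
    if [ "$attempt" -lt "$tries" ]; then
      sleep 1
    fi
    attempt=$((attempt + 1))
  done
  echo "llama-server not ready after $tries attempts" >&2
  return 1
}
```

After sourcing it (for example `. ./wait-for-llama.sh`, a hypothetical file name), `wait_for_llama_server && curl -s "$BASE_URL/v1/models"` should print the model list from the OpenAI-compatible API that Pi will talk to.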
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "noctrex/Qwopus3.5-9B-Coder-MTP:"
        }
      ]
    }
  }
}
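If you don't have a `models.json` yet, the following POSIX shell sketch creates it with the provider block above, copied verbatim, and checks that it parses as JSON when `python3` is available. It deliberately refuses to overwrite an existing file, because "add to" means merging into any providers you already have. `PI_AGENT_DIR` is a hypothetical variable used only by this script, not a setting Pi reads.

```shell
#!/bin/sh
# Write the llama-cpp provider config to ~/.pi/agent/models.json,
# but only when that file does not exist yet, so an existing config is never clobbered.
# PI_AGENT_DIR is local to this sketch; Pi itself does not read it.
PI_AGENT_DIR="${PI_AGENT_DIR:-$HOME/.pi/agent}"
CONFIG="$PI_AGENT_DIR/models.json"

mkdir -p "$PI_AGENT_DIR"

if [ -e "$CONFIG" ]; then
  echo "$CONFIG already exists; merge the llama-cpp provider into it by hand." >&2
else
  # The quoted 'EOF' stops the shell from expanding anything inside the JSON.
  cat > "$CONFIG" <<'EOF'
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "noctrex/Qwopus3.5-9B-Coder-MTP:"
        }
      ]
    }
  }
}
EOF
  echo "Wrote $CONFIG"
fi

# Optional syntax check, only if python3 is installed.
if command -v python3 > /dev/null 2>&1; then
  python3 -m json.tool "$CONFIG" > /dev/null && echo "$CONFIG parses as JSON"
fi
```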
Run Pi
# Start Pi in your project directory:
pi

These are quantizations of the model Jackrong/Qwopus3.5-9B-Coder.
I've added the MTP (multi-token prediction) layer to it.
In my own testing on a 7900 XTX with the Vulkan backend, speed went from ~80 tokens per second (tps) to ~120 tps, roughly a 1.5x improvement. The imatrix (importance matrix) was calculated on coding tasks, so these quants are specialized for coding.
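To put the ~80 and ~120 tps figures in perspective, the helper below turns them into a speedup ratio and an average per-token latency (1000 ms divided by tps). `throughput_report` is a hypothetical name, and it uses `awk` only for the floating-point math. It merely restates the numbers above; it is not a benchmark.

```shell
#!/bin/sh
# throughput_report BEFORE_TPS AFTER_TPS
# Converts two tokens-per-second figures into a speedup ratio and an
# average per-token latency (1000 ms / tps). Pure arithmetic, no benchmarking.
throughput_report() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "speedup: %.2fx\n", after / before
    printf "per-token latency: %.1f ms -> %.1f ms\n", 1000 / before, 1000 / after
  }'
}

# The rough figures reported for the 7900 XTX with the Vulkan backend.
throughput_report 80 120
```

For these figures it prints `speedup: 1.50x` and `per-token latency: 12.5 ms -> 8.3 ms`, so each token takes about a third less time on average.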

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.
GGUF
Model size: 9B params
Architecture: qwen35
Available quantizations: 3-bit, 4-bit, 6-bit, 8-bit

Model tree for noctrex/Qwopus3.5-9B-Coder-MTP
Finetuned from: Qwen/Qwen3.5-9B
Quantized (2): this model