How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf FreedomAISVR/Qwable-v1-MXFP4-MOE-GGUF:F16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "FreedomAISVR/Qwable-v1-MXFP4-MOE-GGUF:F16"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

Qwable-v1 MXFP4 MoE GGUF

GGUF quantization of lordx64/Qwable-v1 โ€” an agentic coding model built by layering Claude Fable-5 tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

Model Details

  • Architecture: Qwen3.5 MoE, 41 blocks (40 layers + 1 MTP head), 256 experts (8 active/token)
  • Active Parameters: ~3B
  • Context: 262,144 tokens
  • Vision: Yes (27-layer SigLIP ViT)
  • License: AGPL-3.0
  • Base: Qwen3.6-35B-A3B โ†’ Opus 4.7 reasoning distill โ†’ Fable-5 agentic SFT

What's Included

File Type Size BPW
qwable-v1-mxfp4_moe.gguf MXFP4 (experts) + Q8_0 (non-experts) ~18.87 GB 4.56
mmproj-qwable-v1-f16.gguf Vision projector (F16) ~0.88 GB F16

Quantization Details

MXFP4_MOE

  • MoE expert weights (ffn_down_exps, ffn_gate_exps, ffn_up_exps) quantized to MXFP4
  • Non-expert weights (attention, shared experts, norms) quantized to Q8_0
  • Router weights kept at F32
  • Vision encoder and projector kept at F16 (not quantized)

Usage

# With llama.cpp (vision + text)
./llama-server -m qwable-v1-mxfp4_moe.gguf --mmproj mmproj-qwable-v1-f16.gguf --host 0.0.0.0 --port 8080

# Text-only (no vision)
./llama-cli -m qwable-v1-mxfp4_moe.gguf -p "Hello, how are you?"

Agentic Tool-Use

Qwable-v1 emits <tool_use> XML when prompted with an agent-style system prompt:

system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>

Without the agent prompt, the model falls back to the Opus 4.7 reasoning prior (markdown code blocks).

Credits

Downloads last month
602
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support