Qwen3.6-14B-A3B-FableVibes-GGUF

GGUF quantizations of Qwen3.6-14B-A3B-FableVibes, a 14B MoE model fine-tuned on reasoning traces from Claude Fable 5.

Background

This model started as Qwen3.6-35B-A3B-heretic and was pruned with REAP down to ~14B total parameters (the A3B suffix denotes roughly 3B active per token), removing over half of its expert capacity. A single QLoRA pass, orchestrated entirely by an autonomous AI agent (Steve), then used ~4,600 raw reasoning traces from Claude Fable 5 (Mythos-class) to recover capabilities lost during pruning.

Rather than focusing strictly on agentic orchestration, this model serves as a general-purpose reasoning distill. The Fable CoT traces provide structured multi-step reasoning patterns from a frontier-class model, distilled into a footprint that can run on consumer hardware. The Fable traces are further supplemented by Claude Opus reasoning, Qwen tool-calling data, and Evol-Instruct-Code.

Available Formats

Quant     Size      Notes
F16       ~27GB     Unquantized 16-bit reference
Q8_0      ~15GB     Near-lossless
Q6_K      ~11.3GB   Quality/size sweet spot
Q5_K_M    ~9.8GB    High quality
BPW4.75   ~8.5GB    Custom exl2-matched quantization array
Q4_K_M    ~8.4GB    Recommended for 8-12GB VRAM
Q3_K_M    ~6.7GB    For tight memory budgets
Q2_K      ~5.3GB    Maximum compression
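
As a rough aid for choosing among the files above, the following sketch picks the largest quant whose file size fits a given VRAM budget. The sizes are copied from the table; the function name `pick_quant` and the `HEADROOM_GB` allowance for KV cache and runtime buffers are assumptions for illustration, not measured values.

```shell
# Sketch: pick the largest quant from the table that fits a VRAM budget.
# HEADROOM_GB (default 1.5) is an assumed allowance for KV cache and buffers.
pick_quant() {
  vram_gb="$1"
  headroom_gb="${HEADROOM_GB:-1.5}"
  # Rows are "name size_gb", ordered largest to smallest (sizes from the table).
  awk -v vram="$vram_gb" -v head="$headroom_gb" '
    (vram - head) >= $2 { print $1; found = 1; exit }
    END { if (!found) print "none" }
  ' <<'EOF'
F16 27
Q8_0 15
Q6_K 11.3
Q5_K_M 9.8
BPW4.75 8.5
Q4_K_M 8.4
Q3_K_M 6.7
Q2_K 5.3
EOF
}

# Usage: pick_quant 12    (VRAM in GB)
```

This heuristic assumes the whole file must sit in VRAM, so it is stricter than the table's recommendation. Offloading only part of the model to the GPU, which llama.cpp supports, lets a larger quant run on a smaller card at some cost in speed.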

Vision support (mmproj files) is included for multimodal use.

Usage

Works with any llama.cpp-compatible backend (llama.cpp, LM Studio, Ollama, text-generation-webui). This model uses Qwen's thinking format: it produces reasoning tokens before its response. Give it a sufficient generation budget, since the reasoning pass typically uses hundreds to thousands of tokens before answering.

llama-cli -m Qwen3.6-14B-A3B-FableVibes-Q4_K_M.gguf \
  --mmproj Qwen3.6-14B-A3B-FableVibes-mmproj-F16.gguf \
  -p "Your prompt here"
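
To make the generation budget explicit, a small wrapper along these lines may help. It passes llama-cli's `-c` (context size) and `-n` (maximum tokens to generate) options. The function name `run_with_budget`, the `CTX`, `MAX_NEW`, and `LLAMA_CLI` variables, and the default values are assumptions chosen for illustration, not tuned settings for this model.

```shell
# Sketch: wrap llama-cli with an explicit token budget for the thinking pass.
# The CTX and MAX_NEW defaults are assumed starting points, not tuned values.
run_with_budget() {
  model="$1"
  prompt="$2"
  ctx="${CTX:-16384}"         # -c: context window (prompt + reasoning + answer)
  max_new="${MAX_NEW:-8192}"  # -n: cap on generated tokens, reasoning included
  # LLAMA_CLI can be overridden, e.g. with a full path to the binary.
  "${LLAMA_CLI:-llama-cli}" -m "$model" -c "$ctx" -n "$max_new" -p "$prompt"
}

# Usage (commented out so that sourcing this file has no side effects):
# run_with_budget Qwen3.6-14B-A3B-FableVibes-Q4_K_M.gguf "Your prompt here"
```

If answers come back cut off mid-reasoning, raising `MAX_NEW` (and `CTX` if needed) is the first thing to try.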

Notes

  • This is a general-purpose reasoning model. The Fable traces improve structured thinking across domains.
  • The pruned base was not fine-tuned before this run; the Fable LoRA was applied directly to the REAP output.
  • Expect a longer wait before the final answer because of the thinking pass, in exchange for higher-quality reasoning on complex tasks.
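
Because the thinking pass precedes the answer, scripts that only need the final response may want to drop the reasoning text. The filter below is a minimal sketch that assumes the reasoning is wrapped in `<think>` and `</think>` tags, as in Qwen's thinking format. The function name `strip_thinking` is hypothetical, and the exact text emitted by a given backend may differ.

```shell
# Sketch: remove <think>...</think> spans from text on stdin.
# Assumes the model wraps its reasoning in these tags; handles tags that
# share a line with other text as well as tags on their own lines.
strip_thinking() {
  awk '
    {
      out = ""
      rest = $0
      while (length(rest) > 0) {
        if (in_think) {
          i = index(rest, "</think>")
          if (i == 0) { rest = "" }                       # still inside reasoning
          else { rest = substr(rest, i + 8); in_think = 0 }
        } else {
          i = index(rest, "<think>")
          if (i == 0) { out = out rest; rest = "" }       # plain answer text
          else { out = out substr(rest, 1, i - 1); rest = substr(rest, i + 7); in_think = 1 }
        }
      }
      # Print kept text; preserve blank lines only outside the reasoning block.
      if (out != "" || ($0 == "" && !in_think)) print out
    }
  '
}

# Usage: some-backend-command | strip_thinking
```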

Format: GGUF
Model size: 14B params
Architecture: qwen35moe
Repository: tvall43/Qwen3.6-14B-A3B-FableVibes-GGUF