hermes-Qwen3.6-27B-FT-Q8_0

Fine-tuned Qwen3.6-27B (hybrid Mamba-2 + attention) GGUF quantizations for llama.cpp / Lemonade SDK.

Model Details

  • Base model: Qwen/Qwen3.6-27B
  • Architecture: Hybrid Mamba-2 (48 Mamba + 16 attention layers)
  • Fine-tuning: QLoRA r=32, train_hermes_mamba.py
  • Training data: 509 examples (Hermes v2 SFT dataset)

Files

File Quant Size Notes
hermes-Qwen3.6-27B-FT-q8_0.gguf Q8_0 ~27 GB Highest quality
hermes-Qwen3.6-27B-FT-Q4_K_M.gguf Q4_K_M ~16 GB Good quality/size balance

Usage

This model uses the user. prefix convention for Lemonade SDK:

# lemonade config
model_id: "user.hermes-Qwen3.6-27B-FT-Q8_0"

For llama.cpp directly:

./llama-server -m hermes-Qwen3.6-27B-FT-q8_0.gguf -ngl 99 -c 262144

Context Length

  • Q8_0: 262,144 tokens (via TurboQuant tbq3 compression)
  • Q4_K_M: 131,072 tokens

Notes

  • Converted with --no-mtp flag to strip MTP head (block 32 causes load crash)
  • Q8_0 quantized from F16 GGUF source (NOT directly from HuggingFace โ€” direct Q8_0 drops SSM tensors)
  • Hybrid Mamba-2 architecture: no fla-core or causal-conv1d required for inference
Downloads last month
127
GGUF
Model size
27B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for mkadrlik/hermes-Qwen3.6-27B-FT-Q8_0

Base model

Qwen/Qwen3.6-27B
Quantized
(485)
this model