Ornith-1.0-35B-GGUF — iMatrix GGUF

GGUF quantizations of deepreinforce-ai/Ornith-1.0-35B-GGUF, published by Liodon AI.

Quick Start

llama.cpp

llama-cli -hf liodon-ai/Ornith-1.0-35B-GGUF-imatrix-GGUF:Q4_K_M

Ollama

ollama run hf.co/liodon-ai/Ornith-1.0-35B-GGUF-imatrix-GGUF:Q4_K_M

LM Studio / Jan: search for liodon-ai/Ornith-1.0-35B-GGUF-imatrix-GGUF and pick a quant from the table below.
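To try a quant other than Q4_K_M, the same two commands can be built with a different tag. The sketch below assumes that every quant name in the Quants table is available as a tag in this repo; only Q4_K_M is shown in the commands above. The helper names `llama_cpp_command` and `ollama_command` are hypothetical and exist only for illustration.

```python
import shlex

REPO = "liodon-ai/Ornith-1.0-35B-GGUF-imatrix-GGUF"


def llama_cpp_command(quant):
    """Build the llama-cli invocation from the Quick Start for a given quant tag."""
    return ["llama-cli", "-hf", f"{REPO}:{quant}"]


def ollama_command(quant):
    """Build the ollama invocation from the Quick Start for a given quant tag."""
    return ["ollama", "run", f"hf.co/{REPO}:{quant}"]


# Print copy-pasteable commands for an assumed tag (IQ4_XS from the Quants table).
print(shlex.join(llama_cpp_command("IQ4_XS")))
print(shlex.join(ollama_command("IQ4_XS")))
```

The lists can be passed straight to `subprocess.run` if the corresponding binary is installed.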

Quants

| Quant | Size | VRAM est. | Notes |
|---|---|---|---|
| IQ2_M | 11.66 GB | ~13 GB | 2-bit, iMatrix; smallest usable |
| IQ3_M | 15.44 GB | ~18 GB | 3-bit, iMatrix; great quality/size tradeoff |
| IQ4_XS | 18.73 GB | ~22 GB | 4-bit extra-small, iMatrix |
| Q4_K_M | 21.17 GB | ~24 GB | 4-bit, iMatrix-calibrated (recommended) |
| Q5_K_M | 24.73 GB | ~28 GB | 5-bit, iMatrix-calibrated |
| Q6_K | 28.51 GB | ~33 GB | 6-bit, iMatrix-calibrated, near-lossless |
| Q8_0 | 36.90 GB | ~42 GB | 8-bit, essentially lossless |
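As a rough aid for choosing a file, the table can be turned into a small lookup that returns the largest quant whose estimated VRAM need fits a given budget. This is only a sketch: the numbers are copied from the table, the VRAM column is an estimate, and real usage also depends on context length and how many layers are offloaded. The function name `pick_quant` is hypothetical.

```python
# (quant name, file size in GB, estimated VRAM in GB), copied from the Quants table
QUANTS = [
    ("IQ2_M", 11.66, 13),
    ("IQ3_M", 15.44, 18),
    ("IQ4_XS", 18.73, 22),
    ("Q4_K_M", 21.17, 24),
    ("Q5_K_M", 24.73, 28),
    ("Q6_K", 28.51, 33),
    ("Q8_0", 36.90, 42),
]


def pick_quant(vram_gb):
    """Return the largest quant whose estimated VRAM fits the budget, or None."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    if not fitting:
        return None
    # Larger files keep more precision, so prefer the biggest one that fits.
    return max(fitting, key=lambda q: q[1])[0]


for budget in (12, 16, 24, 48):
    print(budget, "GB ->", pick_quant(budget))
```

A result of `None` means even the smallest quant is estimated not to fit entirely in VRAM, in which case partial CPU offload would be needed.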

What is iMatrix?

Standard quantization treats all weights equally. iMatrix (importance matrix) quantization runs 128 calibration chunks through the full-precision model to estimate which weights matter most, then steers the quantizer to preserve those weights more accurately. At 2-, 3-, and 4-bit sizes this usually means noticeably better coherence and instruction-following at the same file size.

Calibration: 2M tokens of WikiText-103.
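The idea of weighting quantization error by importance can be illustrated with a toy example. This is not llama.cpp's actual algorithm: the weights, the importance values, the 3-level grid, and all function names are made up for illustration. It picks a quantization scale for one small block of weights twice, once treating all weights equally and once minimizing importance-weighted error, and then compares both on the weighted error.

```python
def quantize(weights, scale, levels=3):
    """Round each weight to the nearest grid point in [-levels, levels] * scale."""
    out = []
    for w in weights:
        q = max(-levels, min(levels, round(w / scale)))
        out.append(q * scale)
    return out


def weighted_error(weights, approx, importance):
    """Sum of squared errors, each scaled by that weight's importance."""
    return sum(i * (w - a) ** 2 for w, a, i in zip(weights, approx, importance))


def best_scale(weights, importance, levels=3, steps=200):
    """Grid-search a scale (0.5x to 1.5x of max_abs / levels) minimizing weighted error."""
    base = max(abs(w) for w in weights) / levels
    candidates = [base * (0.5 + k / steps) for k in range(steps + 1)]
    return min(
        candidates,
        key=lambda s: weighted_error(weights, quantize(weights, s, levels), importance),
    )


# Toy block of weights and made-up importance values (stand-ins for calibration statistics).
weights = [0.90, -0.05, 0.31, -0.62]
importance = [0.1, 5.0, 5.0, 0.1]

plain_scale = best_scale(weights, [1.0] * len(weights))  # all weights treated equally
imatrix_scale = best_scale(weights, importance)          # importance-weighted

plain_err = weighted_error(weights, quantize(weights, plain_scale), importance)
imatrix_err = weighted_error(weights, quantize(weights, imatrix_scale), importance)
print("weighted error, plain scale:  ", plain_err)
print("weighted error, weighted scale:", imatrix_err)
```

Both variants store the same number of bits per weight; only the choice of scale differs, which is why the file size stays the same.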

Also see plain (non-iMatrix) quants: liodon-ai/Ornith-1.0-35B-GGUF-GGUF

Source

deepreinforce-ai/Ornith-1.0-35B-GGUF

Quantized by Liodon AI

Format: GGUF
Model size: 35B params
Architecture: qwen35moe