How to use from
Pi
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Intel/Qwen3-8B-GGUF-Q2KS-AS-AutoRound:Q2_K_S
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Intel/Qwen3-8B-GGUF-Q2KS-AS-AutoRound:Q2_K_S"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
Quick Links

Model Details

This example model demonstrates how to use AutoRound’s compatibility to automatically generate a mixed-bit quantization recipe.

For more details, please refer to user guide

Generate the model

auto-round > 0.8.0

from auto_round import AutoRound, AutoScheme

model_name = "Qwen/Qwen3-8B"
avg_bits = 3.0
scheme = AutoScheme(avg_bits=avg_bits, options=("GGUF:Q2_K_S", "GGUF:Q4_K_S"), ignore_scale_zp_bits=True)
layer_config = {"lm_head": "GGUF:Q6_K"}

ar = AutoRound(model=model_name, scheme=scheme, layer_config=layer_config, iters=0)
ar.quantize_and_save()
Downloads last month
14
GGUF
Model size
8B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

2-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Intel/Qwen3-8B-GGUF-Q2KS-AS-AutoRound

Finetuned
Qwen/Qwen3-8B
Quantized
(301)
this model