Devstral-Small-2507-AWQ

This model was forked in an attempt to make changes so it will run in vLLM.

Method

Quantised using casper-hansen/AutoAWQ and the following configs:

quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

Inference

The quantised model's configs and weights are stored in hf and safetensors format, but the tokeniser remains in mistral format. Please load inference arguments accordingly, e.g.,:

vllm

vllm serve cpatonn/Devstral-Small-2507-AWQ --tokenizer_mode mistral --config_format hf --load_format safetensors --tool-call-parser mistral --enable-auto-tool-choice

Downloads last month: 64

Safetensors

Model size

24B params

Tensor type

I32

BF16

Model tree for btbtyler09/Devstral-Small-2507-AWQ

Base model

mistralai/Mistral-Small-3.1-24B-Base-2503

Finetuned

mistralai/Mistral-Small-3.1-24B-Instruct-2503

Finetuned

mistralai/Devstral-Small-2507

Quantized

(27)

this model