This is a mixed BF16/INT8 AWQ layer quantization made with llmcompressor, with working MTP (multi-token prediction, used for speculative decoding).
The "UC" in the name refers to the "HuggingFaceH4/ultrachat_200k" dataset used for this quant.
The chat_template is fixed using "froggeric/Qwen-Fixed-Chat-Templates".
MTP works with the following vLLM flag:
--speculative-config '{"method":"mtp","num_speculative_tokens":2}'
Tested with vLLM 0.19.1 and transformers 5.6.2.
Recommended flags:
--enable-auto-tool-choice
--reasoning-parser qwen3
--tool-call-parser qwen3_xml
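The flags above can be combined into a single `vllm serve` invocation. The following is a minimal sketch that assembles the argument list in Python; the `MODEL` value is a hypothetical placeholder (this card does not state the repository id), and the helper name `build_vllm_command` is made up for illustration.

```python
import json
import shlex

# Hypothetical placeholder: substitute the actual repository id or local path of this quant.
MODEL = "path/to/this-awq-uc-quant"


def build_vllm_command(model: str, num_speculative_tokens: int = 2) -> list[str]:
    """Assemble a `vllm serve` argument list from the flags listed in this card."""
    # Compact separators reproduce the JSON string exactly as written in the card.
    spec_config = json.dumps(
        {"method": "mtp", "num_speculative_tokens": num_speculative_tokens},
        separators=(",", ":"),
    )
    return [
        "vllm", "serve", model,
        "--speculative-config", spec_config,
        "--enable-auto-tool-choice",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_xml",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; the list can also be passed to subprocess.run.
    print(shlex.join(build_vllm_command(MODEL)))
```

Building the command as a list avoids shell-quoting mistakes around the JSON value of `--speculative-config`.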
| Rank | Model / Dataset | HumanEval (Code) ↑ | Winogrande (Logic) ↑ | HellaSwag (Context) ↑ | WikiText (PPL) ↓ | Verdict |
|---|---|---|---|---|---|---|
| 1st | AWQ-UC (Ultrachat) | 0.6524 | 0.7459 | 0.7842 | 9.5979 | The Context King |
| 2nd | AWQ-CK (CyanKiwi) | 0.6524 | 0.7474 | 0.7839 | 9.5986 | Best for Pure Logic |
| 3rd | AWQ-NM (NeuralMagic) | 0.6585 | 0.7443 | 0.7833 | 9.5974 | Best for Code & PPL |
| 4th | Base (BF16) | 0.6524 | 0.7498 | 0.7843 | 9.5951 | Reference (Slow) |
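To make the differences in the table easier to read, the sketch below computes each quant's score minus the BF16 reference. The numbers are copied from the table; the helper name `deltas_vs_base` is made up for illustration, and no claim is made here about whether differences of this size are statistically meaningful.

```python
# Scores copied from the table above, in column order:
# (HumanEval, Winogrande, HellaSwag, WikiText PPL).
METRICS = ("HumanEval", "Winogrande", "HellaSwag", "WikiText PPL")
BASE = (0.6524, 0.7498, 0.7843, 9.5951)
SCORES = {
    "AWQ-UC": (0.6524, 0.7459, 0.7842, 9.5979),
    "AWQ-CK": (0.6524, 0.7474, 0.7839, 9.5986),
    "AWQ-NM": (0.6585, 0.7443, 0.7833, 9.5974),
}


def deltas_vs_base(scores: tuple[float, ...]) -> tuple[float, ...]:
    """Return (quant score - BF16 score) per metric, rounded to 4 decimals."""
    return tuple(round(s - b, 4) for s, b in zip(scores, BASE))


if __name__ == "__main__":
    for name, scores in SCORES.items():
        parts = ", ".join(
            f"{metric}: {delta:+.4f}"
            for metric, delta in zip(METRICS, deltas_vs_base(scores))
        )
        print(f"{name} vs. Base -> {parts}")
```

For the first three metrics a positive delta favors the quant; for WikiText perplexity, lower is better, so a positive delta means the quant is slightly worse than the reference.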
Base model: Qwen/Qwen3.5-9B-Base