Model Card for Arien0/Qwen3.5-9B-DeepSeek-V4-Flash-AWQ-BF16-INT8-UC-MTP

This is a mixed BF16/INT8 AWQ quantization (some layers kept in BF16, the rest quantized to INT8), made with llmcompressor, with working MTP (multi-token prediction) speculative decoding.

Model Details

📖 Model Description:

The "UC" in the name refers to the HuggingFaceH4/ultrachat_200k dataset, which was used as calibration data for this quant.

The chat_template has been fixed using froggeric/Qwen-Fixed-Chat-Templates.

MTP speculative decoding works in vLLM with this flag:

--speculative-config '{"method":"mtp","num_speculative_tokens":2}'

Tested with vLLM 0.19.1 and transformers 5.6.2.
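A quick way to compare your environment against the tested versions is a small standard-library check. This is a sketch, assuming the packages are installed under their PyPI distribution names `vllm` and `transformers`; the helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in this card.
TESTED = {"vllm": "0.19.1", "transformers": "5.6.2"}


def check_versions(tested=TESTED):
    """Map each package to (installed version or None, tested version, exact match?)."""
    report = {}
    for pkg, wanted in tested.items():
        try:
            installed = version(pkg)  # reads installed distribution metadata
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[pkg] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for pkg, (installed, wanted, same) in check_versions().items():
        note = "matches" if same else "differs from"
        print(f"{pkg}: installed {installed}, {note} tested {wanted}")
```

A mismatch does not necessarily mean the model will fail; it only means that combination was not the one tested.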

Recommended flags:

--enable-auto-tool-choice
--reasoning-parser qwen3
--tool-call-parser qwen3_xml
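Combining the MTP flag with the recommended flags gives a single launch command. The sketch below builds it in Python so the JSON passed to `--speculative-config` is always quoted correctly. It assumes the model is served by its Hub repo id with `vllm serve`; `build_serve_command` is a hypothetical helper, not part of vLLM.

```python
import json
import shlex

# Assumed Hub repo id for this model.
MODEL = "Arien0/Qwen3.5-9B-DeepSeek-V4-Flash-AWQ-BF16-INT8-UC-MTP"


def build_serve_command(model=MODEL, num_speculative_tokens=2):
    """Return the vllm serve argv with the MTP config and recommended flags."""
    # Compact separators reproduce the JSON exactly as written in this card.
    spec = json.dumps(
        {"method": "mtp", "num_speculative_tokens": num_speculative_tokens},
        separators=(",", ":"),
    )
    return [
        "vllm", "serve", model,
        "--speculative-config", spec,
        "--enable-auto-tool-choice",
        "--reasoning-parser", "qwen3",
        "--tool-call-parser", "qwen3_xml",
    ]


if __name__ == "__main__":
    # shlex.join single-quotes the JSON argument for a POSIX shell.
    print(shlex.join(build_serve_command()))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run`.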

🏆 Ranking: Qwen3.5-9B-DeepSeek-V4-Flash series (my own benchmarks):

Rank | Model / Dataset      | HumanEval (Code) ↑ | Winogrande (Logic) ↑ | HellaSwag (Context) ↑ | WikiText (PPL) ↓ | Verdict
1st  | AWQ-UC (Ultrachat)   | 0.6524             | 0.7459               | 0.7842                | 9.5979           | The Context King
2nd  | AWQ-CK (CyanKiwi)    | 0.6524             | 0.7474               | 0.7839                | 9.5986           | Best for Pure Logic
3rd  | AWQ-NM (NeuralMagic) | 0.6585             | 0.7443               | 0.7833                | 9.5974           | Best for Code & PPL
4th  | Base (BF16)          | 0.6524             | 0.7498               | 0.7843                | 9.5951           | Reference (Slow)
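To see how far each quant drifts from the BF16 reference, the table can be turned into per-metric deltas. This minimal sketch copies the scores from the table; the metric keys and helper names are illustrative only. Higher is better for the three accuracy columns and lower is better for WikiText perplexity, which is why `best_per_metric` uses `min` for that one.

```python
# Scores copied from the ranking table (metric keys are illustrative names).
BASE = {"humaneval": 0.6524, "winogrande": 0.7498, "hellaswag": 0.7843, "wikitext_ppl": 9.5951}
QUANTS = {
    "AWQ-UC": {"humaneval": 0.6524, "winogrande": 0.7459, "hellaswag": 0.7842, "wikitext_ppl": 9.5979},
    "AWQ-CK": {"humaneval": 0.6524, "winogrande": 0.7474, "hellaswag": 0.7839, "wikitext_ppl": 9.5986},
    "AWQ-NM": {"humaneval": 0.6585, "winogrande": 0.7443, "hellaswag": 0.7833, "wikitext_ppl": 9.5974},
}
LOWER_IS_BETTER = {"wikitext_ppl"}


def deltas_vs_base(quants=QUANTS, base=BASE):
    """Score minus Base score for every quant and metric, rounded to the table's 4 decimals."""
    return {
        name: {metric: round(score - base[metric], 4) for metric, score in scores.items()}
        for name, scores in quants.items()
    }


def best_per_metric(quants=QUANTS):
    """Name of the best quantized variant for each metric."""
    best = {}
    for metric in BASE:
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(quants, key=lambda name: quants[name][metric])
    return best


if __name__ == "__main__":
    for name, d in deltas_vs_base().items():
        print(name, d)
    print(best_per_metric())
```

The winners it reports (NM for HumanEval and perplexity, CK for Winogrande, UC for HellaSwag) line up with the Verdict column.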

🔍 Key Takeaways:

  • Maximum Fidelity (UC): The quant calibrated on UltraChat 200k scored 0.7842 on HellaSwag, within 0.0001 of the Base model (0.7843) and the closest of the three quants. This makes it the best choice here for context-heavy work such as document processing.
  • Coding Optimization (NM): The NeuralMagic-calibrated quant was the only variant to beat the Base model on HumanEval (0.6585 vs 0.6524, +0.0061, about one extra problem out of HumanEval's 164). This suggests its calibration data suits algorithmic tasks, although a one-problem gain is a small margin.
  • Logical Stability (CK): The CyanKiwi-calibrated quant retained the highest Winogrande accuracy of the quantized versions (0.7474, vs 0.7498 for Base), which matters most for reasoning-heavy workflows.
  • Efficiency: Quantization gave a consistent reduction of about 30% in execution time (e.g. from 00:40 to 00:28 on HumanEval) with negligible impact on language fluency: WikiText perplexity rose by at most 0.0035 over Base (9.5986 vs 9.5951).
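The HumanEval and timing figures in the takeaways can be sanity-checked with simple arithmetic. This sketch assumes the HumanEval scores are pass@1 over the benchmark's 164 problems and that both timings use the same `AA:BB` base-60 format (the ratio is identical whether that means minutes:seconds or hours:minutes); the helper names are hypothetical.

```python
HUMANEVAL_PROBLEMS = 164  # number of problems in HumanEval


def solved(pass_at_1, n=HUMANEVAL_PROBLEMS):
    """Approximate number of problems solved for a pass@1 score."""
    return round(pass_at_1 * n)


def to_units(stamp):
    """Convert an 'AA:BB' timestamp into its smaller unit (base 60)."""
    major, minor = stamp.split(":")
    return int(major) * 60 + int(minor)


def pct_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    print(solved(0.6524), solved(0.6585))  # 107 108 (Base/UC/CK vs NM)
    print(pct_reduction(to_units("00:40"), to_units("00:28")))  # 30.0
```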

🙏 Acknowledgements:

Safetensors · Model size: 10B params · Tensor types: I64, I32, BF16

Model tree for Arien0/Qwen3.5-9B-DeepSeek-V4-Flash-AWQ-BF16-INT8-UC-MTP: Finetuned: Qwen/Qwen3.5-9B · Quantized (9): this model
