---
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
metrics:
- perplexity
---

# Qwen2.5-VL-3B-Instruct-per-grp-quant
- ## Introduction
    This model was quantized using Quark 0.11 ([amd_quark-0.11](https://download.amd.com/opendownload/Quark/amd_quark-0.11.zip)).
- ## Quantization Strategy
    - ***Quantized Layers***: All linear layers
    - ***Weight***: uint4 asymmetric per-group with group_size=128.
- ## Quick Start
    1. [Download the model](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
    2. Run the quantization script in the example folder using the following command line:
    ```sh
    python run_qwen2_5_vl_quantization.py
    ```

## Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in `quantize_quark.py`.
The evaluation was run in pseudo-quantization mode, so the results may differ slightly from the actual quantized inference accuracy. They are provided for reference only.

#### Evaluation scores


| Benchmark | Qwen2.5-VL-3B-Instruct | Qwen2.5-VL-3B-Instruct-per-grp-quant (this model) |
|---|---|---|
| Perplexity-wikitext2 | 11.1107 | 13.7743 |
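
For readers unfamiliar with the terms used in this card, the following is a minimal sketch of what uint4 asymmetric per-group pseudo-quantization (quantize, then immediately dequantize) and a perplexity score typically look like. It is not Quark's implementation: the function names `fake_quantize_group`, `fake_quantize_row`, and `perplexity` are hypothetical, the min/max calibration is an assumption, and the PPL formula shown is the common exponential of the mean token negative log-likelihood, which may differ in detail from the one in `quantize_quark.py`.

```python
import math


def fake_quantize_group(weights, num_bits=4):
    """Quantize one group to unsigned integers and dequantize it again.

    Assumption: scale and zero point come from the group's min/max,
    with the range widened to include 0.0 so zero is exactly representable.
    """
    qmax = (1 << num_bits) - 1  # 15 for uint4
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    # An all-zero group would give scale 0; fall back to 1.0 in that case.
    scale = (w_max - w_min) / qmax or 1.0
    zero_point = round(-w_min / scale)  # integer in [0, qmax]
    dequantized = []
    for w in weights:
        # Map to the integer grid, clamp to the uint4 range, map back.
        q = max(0, min(qmax, round(w / scale) + zero_point))
        dequantized.append((q - zero_point) * scale)
    return dequantized


def fake_quantize_row(row, group_size=128):
    """Apply per-group quantization along one row of a weight matrix."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(fake_quantize_group(row[start:start + group_size]))
    return out


def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Each group of 128 weights gets its own scale and zero point, so every group is restricted to at most 16 distinct values. In this sketch, a row whose length is not a multiple of `group_size` simply ends with a shorter final group.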

#### License
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.