---
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
metrics:
- perplexity
---

# Qwen2.5-VL-3B-Instruct-per-grp-quant 
- ## Introduction
  This model was quantized from [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) with AMD Quark 0.11 ([amd_quark-0.11](https://download.amd.com/opendownload/Quark/amd_quark-0.11.zip)).
- ## Quantization Strategy
  - ***Quantized Layers***: All linear layers
  - ***Weight***: uint4 asymmetric per-group with group_size=128.
- ## Quick Start
1. [Download the model](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)
2. Run the quantization script in the example folder using the following command line:
    ```sh
    python run_qwen2_5_vl_quantization.py
    ```
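
As a rough illustration of the weight scheme listed under Quantization Strategy (uint4, asymmetric, per-group, group_size=128), the sketch below quantizes one row of weights group by group and maps it back to floats. It is a framework-free Python sketch, not Quark's implementation: the function names are hypothetical, and extending each group's range to include zero is an assumption made here so that the zero point always fits in the uint4 range.

```python
import random


def quantize_group(weights, n_bits=4):
    """Asymmetric quantization of one group to unsigned n_bits integers."""
    qmax = (1 << n_bits) - 1  # 15 for uint4
    # Assumption: widen the range to include 0.0 so the zero point lies in [0, qmax].
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / qmax
    if scale == 0.0:
        scale = 1.0  # all-zero group; any positive scale works
    zero_point = round(-w_min / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_group(q, scale, zero_point):
    """Map unsigned integers back to approximate float weights."""
    return [(v - zero_point) * scale for v in q]


def fake_quantize_row(row, group_size=128):
    """Quantize and dequantize a row, one scale/zero point per group."""
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        q, scale, zero_point = quantize_group(group)
        out.extend(dequantize_group(q, scale, zero_point))
    return out


random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]  # two groups of 128
restored = fake_quantize_row(row)
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"max reconstruction error: {max_err:.6f}")
```

Because each group of 128 weights gets its own scale and zero point, an outlier only widens the quantization step within its own group rather than across the whole row.
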
## Evaluation
Quark currently uses perplexity (PPL) as the evaluation metric for accuracy loss before and after quantization. The specific PPL algorithm can be found in `quantize_quark.py`.
The evaluation was run in pseudo-quantization mode, so the results may differ slightly from the accuracy of actual quantized inference. These results are provided for reference only.
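
For readers unfamiliar with the metric, perplexity is commonly defined as the exponential of the mean negative log-likelihood per token. The sketch below shows only that textbook definition. The function name is hypothetical, and the exact procedure in `quantize_quark.py` (for example, how the text is split into windows) may differ.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: a model that assigns probability 0.25 to every observed token.
log_probs = [math.log(0.25)] * 8
print(perplexity(log_probs))
```

Lower values mean the model assigns higher probability to the reference text, so an increase after quantization indicates some loss of accuracy.
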

#### Evaluation scores
<table>
  <tr>
   <td><strong>Benchmark</strong>
   </td>
   <td><strong> Qwen2.5-VL-3B-Instruct </strong>
   </td>
   <td><strong> Qwen2.5-VL-3B-Instruct-per-grp-quant (this model) </strong>
   </td>
  </tr>
  <tr>
   <td>Perplexity-wikitext2
   </td>
   <td>11.1107
   </td>
   <td>13.7743
   </td>
  </tr>
  
</table> 
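
To put the two scores in relative terms, the gap can be computed directly from the table values. The variable names in this short snippet are illustrative only.

```python
baseline_ppl = 11.1107   # Qwen2.5-VL-3B-Instruct, from the table above
quantized_ppl = 13.7743  # this model, from the table above

absolute_increase = quantized_ppl - baseline_ppl
relative_increase = absolute_increase / baseline_ppl
print(f"PPL increase: {absolute_increase:.4f} ({relative_increase:.1%})")
```

On these numbers the quantized model's perplexity on wikitext2 is about 24% higher than the baseline.
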


#### License
Modifications Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.