---
language:
  - en
base_model:
  - ibm-granite/granite-4.0-h-tiny
pipeline_tag: text-generation
tags:
  - w8a8
  - int8
  - vllm
  - granite-4.0
  - moe
license: apache-2.0
license_name: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
name: fedora-copr/granite-4.0-h-tiny-quantized.w8a8
---

# fedora-copr/granite-4.0-h-tiny-quantized.w8a8

This is a W8A8 INT8 quantized version of [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny).

## Model Details

* **Quantized by:** Jiri Podivin <jpodivin@redhat.com>

* **Architecture:** Granite-4.0 Hybrid MoE (Mamba + Transformer)

* **Quantization:** INT8 Weight & Activation (W8A8)

* **Engine Support:** vLLM (0.6.0+)

## Performance & Accuracy

The results of a short benchmark run with `lm_eval` are stored in `eval_results.json`.
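
To inspect the stored results, a minimal sketch along these lines may help. It assumes that `eval_results.json` follows the usual `lm_eval` output layout, with a top-level `results` object mapping task names to metric dictionaries; that layout has not been verified for this particular file, and the helper name `summarize_results` is hypothetical.

```python
import json
from pathlib import Path


def summarize_results(data: dict) -> dict:
    """Collect the numeric metrics per task from an lm_eval-style results dict."""
    summary = {}
    # Assumption: metrics live under a top-level "results" key, keyed by task name.
    for task, metrics in data.get("results", {}).items():
        summary[task] = {
            name: value
            for name, value in metrics.items()
            # Keep numbers only; bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
    return summary


if __name__ == "__main__":
    path = Path("eval_results.json")
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for task, metrics in summarize_results(json.load(fh)).items():
                print(task, metrics)
```

Non-numeric entries such as task aliases or `"N/A"` placeholders are dropped, so the summary contains only values that can be compared directly against the base model.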

## Implementation

The quantization was performed using the `llm-compressor` library.

### vLLM Serving

```bash
vllm serve fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --quantization compressed-tensors
```
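
Once the server is running, it can be queried through vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server listens on vLLM's default address, `http://localhost:8000`; the helper names `build_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL = "fedora-copr/granite-4.0-h-tiny-quantized.w8a8"
BASE_URL = "http://localhost:8000"  # assumption: vLLM's default host and port


def build_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64) -> str:
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Example, with the server from the command above already running:
# print(complete("Fedora is"))
```

If the server was started with a different `--host` or `--port`, adjust `BASE_URL` to match.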