--- language: - en base_model: - ibm-granite/granite-4.0-h-tiny pipeline_tag: text-generation tags: - w8a8 - int8 - vllm - granite-4.0 - moe license: apache-2.0 license_name: apache-2.0 license_link: https://www.apache.org/licenses/LICENSE-2.0 name: fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --- # fedora-copr/granite-4.0-h-tiny-quantized.w8a8 This is a W8A8 INT8 quantized version of [ibm-granite/granite-4.0-h-tiny](https://huggingface.co/ibm-granite/granite-4.0-h-tiny). ## Model Details * **Quantized by:** Jiri Podivin * **Architecture:** Granite-4.0 Hybrid MoE (Mamba + Transformer) * **Quantization:** INT8 Weight & Activation (W8A8) * **Engine Support:** vLLM (0.6.0+) ## Performance & Accuracy Results of short benchmark run executed with lm_eval are stored in `eval_results.json`. ## Implementation The quantization was performed using the `llm-compressor` library. ### vLLM Serving ``` vllm serve fedora-copr/granite-4.0-h-tiny-quantized.w8a8 --quantization compressed-tensors ```