---
license: apache-2.0
language:
- multilingual
tags:
- translation
- quality-estimation
- reference-free
- comet
- cometkiwi
- pruning
base_model: Unbabel/wmt22-cometkiwi-da
pipeline_tag: translation
---

# wmt22-cometkiwi-da-int8

A compressed version of [Unbabel/wmt22-cometkiwi-da](https://huggingface.co/Unbabel/wmt22-cometkiwi-da), a reference-free machine-translation quality estimation model (source + MT only, no human reference required).

**Lossless compression**: no loss in Pearson correlation with human scores, and ~40% smaller on disk via int8 alone.

## What's different from the base model

- **No layer pruning**: all 24 XLM-R encoder layers are retained. Compression comes entirely from dynamic int8 quantization plus fp16 storage.
- `layerwise_attention` is rebuilt to mix only the surviving layers (embeddings + kept layer outputs); with nothing pruned in this variant, that is every layer.
- **Dynamic int8 quantization** on the XLM-R encoder, plus fp16 storage (weights are cast back to fp32 at load time, before quantization).

## Accuracy

Benchmarked on 1200 stratified segments from [RicardoRei/wmt-da-human-evaluation](https://huggingface.co/datasets/RicardoRei/wmt-da-human-evaluation) (reference-free, src+mt only):

| Metric | This variant | Full cometkiwi |
|---|---|---|
| Pearson r vs human DA | **0.6404** | 0.6402 |
| Spearman vs human DA | **0.6703** | 0.6698 |
| Pearson r vs full | **0.9919** | 1.0000 |
| MAE vs full | **0.0138** | 0.0000 |
| Params | **565.1M** | 565.1M |
| On-disk size | **~1130 MB** | ~2200 MB |

### All variants at a glance

| Variant | Pearson(human) | Pearson(full) | Size | When to use |
|---|---|---|---|---|
| [full base](https://huggingface.co/Unbabel/wmt22-cometkiwi-da) | 0.6402 | 1.0000 | ~2200 MB | reference quality |
| [`-int8`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-int8) | **0.6404** | 0.9919 | ~1300 MB | **lossless compression** |
| [`-pruned-k2`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k2) | **0.6300** | 0.9784 | ~2100 MB | best-quality pruned |
| [`-pruned-k4`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k4) | 0.5642 | 0.8316 | ~2060 MB | aggressive prune |
| [`-pruned-k4-xs`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k4-xs) | 0.5544 | 0.8113 | ~1030 MB | smallest footprint |

## Usage

```python
# pip install "unbabel-comet" "setuptools<81" huggingface_hub
# export HF_TOKEN=  # must have Unbabel/wmt22-cometkiwi-da access
import sys

from huggingface_hub import snapshot_download

# Download this repo (weights, config, and load.py) to the local cache.
folder = snapshot_download(repo_id="solailabs/wmt22-cometkiwi-da-int8")

# load.py ships inside the repo, so make the snapshot folder importable.
sys.path.insert(0, folder)
from load import load_model

model = load_model(folder)

# Each sample needs only the source and the machine translation.
out = model.predict(
    [{"src": "The meeting has been postponed until next week.",
      "mt": "La réunion a été reportée à la semaine prochaine."}],
    batch_size=8,
    gpus=0,
    progress_bar=False,
    num_workers=2,
)
print(out["scores"])
```

The loader re-downloads the base cometkiwi, drops the encoder layers listed in `config.json` (none for this variant), applies int8 dynamic quantization when the quant flag is set, then loads the weights shipped in this repo.

## Files

- `state_dict.pt`: model weights (no layers pruned in this variant)
- `config.json`: base model id, kept/dropped layer indices, quant flag, accuracy
- `load.py`: drop-in loader
- `README.md`: this file

## Gated base model

The base `Unbabel/wmt22-cometkiwi-da` is gated. You must accept its license on the Hub while logged in with the same account your `HF_TOKEN` belongs to; otherwise the base-model download inside `load.py` returns 403.

## Citation

**Base model:** [`Unbabel/wmt22-cometkiwi-da`](https://huggingface.co/Unbabel/wmt22-cometkiwi-da) by Unbabel.

```
@inproceedings{rei-etal-2022-cometkiwi,
  title = "{C}omet{K}iwi: {IST}-{U}nbabel 2022 Submission for the Quality Estimation Shared Task",
  author = "Rei, Ricardo and others",
  booktitle = "WMT 2022",
}
```

Released under the same license as the base model (Apache 2.0).
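
## Metric definitions (sketch)

The Accuracy table reports Pearson r against human DA scores, Pearson r against the full model, and MAE against the full model. The following is a minimal sketch of how such numbers can be computed, assuming three aligned lists of per-segment scores. The helper names and the toy values are hypothetical and are not taken from the benchmark or from this repo's evaluation code; Spearman is omitted for brevity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between paired scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy scores for five segments (illustration only).
human_da = [0.10, 0.35, 0.50, 0.80, 0.95]     # human direct-assessment scores
full_scores = [0.20, 0.30, 0.55, 0.75, 0.90]  # scores from the full model
int8_scores = [0.21, 0.29, 0.56, 0.74, 0.91]  # scores from the compressed model

print("Pearson r vs human DA:", round(pearson(int8_scores, human_da), 4))
print("Pearson r vs full:", round(pearson(int8_scores, full_scores), 4))
print("MAE vs full:", round(mean_absolute_error(int8_scores, full_scores), 4))
```

Correlation with human DA measures how well a variant tracks human judgments, while Pearson and MAE against the full model measure how closely it reproduces the original scores; a variant can match the first while still differing slightly on the second.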