metadata
license: apache-2.0
language:
  - multilingual
tags:
  - translation
  - quality-estimation
  - reference-free
  - comet
  - cometkiwi
  - pruning
base_model: Unbabel/wmt22-cometkiwi-da
pipeline_tag: translation

wmt22-cometkiwi-da-int8

A compressed version of Unbabel/wmt22-cometkiwi-da — a reference-free machine-translation quality estimation model (source + MT only, no human reference required).

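Effectively lossless compression: Pearson correlation with human DA is unchanged on the benchmark below (0.6404 vs 0.6402 for the base model), at ~40% smaller on disk, with no layer pruning.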

What's different from the base model

  • No layer pruning: all 24 XLM-R encoder layers are retained. Compression comes entirely from dynamic int8 quantization and fp16 storage.
  • Because no layers are dropped, layerwise_attention mixes the embeddings and all 24 layer outputs.
  • Dynamic int8 quantization is applied to the XLM-R encoder. Weights are stored in fp16 and cast back to fp32 at load time, before quantization.
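To make the quantization step concrete, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization applied to a handful of made-up weights. It illustrates the general idea only: the names `quantize_int8` and `dequantize` are hypothetical, the scheme is a simplification, and the actual loader presumably relies on PyTorch's dynamic quantization rather than anything like this.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]


# Made-up weights for illustration; not taken from the model.
weights = [0.50, -1.27, 0.03, 0.90, -0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
# Rounding to the nearest step bounds the error by half a quantization step.
print(max_error <= scale / 2 + 1e-12)
```

The round trip loses at most half a quantization step per weight, which is why small score drift against the full model is expected even when correlation with human judgments is unchanged.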

Accuracy

Benchmarked on 1200 stratified segments from RicardoRei/wmt-da-human-evaluation (reference-free, src+mt only):

| Metric | This variant | Full cometkiwi |
|---|---|---|
| Pearson r vs human DA | 0.6404 | 0.6402 |
| Spearman vs human DA | 0.6703 | 0.6698 |
| Pearson r vs full | 0.9919 | 1.0000 |
| MAE vs full | 0.0138 | 0.0000 |
| Params | 565.1M | 565.1M |
| On-disk size | ~1130 MB | ~2200 MB |
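The agreement metrics in the table can be computed from paired score lists. The sketch below is a pure-Python illustration on made-up numbers (not the benchmark data). The helper names are hypothetical, and the rank helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank positions (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mae(x, y):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Made-up scores for illustration only.
human_da = [0.10, 0.35, 0.80, 0.55, 0.90]
full_scores = [0.20, 0.30, 0.75, 0.60, 0.85]
int8_scores = [0.21, 0.29, 0.76, 0.58, 0.86]

print("Pearson r vs human DA:", round(pearson(int8_scores, human_da), 4))
print("Spearman vs human DA:", round(spearman(int8_scores, human_da), 4))
print("Pearson r vs full:", round(pearson(int8_scores, full_scores), 4))
print("MAE vs full:", round(mae(int8_scores, full_scores), 4))
```

"vs human DA" compares a model's scores with human judgments, while "vs full" compares the compressed model's scores with those of the uncompressed base model on the same segments.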

All variants at a glance

| Variant | Pearson(human) | Pearson(full) | Size | When to use |
|---|---|---|---|---|
| full base | 0.6402 | 1.0000 | ~2200 MB | reference quality |
| -int8 | 0.6404 | 0.9919 | ~1300 MB | lossless compression |
| -pruned-k2 | 0.6300 | 0.9784 | ~2100 MB | best-quality pruned |
| -pruned-k4 | 0.5642 | 0.8316 | ~2060 MB | aggressive prune |
| -pruned-k4-xs | 0.5544 | 0.8113 | ~1030 MB | smallest footprint |
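One way to read the "When to use" column is as a constrained choice: pick the smallest variant whose correlation with human DA stays above a floor you can accept. The sketch below encodes the rows of the table as given. The `smallest_variant` helper and the example floors are illustrative assumptions, not part of this repo.

```python
# (variant, Pearson vs human DA, approximate size in MB), copied from the table.
VARIANTS = [
    ("full base", 0.6402, 2200),
    ("-int8", 0.6404, 1300),
    ("-pruned-k2", 0.6300, 2100),
    ("-pruned-k4", 0.5642, 2060),
    ("-pruned-k4-xs", 0.5544, 1030),
]


def smallest_variant(min_pearson):
    """Return the name of the smallest variant meeting the Pearson floor, or None."""
    eligible = [v for v in VARIANTS if v[1] >= min_pearson]
    if not eligible:
        return None
    return min(eligible, key=lambda v: v[2])[0]


print(smallest_variant(0.63))  # -int8
print(smallest_variant(0.50))  # -pruned-k4-xs
```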

Usage

# pip install "unbabel-comet" "setuptools<81" huggingface_hub
# export HF_TOKEN=<your_token>   # must have Unbabel/wmt22-cometkiwi-da access

from huggingface_hub import snapshot_download
import sys
folder = snapshot_download(repo_id="solailabs/wmt22-cometkiwi-da-int8")
sys.path.insert(0, folder)
from load import load_model

model = load_model(folder)
out = model.predict(
    [{"src": "The meeting has been postponed until next week.",
      "mt":  "La réunion a été reportée à la semaine prochaine."}],
    batch_size=8, gpus=0, progress_bar=False, num_workers=2,
)
print(out["scores"])
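To score more than one segment, the input to `predict` is a list of dicts with `src` and `mt` keys, one per segment, as in the example above. A small helper can build that list from parallel lists. `build_samples` is a hypothetical name used here for illustration and is not part of this repo or of COMET.

```python
def build_samples(sources, translations):
    """Pair each source sentence with its machine translation."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    return [{"src": s, "mt": t} for s, t in zip(sources, translations)]


samples = build_samples(
    ["The meeting has been postponed until next week.", "Thank you."],
    ["La réunion a été reportée à la semaine prochaine.", "Merci."],
)
print(len(samples))
# The list can then be passed as the first argument to model.predict(...).
```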

The loader downloads the base cometkiwi checkpoint, drops any encoder layers listed as dropped in config.json (none for this variant), applies int8 dynamic quantization if the quant flag is set, then loads the weights shipped in this repo.

Files

  • state_dict.pt: model weights (stored in fp16, all 24 encoder layers)
  • config.json: base model id, kept/dropped layer indices, quant flag, accuracy
  • load.py: drop-in loader
  • README.md: this file

Gated base model

The base Unbabel/wmt22-cometkiwi-da is gated. You must accept its license on the Hub while logged in with the same account your HF_TOKEN belongs to — otherwise the base-model download inside load.py returns 403.
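Because a missing token only surfaces once the loader tries to fetch the base model, it can help to check up front that one is set. The sketch below only verifies that the `HF_TOKEN` environment variable is non-empty; it cannot tell whether the account has accepted the base model's license. The `require_hf_token` helper is hypothetical.

```python
import os


def require_hf_token(env=None):
    """Return the HF_TOKEN value, or raise a clear error if it is unset or empty."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export a token whose account has access to "
            "Unbabel/wmt22-cometkiwi-da before calling load_model."
        )
    return token
```

Calling `require_hf_token()` before `load_model(folder)` turns a late 403 into an early, readable error in the common case where the variable was never exported.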

Citation

Base model: Unbabel/wmt22-cometkiwi-da by Unbabel.

@inproceedings{rei-etal-2022-cometkiwi,
    title = "{C}omet{K}iwi: {IST}-{U}nbabel 2022 Submission for the Quality Estimation Shared Task",
    author = "Rei, Ricardo  and others",
    booktitle = "WMT 2022",
}

Released under the same license as the base model (Apache 2.0).