Initial release: wmt22-cometkiwi-da-int8

Files changed (4)

README.md +97 -0
config.json +39 -0
load.py +72 -0
state_dict.pt +3 -0

README.md ADDED

	@@ -0,0 +1,97 @@

+---
+license: apache-2.0
+language:
+- multilingual
+tags:
+- translation
+- quality-estimation
+- reference-free
+- comet
+- cometkiwi
+- quantization
+base_model: Unbabel/wmt22-cometkiwi-da
+pipeline_tag: translation
+---
+# wmt22-cometkiwi-da-int8
+A compressed version of [Unbabel/wmt22-cometkiwi-da](https://huggingface.co/Unbabel/wmt22-cometkiwi-da) — a reference-free machine-translation quality estimation model (source + MT only, no human reference required).
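+**Lossless compression** in the human-correlation sense: Pearson r vs human DA is 0.6404 against 0.6402 for the full model, while the checkpoint is roughly half the size on disk (~1130 MB vs ~2200 MB). The on-disk saving comes from fp16 storage; int8 quantization is applied at load time.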
+## What's different from the base model
+- **No layer pruning**: all 24 XLM-R encoder layers are retained. Compression comes entirely from dynamic int8 quantization plus fp16 storage.
+- `layerwise_attention` is rebuilt by the loader to mix only the kept layers (embeddings + kept layer outputs). With all 24 layers kept, this leaves the mix unchanged.
+- **Dynamic int8 quantization** of the `torch.nn.Linear` modules in the XLM-R encoder, applied at load time. Weights are stored as fp16 and cast back to fp32 before quantization.
+## Accuracy
+Benchmarked on 1200 stratified segments from [RicardoRei/wmt-da-human-evaluation](https://huggingface.co/datasets/RicardoRei/wmt-da-human-evaluation) (reference-free, src+mt only):
+| Metric | This variant | Full cometkiwi |
+|---|---|---|
+| Pearson r vs human DA | **0.6404** | 0.6402 |
+| Spearman vs human DA  | **0.6703** | 0.6698 |
+| Pearson r vs full     | **0.9919** | 1.0000 |
+| MAE vs full           | **0.0138** | 0.0000 |
+| Params                | **565.1M** | 565.1M |
+| On-disk size          | **~1130 MB** | ~2200 MB |
+### All variants at a glance
+| Variant | Pearson(human) | Pearson(full) | Size | When to use |
+|---|---|---|---|---|
+| [full base](https://huggingface.co/Unbabel/wmt22-cometkiwi-da) | 0.6402 | 1.0000 | ~2200 MB | reference quality |
+| [`-int8`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-int8) | **0.6404** | 0.9919 | ~1130 MB | **lossless compression** |
+| [`-pruned-k2`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k2) | **0.6300** | 0.9784 | ~2100 MB | best-quality pruned |
+| [`-pruned-k4`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k4) | 0.5642 | 0.8316 | ~2060 MB | aggressive prune |
+| [`-pruned-k4-xs`](https://huggingface.co/solailabs/wmt22-cometkiwi-da-pruned-k4-xs) | 0.5544 | 0.8113 | ~1030 MB | smallest footprint |
+## Usage
+```python
+# pip install "unbabel-comet" "setuptools<81" huggingface_hub
+# export HF_TOKEN=<your_token>   # must have Unbabel/wmt22-cometkiwi-da access
+from huggingface_hub import snapshot_download
+import sys
+folder = snapshot_download(repo_id="solailabs/wmt22-cometkiwi-da-int8")
+sys.path.insert(0, folder)
+from load import load_model
+model = load_model(folder)
+out = model.predict(
+    [{"src": "The meeting has been postponed until next week.",
+      "mt":  "La réunion a été reportée à la semaine prochaine."}],
+    batch_size=8, gpus=0, progress_bar=False, num_workers=2,
+)
+print(out["scores"])
+```
+The loader downloads the base cometkiwi checkpoint, keeps the encoder layers listed in `config.json` (all 24 for this variant), loads the fp16 weights shipped in this repo (cast to fp32), and then applies dynamic int8 quantization to the encoder.
+## Files
+- `state_dict.pt`: fp16 model weights for all 24 layers (quantized to int8 at load time)
+- `config.json`: base model id, kept/dropped layer indices, quant flag, accuracy
+- `load.py`: drop-in loader
+- `README.md`: this file
+## Gated base model
+The base `Unbabel/wmt22-cometkiwi-da` is gated. You must accept its license on the Hub while logged in with the same account your `HF_TOKEN` belongs to — otherwise the base-model download inside `load.py` returns 403.
+## Citation
+**Base model:** [`Unbabel/wmt22-cometkiwi-da`](https://huggingface.co/Unbabel/wmt22-cometkiwi-da) by Unbabel.
+```
+@inproceedings{rei-etal-2022-cometkiwi,
+    title = "{C}omet{K}iwi: {IST}-{U}nbabel 2022 Submission for the Quality Estimation Shared Task",
+    author = "Rei, Ricardo  and others",
+    booktitle = "WMT 2022",
+}
+```
+Released under the same license as the base model (Apache 2.0).
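
The gated-model note in the README means a missing token only surfaces as a 403 from inside `load.py`. A minimal preflight sketch follows, assuming the token is supplied through the `HF_TOKEN` environment variable as in the usage snippet. A cached Hub login would also work and is not checked here. `require_hf_token` is a hypothetical helper name, not part of this repo.

```python
import os


def require_hf_token(env=None):
    """Return the token from HF_TOKEN, or raise a readable error.

    Hypothetical helper: it only inspects the HF_TOKEN variable that the
    README mentions and knows nothing about cached Hub logins.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; downloading the gated base model "
            "Unbabel/wmt22-cometkiwi-da may fail with 403."
        )
    return token
```

Calling `require_hf_token()` before `load_model(folder)` would turn a late HTTP error into an early, descriptive one.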

config.json ADDED

	@@ -0,0 +1,39 @@

+{
+  "base_model": "Unbabel/wmt22-cometkiwi-da",
+  "orig_num_layers": 24,
+  "keep_idx": [
+    0,
+    1,
+    2,
+    3,
+    4,
+    5,
+    6,
+    7,
+    8,
+    9,
+    10,
+    11,
+    12,
+    13,
+    14,
+    15,
+    16,
+    17,
+    18,
+    19,
+    20,
+    21,
+    22,
+    23
+  ],
+  "dropped": [],
+  "tag": "cometkiwi_int8",
+  "quantized": true,
+  "quant_dtype": "qint8",
+  "fp16_storage": true,
+  "pearson_vs_full": 0.9919,
+  "mae_vs_full": 0.0138,
+  "pearson_human": 0.6404,
+  "params_M": 565.137435
+}
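
Since `load.py` indexes encoder layers straight from `keep_idx`, a quick consistency check on the config can catch a bad edit before any model is downloaded. The sketch below assumes that `keep_idx` and `dropped` are meant to be disjoint and to cover every original layer together. That holds for this config but is not stated anywhere in the source. `check_layer_config` is a hypothetical name.

```python
def check_layer_config(cfg):
    """Return the number of kept layers, or raise if the indices are inconsistent.

    Assumption: keep_idx and dropped partition range(orig_num_layers).
    """
    n = cfg["orig_num_layers"]
    keep, dropped = cfg["keep_idx"], cfg["dropped"]
    if sorted(keep + dropped) != list(range(n)):
        raise ValueError("keep_idx and dropped must partition range(orig_num_layers)")
    return len(keep)


# Layer fields as they appear in the config.json above.
cfg = {"orig_num_layers": 24, "keep_idx": list(range(24)), "dropped": []}
kept = check_layer_config(cfg)
print(f"{kept} of {cfg['orig_num_layers']} encoder layers kept")
```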

load.py ADDED

	@@ -0,0 +1,72 @@

+"""
+Drop-in loader for solailabs/wmt22-cometkiwi-da-* models.
+    from huggingface_hub import snapshot_download
+    import sys
+    folder = snapshot_download(repo_id="solailabs/wmt22-cometkiwi-da-int8")
+    sys.path.insert(0, folder)
+    from load import load_model
+    model = load_model()
+    print(model.predict([{"src": "...", "mt": "..."}], gpus=0)["scores"])
+"""
+import json
+import platform
+from pathlib import Path
+import torch
+from comet import download_model, load_from_checkpoint
+from torch.nn import Parameter, ParameterList
+def load_model(folder: str | Path | None = None):
+    """Reconstruct the pruned (and optionally int8-quantized) COMET model."""
+    folder = Path(folder) if folder else Path(__file__).parent
+    cfg = json.loads((folder / "config.json").read_text())
+    base_ckpt = download_model(cfg["base_model"])
+    model = load_from_checkpoint(base_ckpt)
+    keep = cfg["keep_idx"]
+    layers = model.encoder.model.encoder.layer
+    model.encoder.model.encoder.layer = torch.nn.ModuleList([layers[i] for i in keep])
+    model.encoder.model.config.num_hidden_layers = len(keep)
+    la = model.layerwise_attention
+    mix_keep = [0] + [i + 1 for i in keep]
+    la.scalar_parameters = ParameterList([
+        Parameter(la.scalar_parameters[i].data.clone(), requires_grad=True)
+        for i in mix_keep
+    ])
+    la.num_layers = len(mix_keep)
+    if hasattr(la, "dropout_mask"):
+        la.dropout_mask = torch.zeros(len(mix_keep))
+        la.dropout_fill = torch.empty(len(mix_keep)).fill_(-1e20)
+    quantize_at_load = cfg.get("quantized") and cfg.get("fp16_storage")
+    if cfg.get("quantized") and not quantize_at_load:
+        # Legacy path: state_dict contains already-quantized packed params
+        engine = "qnnpack" if platform.machine() in ("arm64", "aarch64") else "fbgemm"
+        torch.backends.quantized.engine = engine
+        model.encoder.model = torch.quantization.quantize_dynamic(
+            model.encoder.model, {torch.nn.Linear}, dtype=torch.qint8
+        )
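+    # weights_only=False unpickles arbitrary objects, so only load a state_dict.pt from a trusted source.
+    state = torch.load(folder / "state_dict.pt", map_location="cpu", weights_only=False)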
+    own = model.state_dict()
+    fixed = {}
+    for k, v in state.items():
+        if k in own and isinstance(v, torch.Tensor) and isinstance(own[k], torch.Tensor) and v.dtype != own[k].dtype:
+            fixed[k] = v.to(own[k].dtype)
+        else:
+            fixed[k] = v
+    model.load_state_dict(fixed, strict=False)
+    if quantize_at_load:
+        # Quantize AFTER loading fp16/fp32 weights
+        engine = "qnnpack" if platform.machine() in ("arm64", "aarch64") else "fbgemm"
+        torch.backends.quantized.engine = engine
+        model.encoder.model = torch.quantization.quantize_dynamic(
+            model.encoder.model, {torch.nn.Linear}, dtype=torch.qint8
+        )
+    model.eval()
+    return model
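
The `mix_keep` line in the loader is easy to misread, so here is the index mapping on its own. The layerwise attention holds one scalar per hidden state, with slot 0 for the embedding output, which is why encoder layer `i` maps to slot `i + 1`. `mix_indices` is a hypothetical stand-alone name for the same expression, and the pruned example is illustrative only, not a config from this repo.

```python
def mix_indices(keep):
    """Map kept encoder-layer indices to layerwise-attention scalar indices.

    Slot 0 is the embedding output, so encoder layer i sits at slot i + 1.
    """
    return [0] + [i + 1 for i in keep]


# This repo keeps all 24 layers, so all 25 scalars survive in order.
all_layers = mix_indices(list(range(24)))

# Hypothetical pruned config (not from this repo): layers 3 and 4 dropped.
pruned = mix_indices([0, 1, 2, 5])
print(pruned)  # [0, 1, 2, 3, 6]
```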

state_dict.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:720b028f0fc062623a63fefb5f6564289e9a6107fc293cdbc6ef79031072b929
+size 1130416312
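
The pointer's `size` field can be cross-checked against `params_M` in `config.json`. The arithmetic below assumes every parameter is stored as a 2-byte fp16 value and compares the file against a hypothetical fp32 copy of the same parameters (4 bytes each). The fp32 figure is a computed reference, not the measured size of the base checkpoint.

```python
PARAMS = 565_137_435            # params_M from config.json, as a raw count
LFS_SIZE_BYTES = 1_130_416_312  # "size" field of the state_dict.pt pointer

fp16_bytes = PARAMS * 2  # assumption: 2 bytes per stored parameter
fp32_bytes = PARAMS * 4  # hypothetical fp32 copy of the same parameters

# Whatever is left over is serialization overhead (keys, headers, buffers).
overhead = LFS_SIZE_BYTES - fp16_bytes
reduction = 1 - LFS_SIZE_BYTES / fp32_bytes

print(f"fp16 payload : {fp16_bytes / 1e6:.1f} MB")
print(f"file on disk : {LFS_SIZE_BYTES / 1e6:.1f} MB")
print(f"overhead     : {overhead / 1e3:.1f} kB")
print(f"vs fp32 copy : {reduction:.1%} smaller")
```

Under these assumptions the file size is almost entirely explained by fp16 storage, which fits the loader quantizing to int8 only after the weights are loaded.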