---
language: [en, es]
license: cc-by-nc-4.0
base_model: CohereForAI/aya-expanse-8b
tags:
- translation
- machine-translation
- aya-expanse
- layer-pruning
- interpretability
pipeline_tag: translation
---

# aya-enes-I2-8

English-to-Spanish translation model derived from
[CohereForAI/aya-expanse-8b](https://huggingface.co/CohereForAI/aya-expanse-8b)
(the 32-layer, 8B-parameter base model).

## Recipe

Eight middle layers were removed with IFR-guided layer pruning. The pruned model
was then fine-tuned with LoRA, using sequence-level knowledge distillation from
Aya-Expanse 32B (training on translations generated by the larger model).
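The card does not spell out how IFR scores are turned into a pruning decision. As a minimal sketch, assuming each layer has already been assigned a scalar importance score (for example, aggregated from IFR attributions), score-guided pruning can be written as "drop the k lowest-scoring layers while protecting the first and last few". The function name, the `keep_first`/`keep_last` parameters and their defaults, and the toy scores are all hypothetical; this is generic score-based selection, not necessarily the procedure used for this checkpoint.

```python
# Hypothetical sketch of score-guided layer selection; not the pipeline's code.
# Assumption: scores[i] is a scalar importance estimate for layer i (e.g.
# aggregated from IFR attributions); lower means less important.


def select_layers_to_prune(scores, k, keep_first=4, keep_last=4):
    """Return the indices of the k least important prunable layers, ascending.

    The first `keep_first` and last `keep_last` layers are never pruned, which
    confines pruning to the middle of the stack.
    """
    n = len(scores)
    candidates = list(range(keep_first, n - keep_last))
    if k > len(candidates):
        raise ValueError("k exceeds the number of prunable layers")
    # Rank prunable layers from least to most important.
    ranked = sorted(candidates, key=lambda i: scores[i])
    return sorted(ranked[:k])


# Made-up scores for a 12-layer toy model (not real IFR output).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.5, 0.1, 0.3, 0.6, 0.7, 0.8, 0.9]
print(select_layers_to_prune(toy_scores, k=3, keep_first=2, keep_last=2))
# [4, 6, 7]
```

Because selection follows the scores rather than a fixed window, the dropped layers need not be contiguous; the toy result skips layer 5, much as this checkpoint keeps original layer 9.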
- Transformer layers: **24** (of the original 32)
- Layers removed (indices into the original stack): `[8, 10, 11, 12, 13, 14, 15, 16]`
- Pruning method: **IFR (Information Flow Routes)**
- Fine-tuning: LoRA (r=16, alpha=32), 3 epochs on News Commentary v18 en-es
- Distillation: synthetic translations generated by Aya-Expanse 32B, keeping only pairs with COMET ≥ 0.7
- Precision: fp16
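The layer list above implies a fixed mapping between positions in the pruned 24-layer model and the original 32-layer stack, which matters when comparing per-layer analyses across the two models. A minimal sketch, assuming the listed indices are 0-based positions in the original model (the card does not state this explicitly); `kept_layers` and `new_to_original` are illustrative names, not part of the project.

```python
# Layer bookkeeping for this checkpoint, using the values listed in the card.
# Assumption: removed indices are 0-based positions in the original stack.
ORIGINAL_NUM_LAYERS = 32
REMOVED = [8, 10, 11, 12, 13, 14, 15, 16]


def kept_layers(num_layers, removed):
    """Original indices of the layers that survive pruning, in order."""
    removed_set = set(removed)
    return [i for i in range(num_layers) if i not in removed_set]


kept = kept_layers(ORIGINAL_NUM_LAYERS, REMOVED)
# Position j in the pruned model corresponds to original layer kept[j].
new_to_original = dict(enumerate(kept))

print(len(kept))           # 24
print(new_to_original[8])  # 9
print(new_to_original[9])  # 17
```

Under that assumption, pruned layers 0-7 are the original layers 0-7, pruned layer 8 is original layer 9, and pruned layers 9-23 are original layers 17-31.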
## Evaluation

Evaluated on 500 held-out News Commentary v18 en-es sentence pairs.

| Metric                 |  Value |
|------------------------|-------:|
| COMET (wmt22-comet-da) | 0.8880 |
| chrF++                 |  67.13 |
| BLEU                   |  46.02 |

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "adrianMT56/aya-enes-I2-8"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(name)
# Older transformers releases name this argument `torch_dtype` instead of `dtype`.
model = AutoModelForCausalLM.from_pretrained(name, dtype=torch.float16).to(device)

# Plain-text prompt; the model continues after "Spanish:" with the translation.
prompt = ("Translate the following English text to Spanish.\n\n"
          "English: The quick brown fox jumps over the lazy dog.\n"
          "Spanish:")
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# do_sample=False disables sampling.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```

CPU users can omit `dtype=torch.float16` to load the weights in float32 (the
default), or keep fp16 at some cost in throughput. For GPTQ 4-bit conversion,
see `scripts/quantize_to_gptq.py` in the project repository linked under
Reproducibility.
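When translating many sentences, the prompt format from the usage example can be factored into small helpers. The sketch below is hypothetical (the names `build_prompt` and `extract_translation` are not part of the released pipeline), and the first-line cut in `extract_translation` rests on an assumption: that the model may keep generating after the translation, for example by starting a new "English:" line.

```python
# Hypothetical helpers (not part of the released pipeline) that reproduce the
# prompt format from the usage example above.

PROMPT_TEMPLATE = (
    "Translate the following English text to Spanish.\n\n"
    "English: {source}\n"
    "Spanish:"
)


def build_prompt(source: str) -> str:
    """Wrap one English sentence in the card's prompt format."""
    return PROMPT_TEMPLATE.format(source=source.strip())


def extract_translation(generated: str) -> str:
    """Keep only the first line of the decoded continuation.

    Assumption: anything after the first newline is extra generation
    (e.g. a new "English:" line) rather than part of the translation.
    """
    text = generated.strip()
    return text.splitlines()[0].strip() if text else ""


prompts = [build_prompt(s) for s in ["Hello, world.", "The cat sleeps."]]
print(prompts[0])
print(extract_translation(" El gato duerme.\nEnglish: Next sentence"))
# El gato duerme.
```

The first-line heuristic only suits single-sentence inputs, so send one sentence per prompt. If the prompts are tokenized as a padded batch, transformers recommends left padding for decoder-only generation (`tokenizer.padding_side = "left"`).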
## Reproducibility

This checkpoint was produced by the pipeline at
<https://github.com/adrianMT56/attention_lp>.
See `README.md` in that repository for the full training recipe and evaluation scripts.