---
language: [en, es]
license: cc-by-nc-4.0
base_model: CohereForAI/aya-expanse-8b
tags:
- translation
- machine-translation
- aya-expanse
- layer-pruning
- interpretability
pipeline_tag: translation
---

# aya-enes-I2-8

English -> Spanish translation model derived from [CohereForAI/aya-expanse-8b](https://huggingface.co/CohereForAI/aya-expanse-8b) (32 layers, 8B parameters).

## Recipe

IFR-guided layer pruning (8 middle layers removed), LoRA fine-tuning + knowledge distillation from Aya-Expanse 32B.

- Number of transformer layers: **24** (of the original 32)
- Layers removed: `[8, 10, 11, 12, 13, 14, 15, 16]`
- Pruning method: **IFR (Information Flow Routes)**
- Fine-tuning: LoRA (r=16, alpha=32), 3 epochs on News Commentary v18 en-es
- Distillation: synthetic translations from Aya-Expanse 32B, filtered to COMET >= 0.7
- Precision: fp16

## Evaluation

Evaluated on 500 held-out News Commentary v18 en-es sentences.

| Metric | Value |
|--------|------:|
| COMET (wmt22-comet-da) | 0.8880 |
| chrF++ | 67.13 |
| BLEU | 46.02 |

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "adrianMT56/aya-enes-I2-8"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, dtype=torch.float16)

prompt = ("Translate the following English text to Spanish.\n\n"
          "English: The quick brown fox jumps over the lazy dog.\n"
          "Spanish:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

CPU users can omit `dtype=torch.float16` (defaults to float32) or leave it as fp16 at the cost of some throughput.

For GPTQ 4-bit conversion see the project's `scripts/quantize_to_gptq.py`.

## Reproducibility

This checkpoint was produced by the pipeline at . See `README.md` in that repo for the full training recipe and evaluation scripts.
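
When reproducing the checkpoint, the layer counts in the Recipe section can be sanity-checked with plain Python. The sketch below is illustrative and not part of the project's pipeline: the helper `kept_layers` and the constant names are hypothetical, and it assumes the removed-layer list uses 0-based indices into the original 32 layers, which the card does not state explicitly.

```python
# Illustrative check only; names here are hypothetical, not from the project repo.
# Assumption: layer indices are 0-based positions in the original 32-layer stack.
ORIGINAL_NUM_LAYERS = 32
REMOVED_LAYERS = [8, 10, 11, 12, 13, 14, 15, 16]


def kept_layers(num_layers, removed):
    """Return the original indices of the layers that survive pruning, in order."""
    removed_set = set(removed)
    if len(removed_set) != len(removed):
        raise ValueError("removed-layer list contains duplicates")
    if any(i < 0 or i >= num_layers for i in removed_set):
        raise ValueError("removed-layer index out of range")
    return [i for i in range(num_layers) if i not in removed_set]


kept = kept_layers(ORIGINAL_NUM_LAYERS, REMOVED_LAYERS)
print(len(kept))   # number of layers left after pruning
print(kept)        # original index of each remaining layer
```

Under that indexing assumption, the result has 24 entries, matching the layer count stated in the Recipe. Note that index 9 is not in the removed list, so that layer stays in the pruned model between original layers 7 and 17.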