---
language: [en, es]
license: cc-by-nc-4.0
base_model: CohereForAI/aya-expanse-8b
tags:
- translation
- machine-translation
- aya-expanse
- layer-pruning
- interpretability
pipeline_tag: translation
---
# aya-enes-I2-8
English-to-Spanish translation model derived from
[CohereForAI/aya-expanse-8b](https://huggingface.co/CohereForAI/aya-expanse-8b),
a 32-layer, 8B-parameter base model.
## Recipe
IFR-guided layer pruning (8 middle layers removed), combined with LoRA fine-tuning and knowledge distillation from Aya-Expanse 32B.
- Number of transformer layers: **24** (of the original 32)
- Layers removed: `[8, 10, 11, 12, 13, 14, 15, 16]`
- Pruning method: **IFR (Information Flow Routes)**
- Fine-tuning: LoRA (r=16, alpha=32), 3 epochs on News Commentary v18 en-es
- Distillation: synthetic translations from Aya-Expanse 32B, filtered to COMET >= 0.7
- Precision: fp16
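
The layer bookkeeping in the list above can be checked with a few lines of plain Python. This is a minimal sketch, not the project's pruning code: it assumes the removed-layer indices are 0-based positions in the base model's 32-layer stack, and the helper names `kept_layer_indices` and `prune` are hypothetical.

```python
# Illustrative only: assumes the removed-layer indices are 0-based positions
# in the base model's 32-layer decoder stack.
NUM_BASE_LAYERS = 32
REMOVED_LAYERS = [8, 10, 11, 12, 13, 14, 15, 16]


def kept_layer_indices(num_layers, removed):
    """Return the indices of the layers that survive pruning, in original order."""
    removed_set = set(removed)
    out_of_range = sorted(i for i in removed_set if not 0 <= i < num_layers)
    if out_of_range:
        raise ValueError(f"layer indices out of range: {out_of_range}")
    return [i for i in range(num_layers) if i not in removed_set]


def prune(layers, removed):
    """Drop the listed positions from any sequence of layers (hypothetical helper)."""
    return [layers[i] for i in kept_layer_indices(len(layers), removed)]


kept = kept_layer_indices(NUM_BASE_LAYERS, REMOVED_LAYERS)
print(len(kept))  # 24
print(kept[:10])  # [0, 1, 2, 3, 4, 5, 6, 7, 9, 17]
```

Under that indexing assumption, layer 9 is the only layer between 8 and 16 that is retained. Applying the same selection to a real decoder would typically also mean rebuilding the layer container and updating the layer count in the model config; the linked repository is the authority on how that was actually done.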
## Evaluation
Evaluated on 500 held-out News Commentary v18 en-es sentences.
| Metric | Value |
|--------|------:|
| COMET (wmt22-comet-da) | 0.8880 |
| chrF++ | 67.13 |
| BLEU | 46.02 |
## Usage
```python
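import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "adrianMT56/aya-enes-I2-8"
tokenizer = AutoTokenizer.from_pretrained(name)
# Load the weights in half precision, matching the checkpoint's fp16 format.
model = AutoModelForCausalLM.from_pretrained(name, dtype=torch.float16)

# Plain-text prompt: instruction, source sentence, then an open "Spanish:" line.
prompt = ("Translate the following English text to Spanish.\n\n"
          "English: The quick brown fox jumps over the lazy dog.\n"
          "Spanish:")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False) gives deterministic output.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Slice off the prompt tokens so only the generated translation is decoded.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))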
```
CPU users can omit `dtype=torch.float16`, in which case the model loads in float32,
or keep fp16, which halves weight memory at the cost of some throughput. For GPTQ
4-bit conversion see the project's `scripts/quantize_to_gptq.py`.
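
To translate sentences other than the example, the prompt layout from the usage snippet can be wrapped in a small helper. This is a sketch under assumptions: `build_prompt` and `first_line` are hypothetical names that do not come from the repository, and trimming the output to its first line is only a guess at useful post-processing in case the model keeps generating after the translation.

```python
# Prompt layout copied from the usage example; helper names are hypothetical.
PROMPT_TEMPLATE = (
    "Translate the following English text to Spanish.\n\n"
    "English: {text}\n"
    "Spanish:"
)


def build_prompt(text):
    """Format one English sentence in the prompt layout shown above."""
    # Collapse internal whitespace so the source stays on the "English:" line.
    return PROMPT_TEMPLATE.format(text=" ".join(text.split()))


def first_line(generated):
    """Return the first non-empty line of the decoded continuation."""
    for line in generated.splitlines():
        if line.strip():
            return line.strip()
    return ""


print(build_prompt("The quick brown fox jumps over the lazy dog."))
```

The string returned by `build_prompt` can be passed to the tokenizer in place of `prompt`, and `first_line` applied to the decoded continuation.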
## Reproducibility
This checkpoint was produced by the pipeline at
<https://github.com/adrianMT56/attention_lp>.
See `README.md` in that repo for the full training recipe and evaluation scripts.