TranslateGemma-4B GRPOv2 for Spanish-Valencian
Model Summary
guerreropaula/translategemma4b-grpov2-es-va is the best-performing model in the spanish-valencian-mt-rl collection. It starts from the SFT checkpoint and applies GRPO with a composite reward tailored to low-resource dialectal translation.
The reward combines adequacy, quality estimation, lexical diversity, and anti-copy behavior:
- chrF reward
- COMET reward
- type-token ratio reward
- source-copy penalty
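The last two signals can be sketched with the standard library alone. The snippet below is an illustrative assumption, not the repository's reward code: `type_token_ratio` and `source_copy_ratio` are hypothetical helper names, whitespace tokenization is assumed, and `difflib.SequenceMatcher` stands in for whatever similarity measure the training code actually uses.

```python
from difflib import SequenceMatcher


def type_token_ratio(text: str) -> float:
    """Share of distinct tokens among all whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def source_copy_ratio(source: str, output: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means the output equals the source."""
    return SequenceMatcher(None, source.lower(), output.lower()).ratio()


if __name__ == "__main__":
    src = "Este es mi hijo"
    hyp = "Este és el meu fill"
    print(type_token_ratio(hyp))
    print(source_copy_ratio(src, hyp))
```

A ratio close to 1.0 from `source_copy_ratio` flags an output that largely repeats the Spanish source, which is the behavior the copy penalty is meant to discourage.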
Model Details
- Model ID: guerreropaula/translategemma4b-grpov2-es-va
- Collection: guerreropaula/spanish-valencian-mt-rl
- Developed by: Paula Guerrero Castello
- Initialization checkpoint: guerreropaula/translategemma4b-sft-es-va
- Original base model: google/translategemma-4b-it
- Task: Spanish to Valencian machine translation
- License for model weights: Gemma license
Intended Use
This model is intended for:
- use as the main ES-VA system reported in the EAMT 2026 submission
- research on reward design for low-resource dialectal MT
- comparison against SFT and classifier-guided GRPO
It is not intended for:
- uncontrolled deployment in high-stakes domains
- translation directions beyond Spanish to Valencian
- applications that require stable terminology control without post-editing
Training Data
GRPOv2 uses gplsi/amic_parallel.
- Training samples: 10,000
- Validation split: 2%
- Validation samples used during periodic model selection: 200
- Source column: ES
- Target column: VA
Training Procedure
GRPOv2 continues from the SFT checkpoint with Group Relative Policy Optimization and a composite reward.
- Optimizer: paged_adamw_8bit
- Learning rate: 5e-6
- Batch size: 1
- Gradient accumulation: 16
- Max steps: 200
- Warmup steps: 20
- Number of generations per prompt: 4
- Max completion length: 128
- GRPO beta: 0.04
- GRPO epsilon: 0.2
- Scheduler: cosine
- Precision: bf16 when supported, otherwise fp16
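To make the roles of the generation count, epsilon, and beta concrete, here is a minimal sketch of the commonly described GRPO objective for a single prompt. It is an assumption that the training code follows this textbook form; the function names, the small `eps` stabilizer, and the example rewards are hypothetical, and real trainers operate on per-token tensors rather than plain floats.

```python
from statistics import mean, pstdev

NUM_GENERATIONS = 4  # completions sampled per prompt (from the list above)
EPSILON = 0.2        # clipping range for the probability ratio (from the list above)
BETA = 0.04          # weight of the KL term against the reference policy (from the list above)


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio, advantage, epsilon=EPSILON):
    """PPO-style clipped surrogate used by GRPO for one sample."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def grpo_loss(ratio, advantage, kl, epsilon=EPSILON, beta=BETA):
    """Loss to minimize: negative clipped objective plus a beta-weighted KL term."""
    return -(clipped_objective(ratio, advantage, epsilon) - beta * kl)


if __name__ == "__main__":
    rewards = [0.9, 0.7, 0.8, 0.6]  # hypothetical rewards for 4 completions
    assert len(rewards) == NUM_GENERATIONS
    print(group_relative_advantages(rewards))
```

Because advantages are computed within each group of 4 completions, a completion is rewarded only for being better than its siblings for the same source sentence, not for its absolute score.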
Composite reward weights:
- chrF: 0.5
- COMET: 0.3
- TTR: 0.2
- copy penalty: added when the output copies the Spanish source too closely
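The weighting above can be sketched as a single scalar reward. This is a hedged illustration only: it assumes every component score is already scaled to [0, 1], and `copy_threshold` and `copy_penalty` are hypothetical parameters, since the card does not state how closely the output must match the source or how large the penalty is.

```python
# Weights taken from the list above.
WEIGHTS = {"chrf": 0.5, "comet": 0.3, "ttr": 0.2}


def composite_reward(chrf, comet, ttr, copy_ratio,
                     copy_threshold=0.9, copy_penalty=1.0):
    """Weighted sum of component scores, reduced when the source is copied.

    chrf, comet, ttr: component scores, assumed to lie in [0, 1].
    copy_ratio: similarity between source and output in [0, 1].
    copy_threshold, copy_penalty: hypothetical values, not from the model card.
    """
    reward = (WEIGHTS["chrf"] * chrf
              + WEIGHTS["comet"] * comet
              + WEIGHTS["ttr"] * ttr)
    if copy_ratio >= copy_threshold:
        reward -= copy_penalty
    return reward


if __name__ == "__main__":
    print(composite_reward(chrf=0.8, comet=0.9, ttr=0.5, copy_ratio=0.2))
    print(composite_reward(chrf=0.8, comet=0.9, ttr=0.5, copy_ratio=0.95))
```

The three weights sum to 1.0, so a translation that maximizes every component and does not copy the source receives a reward of 1.0 under these assumptions.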
Evaluation
The model was evaluated on 1,000 sentences from gplsi/ES-VA_translation_test.
| Metric | Score |
|---|---|
| chrF | 84.68 |
| BLEU | 62.16 |
| TER | 20.63 |
| BLEURT | 0.544 |
| COMET | 0.936 |
| Dialectal Valencian Score | 36.2% |
This is the strongest overall model in the repository on corpus-level automatic MT metrics. It outperforms the SFT model on chrF, BLEU, TER, BLEURT, and COMET, while retaining a Dialectal Valencian Score of 36.2%, which remains below the SFT checkpoint (see Limitations).
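For readers unfamiliar with chrF, the headline metric and the largest reward component, the following is a simplified sentence-level character n-gram F-score. It is not the implementation used to produce the scores above, which the card does not name; `simple_chrf` is a hypothetical helper that averages precision and recall over character n-grams of order 1 to 6 with beta = 2 and ignores whitespace.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return 100.0 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(simple_chrf("la casa blanca", "la casa blava"))
```

Because it works on characters rather than words, this kind of score gives partial credit for near-miss word forms, which matters for closely related varieties such as Spanish and Valencian.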
How To Use
In this repository, GRPOv2 is loaded as the base TranslateGemma model plus the GRPOv2 adapter:
# Config and utils.model are modules local to this repository.
from config import Config
from utils.model import build_bnb_config, load_base_tokenizer
from transformers import AutoModelForCausalLM
from peft import PeftModel

cfg = Config()
bnb = build_bnb_config(cfg)  # passed below as the quantization config
tokenizer = load_base_tokenizer(cfg)

# Load the base TranslateGemma model.
base_model = AutoModelForCausalLM.from_pretrained(
    cfg.base_model_id,
    quantization_config=bnb,
    device_map="auto",
    use_safetensors=True,
)

# Attach the GRPOv2 adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, cfg.grpov2_model_id)
Limitations
- Dialectal Valencian usage remains below the SFT checkpoint on the repository's handcrafted feature score.
- COMET is used as part of the reward and may bias training toward its own preferences.
- The model is evaluated on a public 1,000-sentence test set, not a large multi-domain benchmark.
- Reward optimization can improve average metrics while still failing on individual sentences.
License
This model is distributed under the Gemma license inherited from google/translategemma-4b-it. Users should verify compatibility with the dataset licenses and their own deployment requirements.
Citation
@inproceedings{guerrero-2026-enhancing,
  title = {Enhancing LLM Translation Performance for Spanish-Valencian through Supervised Fine-tuning and Reinforcement Learning},
  author = {Guerrero Castello, Paula},
  booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation},
  year = {2026}
}