TranslateGemma-4B GRPOv1 for Spanish-Valencian
Model Summary
guerreropaula/translategemma4b-grpov1-es-va is a GRPO-trained Spanish-to-Valencian translation model initialized from the SFT checkpoint guerreropaula/translategemma4b-sft-es-va. It is part of the spanish-valencian-mt-rl EAMT 2026 submission.
GRPOv1 combines a reference-based chrF reward with a naturalness reward produced by a separate HT/MT classifier. The goal is to improve translation quality while nudging outputs toward more human-like Valencian phrasing.
Model Details
- Model ID: guerreropaula/translategemma4b-grpov1-es-va
- Collection: guerreropaula/spanish-valencian-mt-rl
- Developed by: Paula Guerrero Castello
- Initialization checkpoint: guerreropaula/translategemma4b-sft-es-va
- Original base model: google/translategemma-4b-it
- Task: Spanish to Valencian machine translation
- License for model weights: Gemma license
- Auxiliary reward model: guerreropaula/ht_mt_classifier_best
Intended Use
This model is intended for:
- research on reinforcement learning for low-resource dialectal MT
- ablation against SFT and GRPOv2 in the EAMT submission
- studying reward shaping with classifier-based translation naturalness signals
It is not intended for:
- production use without manual quality control
- general-purpose text generation
- use cases that require guaranteed dialectal consistency
Training Data
GRPOv1 uses gplsi/amic_parallel.
- Training samples: 5,000
- Validation split: 2%
- Validation samples used during periodic model selection: 200
- Source column: ES
- Target column: VA
Training Procedure
GRPOv1 continues training from the SFT checkpoint with Group Relative Policy Optimization.
- Optimizer: paged_adamw_8bit
- Learning rate: 5e-6
- Batch size: 1
- Gradient accumulation: 8
- Max steps: 100
- Warmup steps: 20
- Number of generations per prompt: 2
- Max completion length: 100
- GRPO beta: 0.04
- Scheduler: cosine
- Precision: bf16 when supported, otherwise fp16
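To make the interaction of learning rate, warmup, and cosine scheduler concrete, here is a minimal sketch of the schedule these settings would produce. It assumes the common shape of linear warmup followed by cosine decay to zero at the last step; the card does not state the exact scheduler implementation, and the function name `lr_at` is purely illustrative.

```python
import math

BASE_LR = 5e-6      # learning rate from the list above
WARMUP_STEPS = 20   # warmup steps from the list above
MAX_STEPS = 100     # max steps from the list above


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from the base learning rate down to 0 at MAX_STEPS.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 10, 20, 60, 100):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under this assumption the peak rate of 5e-6 is reached only at step 20, so a fifth of the 100-step run is spent warming up. With batch size 1 and gradient accumulation 8, each of those optimizer steps aggregates eight micro-batches.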
Reward definition:
- chrF reward on the generated hypothesis against the reference
- P(HT | text) from the fine-tuned classifier guerreropaula/ht_mt_classifier_best
- linear annealing of classifier weight from 0 up to 0.3 over the first 50 steps
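The annealed reward can be sketched as follows. The ramp (0 to 0.3 over the first 50 steps) comes from the reward definition; how the two terms are combined is not spelled out in this card, so the additive form, the rescaling of chrF to [0, 1], and the function names below are assumptions made for illustration only.

```python
MAX_CLASSIFIER_WEIGHT = 0.3  # final classifier weight from the reward definition
ANNEAL_STEPS = 50            # steps over which the weight ramps up


def classifier_weight(step: int) -> float:
    """Linearly ramp the classifier weight from 0 to 0.3 over the first 50 steps."""
    fraction = min(max(step, 0), ANNEAL_STEPS) / ANNEAL_STEPS
    return MAX_CLASSIFIER_WEIGHT * fraction


def combined_reward(chrf: float, p_ht: float, step: int) -> float:
    """Assumed combination: chrF rescaled to [0, 1] plus the weighted P(HT | text).

    chrf: chrF score on a 0-100 scale for one generated hypothesis.
    p_ht: classifier probability that the hypothesis is a human translation.
    """
    return chrf / 100.0 + classifier_weight(step) * p_ht


# Early in training the classifier contributes nothing; after step 50 it
# contributes at its full weight.
print(combined_reward(chrf=80.0, p_ht=0.5, step=0))
print(combined_reward(chrf=80.0, p_ht=0.5, step=50))
```

Starting the classifier weight at zero means the first updates are driven by chrF alone, and the naturalness signal is phased in only once training has started to move away from the SFT checkpoint.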
Evaluation
The model was evaluated on 1,000 sentences from gplsi/ES-VA_translation_test.
| Metric | Score |
|---|---|
| chrF | 81.65 |
| BLEU | 56.94 |
| TER | 23.96 |
| BLEURT | 0.481 |
| COMET | 0.926 |
| Dialectal Valencian Score | 15.9% |
Relative to the SFT checkpoint, GRPOv1 did not improve the final corpus-level translation metrics in this repository and also reduced the dialectal Valencian rate.
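For readers unfamiliar with chrF, the sketch below shows the idea behind the metric on a single sentence pair: character n-gram precision and recall are averaged over n = 1..6 and combined into an F-score with β = 2. This is a simplified, sentence-level illustration with hypothetical function names, not the scoring code behind the table; the card does not name the scoring tool, and corpus-level implementations differ in details, so it will not reproduce the reported numbers.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # segment too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100.0 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("bon dia a tots", "bon dia a totes"))
```

Because β = 2 weights recall more heavily than precision, a hypothesis that omits reference material is penalized more than one that adds extra characters.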
How To Use
The evaluation script in this repository loads GRPOv1 as a standalone causal LM checkpoint:
from config import Config
from utils.model import build_bnb_config
from transformers import AutoTokenizer, AutoModelForCausalLM
cfg = Config()
bnb = build_bnb_config(cfg)
tokenizer = AutoTokenizer.from_pretrained(cfg.grpov1_model_id)
model = AutoModelForCausalLM.from_pretrained(
    cfg.grpov1_model_id,
    quantization_config=bnb,
    device_map="auto",
    use_safetensors=True,
)
Limitations
- The classifier reward is only an indirect proxy for translation naturalness.
- Improvements in reward can diverge from downstream MT metrics.
- The model remains sensitive to the base model's Catalan-centric prior.
- The reinforcement learning stage uses only 5,000 training examples.
License
This model is distributed under the Gemma license inherited from the TranslateGemma base model family. Users should additionally review the licenses of the datasets and the auxiliary classifier used during training.
Citation
@inproceedings{guerrero-2026-enhancing,
title = {Enhancing LLM Translation Performance for Spanish-Valencian through Supervised Fine-tuning and Reinforcement Learning},
author = {Guerrero Castello, Paula},
booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation},
year = {2026}
}