TranslateGemma-4B GRPOv2 for Spanish-Valencian

Model Summary

guerreropaula/translategemma4b-grpov2-es-va is the best-performing model in the spanish-valencian-mt-rl collection. It starts from the SFT checkpoint and applies GRPO with a composite reward tailored to low-resource dialectal translation.

The reward combines adequacy, quality estimation, lexical diversity, and anti-copy behavior:

  • chrF reward
  • COMET reward
  • type-token ratio reward
  • source-copy penalty

Model Details

  • Model ID: guerreropaula/translategemma4b-grpov2-es-va
  • Collection: guerreropaula/spanish-valencian-mt-rl
  • Developed by: Paula Guerrero Castello
  • Initialization checkpoint: guerreropaula/translategemma4b-sft-es-va
  • Original base model: google/translategemma-4b-it
  • Task: Spanish to Valencian machine translation
  • License for model weights: Gemma license

Intended Use

This model is intended for:

  • use as the main ES-VA system reported in the EAMT 2026 submission
  • research on reward design for low-resource dialectal MT
  • comparison against SFT and classifier-guided GRPO

It is not intended for:

  • uncontrolled deployment in high-stakes domains
  • translation directions beyond Spanish to Valencian
  • applications that require stable terminology control without post-editing

Training Data

GRPOv2 is trained on the gplsi/amic_parallel dataset.

  • Training samples: 10,000
  • Validation split: 2%
  • Validation samples used during periodic model selection: 200
  • Source column: ES
  • Target column: VA

Training Procedure

GRPOv2 continues from the SFT checkpoint with Group Relative Policy Optimization and a composite reward.
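
The group-relative part of GRPO can be illustrated with a minimal sketch: each sampled translation is scored against the other samples drawn for the same source sentence, so no separate value model is needed. The function name, the `eps` constant, the use of the population standard deviation, and the reward values below are illustrative assumptions, not taken from the training code in this repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise rewards within one group of completions for the same prompt.

    Hypothetical helper: real trainers may use a different std estimator
    or a different eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up composite rewards for four sampled translations of one sentence.
rewards = [0.82, 0.75, 0.90, 0.61]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

Completions that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. If every completion in a group gets the same reward, all advantages are zero and the group contributes no learning signal.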

  • Optimizer: paged_adamw_8bit
  • Learning rate: 5e-6
  • Batch size: 1
  • Gradient accumulation: 16
  • Max steps: 200
  • Warmup steps: 20
  • Number of generations per prompt: 4
  • Max completion length: 128
  • GRPO beta: 0.04
  • GRPO epsilon: 0.2
  • Scheduler: cosine
  • Precision: bf16 when supported, otherwise fp16
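
To make the schedule concrete, the sketch below computes the learning rate at a few steps, assuming the common shape of linear warmup followed by cosine decay to zero. The function `lr_at_step` is a hypothetical helper written for illustration; the exact curve used in training depends on the scheduler implementation in the training library.

```python
import math


def lr_at_step(step, base_lr=5e-6, warmup_steps=20, max_steps=200):
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Sequences per optimizer step on a single device:
# batch size 1 x gradient accumulation 16.
effective_batch = 1 * 16

for step in (0, 10, 20, 110, 200):
    print(step, f"{lr_at_step(step):.2e}")
```

Under these assumptions the learning rate peaks at 5e-6 once warmup ends at step 20, which is 10% of the 200-step budget, and then decays for the remaining 180 steps.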

Composite reward weights:

  • chrF: 0.5
  • COMET: 0.3
  • TTR: 0.2
  • copy penalty: applied when the output copies the Spanish source too closely
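
A rough sketch of how such a weighted reward could be assembled is shown below. Only the three weights come from this card. The copy check (token overlap with the source), its threshold, the penalty magnitude, and the assumption that chrF and COMET arrive pre-computed on a 0 to 1 scale are all hypothetical choices made for illustration, and may differ from the actual reward code.

```python
def type_token_ratio(text):
    """Share of distinct whitespace tokens in the text (0.0 for empty input)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def copies_source(source, hypothesis, threshold=0.9):
    """Hypothetical copy check: fraction of output tokens also found in the source."""
    hyp = hypothesis.lower().split()
    if not hyp:
        return False
    src = set(source.lower().split())
    overlap = sum(tok in src for tok in hyp) / len(hyp)
    return overlap >= threshold


def composite_reward(chrf, comet, source, hypothesis, copy_penalty=0.5):
    """Weighted sum with weights 0.5 / 0.3 / 0.2, minus an assumed copy penalty.

    chrf and comet are assumed to be pre-computed and scaled to [0, 1].
    """
    reward = 0.5 * chrf + 0.3 * comet + 0.2 * type_token_ratio(hypothesis)
    if copies_source(source, hypothesis):
        reward -= copy_penalty  # penalty size is an illustrative assumption
    return reward


source = "El niño juega en la calle"
translated = "El xiquet juga al carrer"

print(composite_reward(0.85, 0.94, source, translated))  # no penalty applied
print(composite_reward(0.60, 0.70, source, source))      # output copies the source
```

Because Spanish and Valencian share much of their vocabulary, a token-overlap check like this one needs a high threshold to avoid penalizing legitimate translations; a character-level or edit-distance check would be an equally plausible design.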

Evaluation

The model was evaluated on 1,000 sentences from gplsi/ES-VA_translation_test.

Metric                       Score
chrF                         84.68
BLEU                         62.16
TER                          20.63
BLEURT                       0.544
COMET                        0.936
Dialectal Valencian Score    36.2%

This is the strongest model in the repository on corpus-level automatic MT metrics. It outperforms the SFT model on chrF, BLEU, TER (lower is better), BLEURT, and COMET. Its Dialectal Valencian Score of 36.2% remains below that of the SFT checkpoint.

How To Use

In this repository, GRPOv2 is loaded as the base TranslateGemma model plus the GRPOv2 adapter:

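from peft import PeftModel
from transformers import AutoModelForCausalLM

# Repository-local modules (not part of transformers or peft).
from config import Config
from utils.model import build_bnb_config, load_base_tokenizer

cfg = Config()
bnb = build_bnb_config(cfg)           # bitsandbytes quantization settings
tokenizer = load_base_tokenizer(cfg)  # tokenizer of the base model

# Load the quantized base model referenced by cfg.base_model_id.
base_model = AutoModelForCausalLM.from_pretrained(
    cfg.base_model_id,
    quantization_config=bnb,
    device_map="auto",
    use_safetensors=True,
)

# Attach the GRPOv2 adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, cfg.grpov2_model_id)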

Limitations

  • Dialectal Valencian usage remains below the SFT checkpoint on the repository's handcrafted feature score.
  • COMET is used as part of the reward and may bias training toward its own preferences.
  • The model is evaluated on a public 1,000-sentence test set, not a large multi-domain benchmark.
  • Reward optimization can improve average metrics while still failing on individual sentences.

License

This model is distributed under the Gemma license inherited from google/translategemma-4b-it. Users should verify compatibility with the dataset licenses and their own deployment requirements.

Citation

@inproceedings{guerrero-2026-enhancing,
  title     = {Enhancing LLM Translation Performance for Spanish-Valencian through Supervised Fine-tuning and Reinforcement Learning},
  author    = {Guerrero Castello, Paula},
  booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation},
  year      = {2026}
}