---
license: gemma
base_model: google/gemma-3-4b-pt
language:
  - es
  - pt
  - fr
  - gl
  - it
  - ro
  - ca
tags:
  - continual-pretraining
  - multilingual
  - x-elm
  - gemma-3
  - expert
  - romance
library_name: transformers
pipeline_tag: text-generation
---
# xelm-gemma-4b-romance-expert

Single-family expert: continual pretraining (CPT) of Gemma-3-4B on a single language family (Romance). It serves as a building block for the model soup and for measuring per-family specialization.

- **Base model**: [google/gemma-3-4b-pt](https://huggingface.co/google/gemma-3-4b-pt)
- **Strategy**: `expert`
- **Language family**: Romance
- **Code**: [https://github.com/sanchit-ahuja/scaling-multilingual-experts](https://github.com/sanchit-ahuja/scaling-multilingual-experts)

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-romance-expert")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-romance-expert")
```

## Training recipe

The resolved config used for this specific run is included in this model repo as `training_config.yaml` and can be loaded with pyrallis to reproduce the run's exact settings. The general recipe lives in [`configs/yaml/train_gemma_single_expert.yaml`](https://github.com/sanchit-ahuja/scaling-multilingual-experts/blob/main/configs/yaml/train_gemma_single_expert.yaml) in the code repo; launch it with:

```bash
python train.py --config_path configs/yaml/train_gemma_single_expert.yaml
```

## Reproducing the reverted variant

The *expert-reverted* variant restores middle-layer weights to the base Gemma-3-4B while keeping the trained first/last layers. It is not uploaded to the Hub; regenerate it with:

```bash
python train.py --config_path configs/yaml/revert_gemma_checkpoint.yaml \
    --revert.checkpoint_path $(huggingface-cli download sanchitahuja205/xelm-gemma-4b-romance-expert) \
    --revert.revert_output_path ./reverted
```
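The actual revert logic is defined by `revert_gemma_checkpoint.yaml` and `train.py` in the code repo. As a rough sketch of the idea only, the revert can be pictured as a per-parameter choice between the expert and base state dicts. The function name `revert_middle_layers`, the `num_edge_layers` parameter (how many leading and trailing decoder layers stay trained), the `layers.<i>.` key pattern, and keeping non-layer parameters such as embeddings from the expert are all assumptions, not taken from the repo:

```python
import re
from typing import Any, Mapping

# Assumed key pattern: matches the decoder-layer index in names such as
# "model.layers.17.mlp.up_proj.weight"; the prefix before "layers." may vary.
_LAYER_RE = re.compile(r"(?:^|\.)layers\.(\d+)\.")


def revert_middle_layers(
    expert: Mapping[str, Any],
    base: Mapping[str, Any],
    num_layers: int,
    num_edge_layers: int,
) -> dict[str, Any]:
    """Hypothetical helper: take middle decoder layers from `base` and
    everything else (edge layers, embeddings, norms) from `expert`."""
    lo = num_edge_layers               # index of the first middle layer
    hi = num_layers - num_edge_layers  # index of the first trailing edge layer
    merged: dict[str, Any] = {}
    for key, value in expert.items():
        match = _LAYER_RE.search(key)
        if match and lo <= int(match.group(1)) < hi:
            merged[key] = base[key]  # middle layer: restore base weights
        else:
            merged[key] = value      # edge layer or non-layer param: keep expert
    return merged
```

In a real checkpoint the two inputs would be `state_dict()`s of the expert and base models, and the result would be loaded back with `load_state_dict`.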

## Reproducing the expert soup

See the `xelm-gemma-4b-dense` repo README for the full soup recipe.
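The authoritative soup recipe is in that README. As a rough illustration only, and assuming a uniform average of parameters (the classic "model soup"), merging experts that share the Gemma-3-4B architecture amounts to averaging each parameter across their state dicts. The function name `uniform_soup` is hypothetical:

```python
from typing import Any, Mapping, Sequence


def uniform_soup(state_dicts: Sequence[Mapping[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: average matching parameters across experts.

    Values only need to support `+` and `/`, so this works for plain floats
    and for torch tensors alike.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all experts must share the same parameter names")
    n = len(state_dicts)
    # Element-wise mean of each parameter over all experts.
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}
```

With bf16 tensors it is usually safer to cast to float32 before averaging and cast back afterwards.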

## Citation

```bibtex
@misc{ahuja2026parameteralignmentmitigatescatastrophic,
      title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
      author={Sanchit Ahuja and Terra Blevins},
      year={2026},
      eprint={2606.00284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.00284},
}
```