---
license: gemma
base_model: google/gemma-3-4b-pt
language:
- es
- pt
- fr
- gl
- it
- ro
- ca
tags:
- continual-pretraining
- multilingual
- x-elm
- gemma-3
- expert
- romance
library_name: transformers
pipeline_tag: text-generation
---

# xelm-gemma-4b-romance-expert

Single-family expert: continual pretraining (CPT) of Gemma-3-4B on one language family only. It is used as a building block for model soup and for measuring per-family specialization.

- **Base model**: [google/gemma-3-4b-pt](https://huggingface.co/google/gemma-3-4b-pt)
- **Strategy**: `expert`
- **Language family**: Romance
- **Code**: [https://github.com/sanchit-ahuja/scaling-multilingual-experts](https://github.com/sanchit-ahuja/scaling-multilingual-experts)

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model and tokenizer are loaded from the same Hub repo.
model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-romance-expert")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-romance-expert")
```

## Training recipe

The exact training recipe lives in [`configs/yaml/train_gemma_single_expert.yaml`](https://github.com/sanchit-ahuja/scaling-multilingual-experts/blob/main/configs/yaml/train_gemma_single_expert.yaml) in the code repo. The resolved config used for this specific run is also included in this model repo as `training_config.yaml`. Load it with pyrallis to reproduce the run bit-for-bit; the command below points at the recipe config in the code repo:

```bash
python train.py --config_path configs/yaml/train_gemma_single_expert.yaml
```

## Reproducing the reverted variant

The *expert-reverted* variant restores middle-layer weights to the base Gemma-3-4B while keeping the trained first/last layers.
It is not uploaded to the Hub; regenerate it with:

```bash
python train.py --config_path configs/yaml/revert_gemma_checkpoint.yaml \
  --revert.checkpoint_path "$(huggingface-cli download sanchitahuja205/xelm-gemma-4b-romance-expert)" \
  --revert.revert_output_path ./reverted
```

## Reproducing the expert soup

See the `xelm-gemma-4b-dense` repo README for the full soup recipe.

## Citation

```bibtex
@misc{ahuja2026parameteralignmentmitigatescatastrophic,
  title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
  author={Sanchit Ahuja and Terra Blevins},
  year={2026},
  eprint={2606.00284},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.00284},
}
```
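
## Illustration: what the reverted variant keeps

The reverted variant described earlier keeps the trained first/last layers and restores the middle layers from the base model. The toy sketch below shows that selection rule on plain dictionaries standing in for state dicts. It is not the repo's implementation: the function name `revert_middle_layers`, the `keep_first`/`keep_last` parameters, the `.layers.<index>.` key pattern, and the choice to keep non-layer parameters from the trained checkpoint are all assumptions made for illustration. The actual behavior is defined by `configs/yaml/revert_gemma_checkpoint.yaml` and the code repo.

```python
import re

# Assumed naming pattern for per-layer parameters, e.g. "model.layers.12.w".
LAYER_KEY = re.compile(r"\.layers\.(\d+)\.")


def revert_middle_layers(trained, base, num_layers, keep_first, keep_last):
    """Return a new dict: trained values for edge layers, base values for the middle.

    Parameters that do not belong to a numbered layer are kept from the
    trained checkpoint (an assumption, not confirmed by the model card).
    """
    if keep_first + keep_last > num_layers:
        raise ValueError("keep_first + keep_last exceeds num_layers")

    merged = {}
    for name, value in trained.items():
        match = LAYER_KEY.search(name)
        if match is None:
            # Not a per-layer parameter: keep the trained value.
            merged[name] = value
            continue
        idx = int(match.group(1))
        is_edge = idx < keep_first or idx >= num_layers - keep_last
        # Edge layers stay trained; middle layers are restored from base.
        merged[name] = value if is_edge else base[name]
    return merged


# Toy "checkpoints" with 6 layers; strings stand in for weight tensors.
trained = {"embed.weight": "T", **{f"model.layers.{i}.w": f"T{i}" for i in range(6)}}
base = {"embed.weight": "B", **{f"model.layers.{i}.w": f"B{i}" for i in range(6)}}

merged = revert_middle_layers(trained, base, num_layers=6, keep_first=2, keep_last=1)
print([merged[f"model.layers.{i}.w"] for i in range(6)])
# prints ['T0', 'T1', 'B2', 'B3', 'B4', 'T5']
```

The same rule would apply to real tensors, since the function only selects which checkpoint each entry comes from and never touches the values themselves.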