xelm-gemma-4b-slavic-expert

Single-family expert: continued pretraining (CPT) of Gemma-3-4B on a single language family (Slavic). It serves as a building block for model soups and for measuring per-family specialization.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-slavic-expert")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-slavic-expert")

Training recipe

The exact training recipe lives in configs/yaml/train_gemma_single_expert.yaml in the code repo. The resolved config used for this specific run is also included in this model repo as training_config.yaml; load it with pyrallis to reproduce the run bit-for-bit. To launch training from the recipe config:

python train.py --config_path configs/yaml/train_gemma_single_expert.yaml
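Before rerunning, it can help to check how a local recipe differs from the resolved training_config.yaml shipped in this repo. The sketch below is one way to do that, not part of the training code: fetch_resolved_config, flatten, and diff_configs are hypothetical helper names, and it assumes the resolved config parses as plain YAML with yaml.safe_load rather than through the project's pyrallis dataclass.

```python
def fetch_resolved_config(repo_id="sanchitahuja205/xelm-gemma-4b-slavic-expert"):
    """Download training_config.yaml from the model repo and parse it.

    Assumption: the file is plain YAML that yaml.safe_load can read.
    """
    # Imported lazily so the pure helpers below work without these packages.
    from huggingface_hub import hf_hub_download
    import yaml

    path = hf_hub_download(repo_id=repo_id, filename="training_config.yaml")
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def flatten(cfg, prefix=""):
    """Flatten a nested dict into dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_configs(recipe, resolved):
    """Map each differing dotted key to (recipe_value, resolved_value).

    A key missing on one side shows up with None on that side.
    """
    a, b = flatten(recipe), flatten(resolved)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

An empty result from diff_configs means the two configs agree on every leaf value; any entry it returns is a setting worth reconciling before expecting a matching run.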

Reproducing the reverted variant

The expert-reverted variant restores middle-layer weights to the base Gemma-3-4B while keeping the trained first/last layers. It is not uploaded to the Hub; regenerate it with:

python train.py --config_path configs/yaml/revert_gemma_checkpoint.yaml \
    --revert.checkpoint_path "$(huggingface-cli download sanchitahuja205/xelm-gemma-4b-slavic-expert)" \
    --revert.revert_output_path ./reverted
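For intuition, the revert operation can be sketched over plain state dicts. This is an illustration under stated assumptions, not the code behind revert_gemma_checkpoint.yaml: the function name, the keep_first / keep_last parameters, and the choice to keep the expert's non-layer parameters (embeddings, final norm) are all hypothetical, and which layers count as "middle" is set by the config, not by this sketch.

```python
import re

# Matches the decoder-layer index in keys such as "model.layers.12.mlp.up_proj.weight".
# Caveat: it matches any ".layers.N." segment, so a checkpoint that also contains
# another layered tower would need a stricter pattern.
_LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def revert_middle_layers(expert_sd, base_sd, num_layers, keep_first, keep_last):
    """Return a new state dict with middle layers taken from the base model.

    Layers [0, keep_first) and [num_layers - keep_last, num_layers) keep the
    expert's weights; every layer in between is restored from base_sd.
    """
    if keep_first < 0 or keep_last < 0 or keep_first + keep_last > num_layers:
        raise ValueError("keep_first + keep_last must fit within num_layers")

    reverted = {}
    for name, weight in expert_sd.items():
        match = _LAYER_RE.search(name)
        if match is None:
            # Assumption: parameters outside the layer stack stay as trained.
            reverted[name] = weight
            continue
        idx = int(match.group(1))
        is_middle = keep_first <= idx < num_layers - keep_last
        reverted[name] = base_sd[name] if is_middle else weight
    return reverted
```

The values can be torch tensors or anything else, since the function only selects between the two dicts and never modifies a weight.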

Reproducing the expert soup

See the xelm-gemma-4b-dense repo README for the full soup recipe.
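As background, the simplest form of a model soup is a uniform average of matching parameters across checkpoints. The sketch below shows only that generic idea, under the assumption that all experts share the same parameter names and shapes; uniform_soup is a hypothetical name, and the actual recipe in the xelm-gemma-4b-dense README may weight or select parameters differently.

```python
def uniform_soup(state_dicts):
    """Average a list of state dicts key by key (uniform model soup).

    Works with any values that support + and / by an int, such as floats
    or torch tensors. With BF16 tensors, casting to float32 before
    averaging would reduce rounding error.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")

    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all state dicts must share the same parameter names")

    n = len(state_dicts)
    return {key: sum(sd[key] for sd in state_dicts) / n for key in keys}
```

In practice the inputs would be the state dicts of several single-family experts loaded with from_pretrained, and the result would be loaded back into a model of the same architecture.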

Citation

@misc{ahuja2026parameteralignmentmitigatescatastrophic,
      title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
      author={Sanchit Ahuja and Terra Blevins},
      year={2026},
      eprint={2606.00284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.00284},
}

Model size: 4B params. Tensor type: BF16 (Safetensors).