---
license: gemma
base_model: google/gemma-3-4b-pt
language:
- sm
- jv
- ceb
- fil
- id
- ms
- ilo
- war
- mg
- mi
- su
tags:
- continual-pretraining
- multilingual
- x-elm
- gemma-3
- freeze
- austronesian
library_name: transformers
pipeline_tag: text-generation
---

# xelm-gemma-4b-austronesian-freeze

This model uses a layer-freezing strategy: the middle transformer layers are frozen at the base Gemma-3-4B weights, and only the first and last layers are updated during continual pretraining (CPT). This is intended to mitigate catastrophic forgetting of general capabilities.

- **Base model**: [google/gemma-3-4b-pt](https://huggingface.co/google/gemma-3-4b-pt)
- **Strategy**: `freeze`
- **Language family**: Austronesian
- **Code**: [https://github.com/sanchit-ahuja/scaling-multilingual-experts](https://github.com/sanchit-ahuja/scaling-multilingual-experts)

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-austronesian-freeze")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-austronesian-freeze")
```

## Training recipe

The exact training recipe lives in [`configs/yaml/train_gemma_freeze.yaml`](https://github.com/sanchit-ahuja/scaling-multilingual-experts/blob/main/configs/yaml/train_gemma_freeze.yaml) in the code repo. The resolved config used for this specific run is also included in this model repo as `training_config.yaml`. Load it with pyrallis to reproduce the run bit-for-bit. The command below points at the recipe in the code repo; to use the resolved config instead, pass the path to your downloaded `training_config.yaml` as `--config_path`:

```bash
python train.py --config_path configs/yaml/train_gemma_freeze.yaml
```

## Citation

```bibtex
@misc{ahuja2026parameteralignmentmitigatescatastrophic,
  title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
  author={Sanchit Ahuja and Terra Blevins},
  year={2026},
  eprint={2606.00284},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2606.00284},
}
```
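
## Freezing sketch

The `freeze` strategy described at the top of this card (middle layers held at base weights, first and last layers updated) can be illustrated with a minimal sketch. This is not the training code from the repo. The helper names `trainable_layer_indices` and `freeze_middle_layers` and the parameters `n_first` and `n_last` are hypothetical, and the number of layers actually left trainable is set by the training config, not by the defaults used here. The helpers only assume that each layer exposes a `parameters()` method whose items carry a `requires_grad` flag, as PyTorch modules do.

```python
from typing import List, Sequence


def trainable_layer_indices(num_layers: int, n_first: int, n_last: int) -> List[int]:
    """Return indices of layers left trainable: the first n_first and last n_last.

    n_first and n_last are hypothetical knobs for illustration only.
    """
    if n_first < 0 or n_last < 0:
        raise ValueError("n_first and n_last must be non-negative")
    if n_first + n_last >= num_layers:
        # Nothing remains in the middle, so every layer stays trainable.
        return list(range(num_layers))
    head = list(range(n_first))
    tail = list(range(num_layers - n_last, num_layers))
    return head + tail


def freeze_middle_layers(layers: Sequence, n_first: int = 1, n_last: int = 1) -> List[int]:
    """Disable gradients for every layer outside the first/last groups.

    `layers` is any sequence of objects with a `parameters()` method,
    for example the list of decoder blocks of a loaded model.
    Returns the sorted indices of the layers that remain trainable.
    """
    keep = set(trainable_layer_indices(len(layers), n_first, n_last))
    for idx, layer in enumerate(layers):
        for param in layer.parameters():
            param.requires_grad = idx in keep
    return sorted(keep)
```

To try this on a loaded model, pass the model's list of decoder blocks as `layers`. The attribute path to that list depends on the architecture and the installed `transformers` version, so it is left to the caller. Embeddings, the final norm, and the output head are not touched by this sketch; whether they are trained in the actual recipe is determined by the config in the code repo.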