---
license: gemma
base_model: google/gemma-3-4b-pt
language:
  - sm
  - jv
  - ceb
  - fil
  - id
  - ms
  - ilo
  - war
  - mg
  - mi
  - su
tags:
  - continual-pretraining
  - multilingual
  - x-elm
  - gemma-3
  - freeze
  - austronesian
library_name: transformers
pipeline_tag: text-generation
---
# xelm-gemma-4b-austronesian-freeze

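This model was produced by continual pretraining (CPT) with a layer-freezing strategy: the middle transformer layers stay frozen at the base Gemma-3-4B weights, and only the first and last layers are updated. Freezing the middle of the network is intended to mitigate catastrophic forgetting of general capabilities.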

- **Base model**: [google/gemma-3-4b-pt](https://huggingface.co/google/gemma-3-4b-pt)
- **Strategy**: `freeze`
- **Language family**: Austronesian
- **Code**: [https://github.com/sanchit-ahuja/scaling-multilingual-experts](https://github.com/sanchit-ahuja/scaling-multilingual-experts)
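The freezing idea described above can be illustrated with a minimal sketch. This is not the code used to train this model: the helper name `freeze_middle_layers` and the parameters `n_first` and `n_last` are hypothetical, and the actual number of trainable layers is defined by the training config, not by this card. The function accepts any sequence of modules that expose `.parameters()`, such as the decoder blocks of a PyTorch model, and it does not touch embeddings or the output head.

```python
def freeze_middle_layers(layers, n_first=1, n_last=1):
    """Freeze every block except the first `n_first` and the last `n_last`.

    `layers` is any sequence of modules exposing `.parameters()`.
    `n_first` and `n_last` are illustrative values, not the ones used for
    this model. Returns the indices of the blocks left trainable.
    """
    num_layers = len(layers)
    if n_first < 0 or n_last < 0 or n_first + n_last > num_layers:
        raise ValueError("n_first and n_last must be non-negative and fit the layer count")

    trainable = []
    for idx, layer in enumerate(layers):
        # A block stays trainable only if it sits at either end of the stack.
        is_trainable = idx < n_first or idx >= num_layers - n_last
        for param in layer.parameters():
            param.requires_grad = is_trainable
        if is_trainable:
            trainable.append(idx)
    return trainable
```

With PyTorch, parameters whose `requires_grad` is `False` receive no gradients, so the frozen blocks keep their base weights during training.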

## Loading

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sanchitahuja205/xelm-gemma-4b-austronesian-freeze")
tokenizer = AutoTokenizer.from_pretrained("sanchitahuja205/xelm-gemma-4b-austronesian-freeze")
```

## Training recipe

The exact training recipe lives in [`configs/yaml/train_gemma_freeze.yaml`](https://github.com/sanchit-ahuja/scaling-multilingual-experts/blob/main/configs/yaml/train_gemma_freeze.yaml) in the code repo. The resolved config used for this specific run is also included in this model repo as `training_config.yaml`; it loads with pyrallis, so pointing `--config_path` at that file instead of the recipe file reproduces this run. The command below runs the recipe from the code repo:

```bash
python train.py --config_path configs/yaml/train_gemma_freeze.yaml
```


## Citation

```bibtex
@misc{ahuja2026parameteralignmentmitigatescatastrophic,
      title={Parameter Alignment Mitigates Catastrophic Forgetting in Multilingual Expert Language Models},
      author={Sanchit Ahuja and Terra Blevins},
      year={2026},
      eprint={2606.00284},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.00284},
}
```