---
language:
- en
- nl
- zh
tags:
- babylm-2026
- multilingual
- typology
- taam
- pstar
license: apache-2.0
---

# TAAM — Typology-Aware Adaptive Mixing — `Pstar` seed `1337`

This is a BabyLM 2026 Multilingual Track submission checkpoint. It was trained on English + Dutch + Mandarin Chinese under a ≤100M-unique-token budget with the [TAAM](https://github.com/Amos-Luna/Asymmetric-Multilingual-Acquisition_TAAM) method.

- **Method**: `Pstar`
- **Seed**: `1337`
- **Repo**: `amosluna/babylm-2026-pstar-seed1337`
- **Final π** (per-language sampling probability): `eng=0.159, nld=0.258, zho=0.583`
- **Total token exposures**: `655360000`
- **Training wall-clock**: `10117.415275096893 s`
- **Source run dir**: `runs/2026-06-01_Pstar_seed1337`

## Intermediate checkpoints

This repo exposes `24` intermediate checkpoints as branches following the BabyLM 2026 naming convention: `chck_1M, chck_2M, ..., chck_10M, chck_20M, ..., chck_100M, chck_200M, ..., chck_600M`.

The eval pipeline at [babylm-org/babylm-eval](https://github.com/babylm-org/babylm-eval) pulls these revisions automatically with:

```bash
bash multilingual/scripts/zeroshot_model_fast_all.sh \
    --model_name amosluna/babylm-2026-pstar-seed1337
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("amosluna/babylm-2026-pstar-seed1337")
model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337")  # final checkpoint

# Intermediate checkpoint:
# model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337", revision="chck_100M")
```

## Method summary

TAAM combines (a) a URIEL/lang2vec-derived typological prior over initial sampling probabilities, (b) EXP3 online updates over per-language sampling probabilities, and (c) byte-premium-aware token budgeting. The two reward variants are `normalized_excess_loss` (v1, delta-based) and `cross_lingual_deficit` (v2, level-based).
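
To make component (b) concrete, here is a minimal sketch of a generic textbook EXP3 update over three languages. It is not the TAAM implementation: the prior values, the exploration rate `gamma`, and the fixed toy rewards are all hypothetical placeholders, and the actual typological prior and reward definitions live in the paper and the repo.

```python
import math
import random

LANGS = ["eng", "nld", "zho"]


def exp3_probs(weights, gamma):
    """Mix normalized weights with a uniform exploration floor of gamma / K."""
    total = sum(weights.values())
    k = len(weights)
    return {lang: (1.0 - gamma) * w / total + gamma / k for lang, w in weights.items()}


def exp3_update(weights, lang, reward, gamma):
    """Return renormalized weights after observing `reward` in [0, 1] for `lang`."""
    probs = exp3_probs(weights, gamma)
    estimate = reward / probs[lang]  # importance-weighted reward estimate
    new = dict(weights)
    new[lang] = weights[lang] * math.exp(gamma * estimate / len(weights))
    total = sum(new.values())  # renormalizing keeps the weights numerically bounded
    return {name: w / total for name, w in new.items()}


# Hypothetical values for illustration only (not taken from the TAAM runs).
prior = {"eng": 0.40, "nld": 0.35, "zho": 0.25}       # stand-in for a typological prior
toy_reward = {"eng": 0.2, "nld": 0.4, "zho": 0.8}     # stand-in for a per-language reward
gamma = 0.1

rng = random.Random(1337)
weights = dict(prior)
for _ in range(500):
    probs = exp3_probs(weights, gamma)
    lang = rng.choices(LANGS, weights=[probs[name] for name in LANGS])[0]
    weights = exp3_update(weights, lang, toy_reward[lang], gamma)

pi = exp3_probs(weights, gamma)  # analogue of the "Final π" reported above
print({name: round(p, 3) for name, p in pi.items()})
```

In this sketch the uniform mixing term guarantees every language a sampling probability of at least `gamma / K`, which is the usual EXP3 exploration floor; whether TAAM uses the same floor is not stated on this card.
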
See the paper and the public repo for full details, including the structural floor derivation that explains why the v1 reward starves the hardest language under typological asymmetry.

## Citation

```bibtex
@inproceedings{taam2026,
  title     = {Typology-Aware Adaptive Mixing for Multilingual BabyLMs},
  author    = {Luna, Amos and collaborators},
  year      = {2026},
  booktitle = {Proceedings of the BabyLM Workshop at EMNLP 2026}
}
```