TAAM (Typology-Aware Adaptive Mixing): Pstar seed 1337

This is a BabyLM 2026 Multilingual Track submission checkpoint. It was trained on English + Dutch + Mandarin Chinese under a ≤100M-unique-token budget with the TAAM method.

  • Method: Pstar
  • Seed: 1337
  • Repo: amosluna/babylm-2026-pstar-seed1337
  • Final π (per-language sampling probability): eng=0.159, nld=0.258, zho=0.583
  • Total token exposures: 655360000
  • Training wall-clock: 10117.415275096893 s (about 2.8 hours)
  • Source run dir: runs/2026-06-01_Pstar_seed1337

Intermediate checkpoints

This repo exposes 24 intermediate checkpoints as branches following the BabyLM 2026 naming convention: chck_1M, chck_2M, ..., chck_10M, chck_20M, ..., chck_100M, chck_200M, ..., chck_600M. The eval pipeline at babylm-org/babylm-eval pulls these revisions automatically with:

bash multilingual/scripts/zeroshot_model_fast_all.sh \
    --model_name amosluna/babylm-2026-pstar-seed1337
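
The branch names above follow a regular pattern, so they can be enumerated programmatically. The snippet below is a minimal sketch that rebuilds the list from the three ranges stated in this card; the helper name `checkpoint_branches` is hypothetical and is not part of the eval pipeline.

```python
def checkpoint_branches():
    """Return the 24 branch names listed in this card, in training order.

    Hypothetical helper: the ranges are copied from the card's description,
    not read from the Hub.
    """
    millions = (
        list(range(1, 11))            # chck_1M ... chck_10M (steps of 1M)
        + list(range(20, 101, 10))    # chck_20M ... chck_100M (steps of 10M)
        + list(range(200, 601, 100))  # chck_200M ... chck_600M (steps of 100M)
    )
    return [f"chck_{m}M" for m in millions]


branches = checkpoint_branches()
print(len(branches), branches[0], branches[-1])  # 24 chck_1M chck_600M
```

Each name can be passed as the `revision` argument of `from_pretrained`, as in the Usage section.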

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("amosluna/babylm-2026-pstar-seed1337")
model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337")  # final checkpoint
# Intermediate checkpoint:
# model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337", revision="chck_100M")

Method summary

TAAM combines (a) a URIEL/lang2vec-derived typological prior over initial sampling probabilities, (b) EXP3 online updates over per-language sampling probabilities, and (c) byte-premium-aware token budgeting. The two reward variants are normalized_excess_loss (v1, delta-based) and cross_lingual_deficit (v2, level-based).
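
To make component (b) concrete, here is a minimal sketch of the textbook EXP3 bandit update applied to three languages. It is not the TAAM implementation: the prior values, the toy rewards, the exploration rate `gamma`, and the choice to seed the weights from the prior are all assumptions made for illustration, and the reward here is only a stand-in for normalized_excess_loss or cross_lingual_deficit, whose definitions are in the paper.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalized weights with a uniform exploration term."""
    total = sum(weights.values())
    k = len(weights)
    return {arm: (1 - gamma) * w / total + gamma / k for arm, w in weights.items()}


def exp3_update(weights, probs, arm, reward, gamma):
    """Apply the importance-weighted EXP3 update to the sampled arm only."""
    k = len(weights)
    estimated = reward / probs[arm]  # importance-weighted reward estimate
    updated = dict(weights)
    updated[arm] = weights[arm] * math.exp(gamma * estimated / k)
    # Renormalize for numerical stability; this leaves the probabilities unchanged.
    total = sum(updated.values())
    return {a: w / total for a, w in updated.items()}


# Hypothetical numbers, for illustration only (not the URIEL-derived prior).
prior = {"eng": 0.4, "nld": 0.4, "zho": 0.2}
toy_reward = {"eng": 0.2, "nld": 0.4, "zho": 0.9}  # rewards must lie in [0, 1]
gamma = 0.1

rng = random.Random(0)
weights = dict(prior)  # assumption: the prior seeds the initial weights
for _ in range(2000):
    probs = exp3_probs(weights, gamma)
    arm = rng.choices(list(probs), weights=list(probs.values()))[0]
    weights = exp3_update(weights, probs, arm, toy_reward[arm], gamma)

final_probs = exp3_probs(weights, gamma)
print(final_probs)
```

In this toy run the language with the highest reward ends up with most of the sampling mass, while the uniform term keeps every language at a probability of at least `gamma / 3`.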

See the paper and the public repo for full details, including the structural floor derivation that explains why the v1 reward starves the hardest language under typological asymmetry.

Citation

@inproceedings{taam2026,
  title  = {Typology-Aware Adaptive Mixing for Multilingual BabyLMs},
  author = {Luna, Amos and collaborators},
  year   = {2026},
  booktitle = {Proceedings of the BabyLM Workshop at EMNLP 2026}
}

Model size: 35.6M params (Safetensors, tensor type F32)