TAAM — Typology-Aware Adaptive Mixing — `B0` seed `1337`

This is a BabyLM 2026 Multilingual Track submission checkpoint, trained on English, Dutch, and Mandarin Chinese under a budget of at most 100M unique tokens (≤100M) with the TAAM method.

Method: B0
Seed: 1337
Repo: amosluna/babylm-2026-b0-seed1337
Final π (per-language sampling probability): eng=0.333, nld=0.333, zho=0.333
Total token exposures: 655,360,000
Training wall-clock: 10,122 s (about 2 h 49 min)
Source run dir: runs/2026-05-19_B0_seed1337
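
As a quick sanity check on the run statistics above, the following sketch derives throughput and the minimum number of passes over the data. It assumes the full 100M-unique-token budget was used; the card only states an upper bound (≤100M), so the pass count is a lower bound. The helper name `run_stats` is hypothetical and not part of the training code.

```python
TOTAL_EXPOSURES = 655_360_000       # "Total token exposures" reported in this card
WALL_CLOCK_S = 10121.990438699722   # "Training wall-clock" reported in this card
UNIQUE_TOKEN_BUDGET = 100_000_000   # assumption: the full <=100M budget was used


def run_stats(exposures, seconds, unique_tokens):
    """Derive simple throughput figures from the reported run totals."""
    return {
        "tokens_per_second": exposures / seconds,
        "min_passes_over_data": exposures / unique_tokens,
        "hours": seconds / 3600,
    }


stats = run_stats(TOTAL_EXPOSURES, WALL_CLOCK_S, UNIQUE_TOKEN_BUDGET)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the run processed roughly 64.7k tokens per second and made at least about 6.55 passes over the unique-token budget.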

Intermediate checkpoints

This repo exposes 24 intermediate checkpoints as branches following the BabyLM 2026 naming convention: chck_1M, chck_2M, ..., chck_10M, chck_20M, ..., chck_100M, chck_200M, ..., chck_600M. The eval pipeline at babylm-org/babylm-eval pulls these revisions automatically with:

bash multilingual/scripts/zeroshot_model_fast_all.sh \
    --model_name amosluna/babylm-2026-b0-seed1337
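
The branch names follow a regular pattern, so they can be generated rather than typed out. This is a minimal sketch of the naming scheme described above; the function name `checkpoint_branches` is hypothetical and does not come from the eval pipeline.

```python
def checkpoint_branches():
    """Return the 24 intermediate-checkpoint branch names in training order."""
    millions = (
        list(range(1, 11))             # chck_1M   ... chck_10M   (steps of 1M)
        + list(range(20, 101, 10))     # chck_20M  ... chck_100M  (steps of 10M)
        + list(range(200, 601, 100))   # chck_200M ... chck_600M  (steps of 100M)
    )
    return [f"chck_{m}M" for m in millions]


branches = checkpoint_branches()
print(len(branches), branches[0], branches[-1])
```

Each name can be passed as the `revision` argument of `from_pretrained` to load that checkpoint.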

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("amosluna/babylm-2026-b0-seed1337")
model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-b0-seed1337")  # final checkpoint
# Intermediate checkpoint:
# model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-b0-seed1337", revision="chck_100M")

Method summary

TAAM combines (a) a URIEL/lang2vec-derived typological prior over initial sampling probabilities, (b) EXP3 online updates over per-language sampling probabilities, and (c) byte-premium-aware token budgeting. The two reward variants are normalized_excess_loss (v1, delta-based) and cross_lingual_deficit (v2, level-based).
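
To make component (b) concrete, here is a minimal sketch of the textbook EXP3 update applied to per-language sampling probabilities. It is not the TAAM implementation: the reward function, the exploration rate `gamma`, and the optional `prior` argument (standing in for the typological prior) are illustrative assumptions, and neither of the two TAAM reward variants is reproduced here.

```python
import math
import random

LANGS = ["eng", "nld", "zho"]


def exp3_probs(weights, gamma):
    """Mix normalized weights with uniform exploration (gamma / K per language)."""
    k = len(weights)
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Return updated weights after observing `reward` in [0, 1] for `arm`."""
    k = len(weights)
    estimate = reward / probs[arm]          # importance-weighted reward estimate
    new = list(weights)
    new[arm] *= math.exp(gamma * estimate / k)
    top = max(new)                          # rescale to avoid overflow;
    return [w / top for w in new]           # probabilities are unchanged


def run(steps, reward_fn, gamma=0.1, prior=None, seed=1337):
    """Simulate `steps` sampling rounds; `reward_fn` is a placeholder reward."""
    rng = random.Random(seed)
    weights = list(prior) if prior else [1.0] * len(LANGS)
    for _ in range(steps):
        probs = exp3_probs(weights, gamma)
        arm = rng.choices(range(len(LANGS)), weights=probs)[0]
        weights = exp3_update(weights, probs, arm, reward_fn(LANGS[arm]), gamma)
    return dict(zip(LANGS, exp3_probs(weights, gamma)))


if __name__ == "__main__":
    # Toy reward: pretend only Mandarin batches are informative.
    print(run(200, lambda lang: 1.0 if lang == "zho" else 0.0))
```

With uniform initial weights, the first call to `exp3_probs` returns one third for each language. The `gamma / K` term is the standard EXP3 exploration floor and should not be confused with the structural floor discussed in the paper.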

See the paper and the public repo for full details, including the structural floor derivation that explains why the v1 reward starves the hardest language under typological asymmetry.

Citation

@inproceedings{taam2026,
  title  = {Typology-Aware Adaptive Mixing for Multilingual BabyLMs},
  author = {Luna, Amos and others},
  year   = {2026},
  booktitle = {Proceedings of the BabyLM Workshop at EMNLP 2026}
}

Model size: 35.6M params
Tensor type: F32 (Safetensors)
