TAAM - Typology-Aware Adaptive Mixing - B0 seed 1337
This is a BabyLM 2026 Multilingual Track submission checkpoint, trained on English, Dutch, and Mandarin Chinese under a ≤100M-unique-token budget with the TAAM method.
- Method: B0
- Seed: 1337
- Repo: amosluna/babylm-2026-b0-seed1337
- Final π (per-language sampling probability): eng=0.333, nld=0.333, zho=0.333
- Total token exposures: 655360000
- Training wall-clock: 10121.990438699722 s
- Source run dir: runs/2026-05-19_B0_seed1337
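To put the run statistics above in perspective, the following minimal sketch derives the average training throughput and the number of passes over the data implied by the reported figures. The constant and function names are illustrative, and the pass count assumes the full 100M-unique-token cap was used; a smaller unique-token set would mean more passes.

```python
# Figures copied from the metadata list above.
TOTAL_TOKEN_EXPOSURES = 655_360_000
WALL_CLOCK_SECONDS = 10121.990438699722

# Assumption: the track budget is a cap of 100M unique tokens.
UNIQUE_TOKEN_BUDGET_CAP = 100_000_000


def tokens_per_second(total_tokens: int, seconds: float) -> float:
    """Average training throughput over the whole run."""
    return total_tokens / seconds


def min_passes(total_tokens: int, budget_cap: int) -> float:
    """Lower bound on data passes: exposures divided by the largest allowed unique-token set."""
    return total_tokens / budget_cap


if __name__ == "__main__":
    tps = tokens_per_second(TOTAL_TOKEN_EXPOSURES, WALL_CLOCK_SECONDS)
    passes = min_passes(TOTAL_TOKEN_EXPOSURES, UNIQUE_TOKEN_BUDGET_CAP)
    print(f"throughput: {tps:,.0f} tokens/s")
    print(f"passes over the data: at least {passes:.2f}")
    print(f"wall-clock: {WALL_CLOCK_SECONDS / 3600:.2f} h")
```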
Intermediate checkpoints
This repo exposes 24 intermediate checkpoints as branches following the BabyLM 2026 naming convention: chck_1M, chck_2M, ..., chck_10M, chck_20M, ..., chck_100M, chck_200M, ..., chck_600M. The eval pipeline at babylm-org/babylm-eval pulls these revisions automatically with:
bash multilingual/scripts/zeroshot_model_fast_all.sh \
--model_name amosluna/babylm-2026-b0-seed1337
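For scripts that need the branch names directly, the naming pattern described above can be enumerated as follows. This is a small sketch that assumes the elided ranges step by 1M, 10M, and 100M respectively, which gives the 24 names stated above; `checkpoint_branches` is an illustrative helper, not part of the eval pipeline.

```python
def checkpoint_branches() -> list[str]:
    """Enumerate branch names: 1M-10M by 1M, 20M-100M by 10M, 200M-600M by 100M."""
    millions = (
        list(range(1, 11))            # chck_1M ... chck_10M
        + list(range(20, 101, 10))    # chck_20M ... chck_100M
        + list(range(200, 601, 100))  # chck_200M ... chck_600M
    )
    return [f"chck_{m}M" for m in millions]


if __name__ == "__main__":
    branches = checkpoint_branches()
    print(len(branches), branches[0], branches[-1])
```

Each returned name can be passed as the `revision` argument of `from_pretrained` to load that checkpoint.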
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("amosluna/babylm-2026-b0-seed1337")
model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-b0-seed1337") # final checkpoint
# Intermediate checkpoint:
# model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-b0-seed1337", revision="chck_100M")
Method summary
TAAM combines (a) a URIEL/lang2vec-derived typological prior over initial
sampling probabilities, (b) EXP3 online updates over per-language sampling
probabilities, and (c) byte-premium-aware token budgeting. The two reward
variants are normalized_excess_loss (v1, delta-based) and
cross_lingual_deficit (v2, level-based).
See the paper and the public repo for full details, including the structural floor derivation that explains why the v1 reward starves the hardest language under typological asymmetry.
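To illustrate component (b), here is a minimal sketch of the textbook EXP3 update applied to per-language sampling probabilities. It is not the TAAM implementation: the class name `Exp3Sampler`, the exploration rate `gamma`, the use of a prior as initial weights, and the reward values are all assumptions made for illustration, and rewards are assumed to be scaled to [0, 1].

```python
import math
import random


class Exp3Sampler:
    """Textbook EXP3 over a fixed set of arms (here: languages)."""

    def __init__(self, arms, gamma=0.1, prior=None, seed=0):
        self.arms = list(arms)
        self.gamma = gamma
        k = len(self.arms)
        # Assumption: a prior (e.g. typology-derived) seeds the initial weights.
        # Entries must be positive; the default is uniform.
        prior = prior or {a: 1.0 / k for a in self.arms}
        self.log_w = {a: math.log(prior[a]) for a in self.arms}
        self.rng = random.Random(seed)

    def probs(self):
        """Mix normalized weights with uniform exploration: every arm gets >= gamma / K."""
        m = max(self.log_w.values())  # subtract the max for numerical stability
        w = {a: math.exp(v - m) for a, v in self.log_w.items()}
        total = sum(w.values())
        k = len(self.arms)
        return {a: (1 - self.gamma) * w[a] / total + self.gamma / k for a in self.arms}

    def sample(self):
        """Draw the language for the next batch."""
        p = self.probs()
        return self.rng.choices(self.arms, weights=[p[a] for a in self.arms])[0]

    def update(self, arm, reward):
        """Importance-weighted update for the arm that was actually played."""
        p = self.probs()[arm]
        k = len(self.arms)
        self.log_w[arm] += self.gamma * (reward / p) / k


if __name__ == "__main__":
    sampler = Exp3Sampler(["eng", "nld", "zho"], gamma=0.1)
    for _ in range(100):
        lang = sampler.sample()
        # Hypothetical reward; a real run would derive it from training loss.
        reward = 0.8 if lang == "zho" else 0.3
        sampler.update(lang, reward)
    print(sampler.probs())
```

The uniform mixing term keeps every language's probability at or above gamma / K, so no language is dropped entirely by the sampler itself, whatever reward signal is used.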
Citation
@inproceedings{taam2026,
title = {Typology-Aware Adaptive Mixing for Multilingual BabyLMs},
author = {Luna, Amos and collaborators},
year = {2026},
booktitle = {Proceedings of the BabyLM Workshop at EMNLP 2026}
}