main: final checkpoint (step=20000, tokens=655M)


	---
	language:
	- en
	- nl
	- zh
	tags:
	- babylm-2026
	- multilingual
	- typology
	- taam
	- pstar
	license: apache-2.0
	---

	# TAAM — Typology-Aware Adaptive Mixing — `Pstar` seed `1337`

	This is a BabyLM 2026 Multilingual Track submission checkpoint. It was trained
	on English + Dutch + Mandarin Chinese under a ≤100M-unique-token budget with
	the [TAAM](https://github.com/Amos-Luna/Asymmetric-Multilingual-Acquisition_TAAM)
	method.

	- Method: `Pstar`
	- Seed: `1337`
	- Repo: `amosluna/babylm-2026-pstar-seed1337`
	- Final π (per-language sampling probability): `eng=0.159, nld=0.258, zho=0.583`
	- Total token exposures: `655360000` (about 655M)
	- Training wall-clock: `10117.415275096893 s` (about 2.8 hours)
	- Source run dir: `runs/2026-06-01_Pstar_seed1337`
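
The run figures above imply a few derived quantities. The calculation below is only a sketch: it takes the step count from the final checkpoint label (`step=20000`) and treats the 100M unique-token budget as an upper bound. All variable names are illustrative and are not taken from the TAAM code base.

```python
# Values copied from this model card.
total_exposures = 655_360_000          # total token exposures
wall_clock_s = 10117.415275096893      # training wall-clock in seconds
final_step = 20_000                    # from the final checkpoint label (step=20000)
unique_budget = 100_000_000            # upper bound on unique training tokens
final_pi = {"eng": 0.159, "nld": 0.258, "zho": 0.583}

# Average number of tokens consumed per optimizer step.
tokens_per_step = total_exposures / final_step

# Average training throughput in tokens per second.
throughput = total_exposures / wall_clock_s

# Lower bound on how often the average unique token was seen,
# since the unique-token count is at most the budget.
min_avg_repeats = total_exposures / unique_budget

print(f"tokens/step       : {tokens_per_step:.0f}")
print(f"throughput (tok/s): {throughput:.0f}")
print(f"min avg. repeats  : {min_avg_repeats:.2f}")
print(f"sum of final pi   : {sum(final_pi.values()):.3f}")
```

Note that the final π describes the sampling distribution at the end of training only. Because the probabilities are updated online, it should not be read as the per-language share of all exposures.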

	## Intermediate checkpoints

	This repo exposes `24` intermediate checkpoints as branches following
	the BabyLM 2026 naming convention: `chck_1M, chck_2M, ..., chck_10M,
	chck_20M, ..., chck_100M, chck_200M, ..., chck_600M`. The eval pipeline at
	[babylm-org/babylm-eval](https://github.com/babylm-org/babylm-eval) pulls
	these revisions automatically with:

	```bash
	bash multilingual/scripts/zeroshot_model_fast_all.sh \
	--model_name amosluna/babylm-2026-pstar-seed1337
	```
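
To work with the intermediate checkpoints outside the eval pipeline, the branch names can be generated from the naming convention described above. This is a minimal sketch; `checkpoint_branches` is a hypothetical helper, not part of this repo or of babylm-eval, and it assumes the schedule is exactly every 1M up to 10M, every 10M up to 100M, and every 100M up to 600M.

```python
def checkpoint_branches(last_hundreds_m: int = 600) -> list[str]:
    """Return branch names such as 'chck_1M' for the assumed checkpoint schedule."""
    sizes_m = (
        list(range(1, 11))                           # 1M, 2M, ..., 10M
        + list(range(20, 101, 10))                   # 20M, 30M, ..., 100M
        + list(range(200, last_hundreds_m + 1, 100)) # 200M, ..., 600M
    )
    return [f"chck_{size}M" for size in sizes_m]


branches = checkpoint_branches()
print(len(branches), branches[0], branches[-1])

# Each name can then be passed as the `revision` argument, for example:
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "amosluna/babylm-2026-pstar-seed1337", revision=branches[0]
# )
```

The resulting list has 24 entries, matching the number of branches stated above.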

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	tok = AutoTokenizer.from_pretrained("amosluna/babylm-2026-pstar-seed1337")
	model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337") # final checkpoint
	# Intermediate checkpoint:
	# model = AutoModelForCausalLM.from_pretrained("amosluna/babylm-2026-pstar-seed1337", revision="chck_100M")
	```

	## Method summary

	TAAM combines (a) a URIEL/lang2vec-derived typological prior over initial
	sampling probabilities, (b) EXP3 online updates over per-language sampling
	probabilities, and (c) byte-premium-aware token budgeting. The two reward
	variants are `normalized_excess_loss` (v1, delta-based) and
	`cross_lingual_deficit` (v2, level-based).
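
For readers unfamiliar with EXP3, the following is a minimal sketch of the textbook algorithm applied to three languages. It is not the TAAM implementation: the uniform initial weights, the exploration rate `gamma`, and the fixed toy rewards are placeholders chosen for illustration, whereas TAAM initializes from a typological prior and uses the reward variants named above.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalized weights with a uniform distribution (standard EXP3)."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, arm, reward, gamma):
    """Update only the sampled arm, using an importance-weighted reward in [0, 1]."""
    probs = exp3_probs(weights, gamma)
    estimated = reward / probs[arm]  # unbiased estimate of the arm's reward
    updated = list(weights)
    updated[arm] *= math.exp(gamma * estimated / len(weights))
    return updated


langs = ["eng", "nld", "zho"]
gamma = 0.1                         # hypothetical exploration rate
weights = [1.0, 1.0, 1.0]           # placeholder: uniform start, not a typological prior
toy_reward = {"eng": 0.2, "nld": 0.4, "zho": 0.8}  # made-up rewards in [0, 1]
rng = random.Random(1337)

for _ in range(200):
    probs = exp3_probs(weights, gamma)
    arm = rng.choices(range(len(langs)), weights=probs)[0]  # sample a language
    weights = exp3_update(weights, arm, toy_reward[langs[arm]], gamma)

pi = exp3_probs(weights, gamma)
print({lang: round(p, 3) for lang, p in zip(langs, pi)})
```

In this generic form every language keeps a sampling probability of at least `gamma / K`, where `K` is the number of languages, so no language is ever dropped entirely by the sampler itself.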

	See the paper and the public repo for full details, including the structural
	floor derivation that explains why the v1 reward starves the hardest
	language under typological asymmetry.

	## Citation

	```bibtex
	@inproceedings{taam2026,
	title = {Typology-Aware Adaptive Mixing for Multilingual BabyLMs},
	author = {Luna, Amos and others},
	year = {2026},
	booktitle = {Proceedings of the BabyLM Workshop at EMNLP 2026}
	}
	```