ModernBERT-base Yat MLP Swap

This repo contains Yat student replacements for all 22 MLP blocks of answerdotai/ModernBERT-base.

The checkpoints were trained on Kaggle (2x T4 GPUs) with a two-phase pipeline:

Phase 1: data-free random/on-shell distillation. Random Gaussian probes are projected through each layer's mlp_norm, then each Yat block learns the corresponding ModernBERT MLP delta map.
Phase 2: real-activation fine-tuning. Real mlp_norm activations are collected from Wikitext text, then every Phase-1 Yat block is fine-tuned on its layer's real activations.
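
The Phase 1 idea can be illustrated with a toy, dependency-free sketch. It is not the training code from scripts/: the layer_norm function stands in for mlp_norm, teacher_mlp is a hypothetical frozen teacher, the student is a plain linear map rather than a Yat block, and the relative-error metric is an assumption that may differ from the rho reported in this README.

```python
import math
import random

DIM = 4  # toy width; the real hidden size is far larger


def layer_norm(x, eps=1e-5):
    """Stand-in for mlp_norm: zero mean, unit variance, no learned affine."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def teacher_mlp(h):
    """Hypothetical frozen teacher: a fixed nonlinear map standing in for the MLP."""
    return [math.tanh(0.5 * h[i] + 0.25 * h[(i + 1) % DIM]) for i in range(DIM)]


def student_forward(weights, h):
    """Linear student (assumption: the real student is a Yat block, not shown here)."""
    return [sum(w * v for w, v in zip(row, h)) for row in weights]


def sample_probe(rng):
    """Data-free probe: random Gaussian vector projected through the norm."""
    return layer_norm([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def relative_error(weights, probes):
    """Squared student-teacher error divided by squared teacher output."""
    num = den = 0.0
    for h in probes:
        target = teacher_mlp(h)
        pred = student_forward(weights, h)
        num += sum((p - t) ** 2 for p, t in zip(pred, target))
        den += sum(t ** 2 for t in target)
    return num / den


def distill(steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * DIM for _ in range(DIM)]
    eval_probes = [sample_probe(rng) for _ in range(200)]
    before = relative_error(weights, eval_probes)
    for _ in range(steps):
        h = sample_probe(rng)
        target = teacher_mlp(h)
        pred = student_forward(weights, h)
        # One SGD step on the squared error for this probe.
        for i in range(DIM):
            err = pred[i] - target[i]
            for j in range(DIM):
                weights[i][j] -= lr * err * h[j]
    after = relative_error(weights, eval_probes)
    return before, after


before, after = distill()
print(f"relative error before: {before:.3f}, after: {after:.3f}")
```

No real text is involved at any point: the only training signal is the teacher's response to normalized random probes, which is what makes this phase data-free.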

The model is distributed as a lightweight patch over the base ModernBERT weights. yat_modernbert.py downloads answerdotai/ModernBERT-base, loads the Phase-2 Yat checkpoints, and replaces every model.model.layers[i].mlp module.
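
The module swap described above can be sketched without any deep-learning dependency. This is an illustration, not the code in yat_modernbert.py: build_dummy_model and swap_mlps are hypothetical helpers, and strings stand in for the original and replacement modules.

```python
from types import SimpleNamespace

NUM_LAYERS = 22  # number of MLP blocks stated in this README


def build_dummy_model(num_layers=NUM_LAYERS):
    """Hypothetical stand-in exposing the same model.model.layers[i].mlp path."""
    layers = [SimpleNamespace(mlp=f"original_mlp_{i}") for i in range(num_layers)]
    return SimpleNamespace(model=SimpleNamespace(layers=layers))


def swap_mlps(model, replacements):
    """Replace every layer's mlp attribute, one replacement per layer."""
    layers = model.model.layers
    if len(replacements) != len(layers):
        raise ValueError(
            f"expected {len(layers)} replacements, got {len(replacements)}"
        )
    for layer, new_mlp in zip(layers, replacements):
        layer.mlp = new_mlp
    return model


dummy = build_dummy_model()
students = [f"yat_block_{i}" for i in range(NUM_LAYERS)]  # placeholders, not real modules
swap_mlps(dummy, students)
print(dummy.model.layers[0].mlp, dummy.model.layers[21].mlp)
```

With a real PyTorch model the same attribute assignment works, because assigning a module to an attribute of an nn.Module registers it as a submodule. Checking the replacement count up front avoids silently leaving some layers unpatched.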

Results

Phase 1 mean random/on-shell rho: 0.281783

Phase 2 mean real-activation rho: 1.654114 before fine-tuning -> 0.074821 after fine-tuning

Layer	Phase 1 rho (random probes)	Real-activation rho before Phase 2	Real-activation rho after Phase 2
0	0.144016	2.261558	0.044480
1	0.346435	1.351833	0.105199
2	0.372108	1.215147	0.080989
3	0.405880	0.692723	0.059312
4	0.295711	0.024458	0.001823
5	0.246746	0.765731	0.077416
6	0.251584	1.105588	0.109735
7	0.166338	1.832873	0.144971
8	0.200662	1.940811	0.127343
9	0.220264	0.022831	0.000741
10	0.166898	0.664773	0.032474
11	0.160852	0.381549	0.000065
12	0.225648	0.597184	0.047100
13	0.223533	1.688230	0.125407
14	0.212146	4.214903	0.113031
15	0.168918	0.264562	0.000735
16	0.241367	1.651894	0.116954
17	0.338128	1.422414	0.066886
18	0.374190	0.588581	0.061256
19	0.447556	0.883236	0.091415
20	0.488182	1.122052	0.143083
21	0.502065	11.697567	0.095645
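
As a consistency check, the reported means can be recomputed from the per-layer rows. The values below are copied from the table; ROWS and column_mean are names chosen here for illustration.

```python
# (layer, phase 1 rho, real-activation rho before Phase 2, after Phase 2)
ROWS = [
    (0, 0.144016, 2.261558, 0.044480),
    (1, 0.346435, 1.351833, 0.105199),
    (2, 0.372108, 1.215147, 0.080989),
    (3, 0.405880, 0.692723, 0.059312),
    (4, 0.295711, 0.024458, 0.001823),
    (5, 0.246746, 0.765731, 0.077416),
    (6, 0.251584, 1.105588, 0.109735),
    (7, 0.166338, 1.832873, 0.144971),
    (8, 0.200662, 1.940811, 0.127343),
    (9, 0.220264, 0.022831, 0.000741),
    (10, 0.166898, 0.664773, 0.032474),
    (11, 0.160852, 0.381549, 0.000065),
    (12, 0.225648, 0.597184, 0.047100),
    (13, 0.223533, 1.688230, 0.125407),
    (14, 0.212146, 4.214903, 0.113031),
    (15, 0.168918, 0.264562, 0.000735),
    (16, 0.241367, 1.651894, 0.116954),
    (17, 0.338128, 1.422414, 0.066886),
    (18, 0.374190, 0.588581, 0.061256),
    (19, 0.447556, 0.883236, 0.091415),
    (20, 0.488182, 1.122052, 0.143083),
    (21, 0.502065, 11.697567, 0.095645),
]


def column_mean(rows, index):
    """Arithmetic mean of one column across all layers."""
    return sum(row[index] for row in rows) / len(rows)


phase1_mean = column_mean(ROWS, 1)
pre_mean = column_mean(ROWS, 2)
post_mean = column_mean(ROWS, 3)

# Layer with the largest real-activation rho before Phase 2 fine-tuning.
worst_pre_layer = max(ROWS, key=lambda row: row[2])[0]

print(f"{phase1_mean:.6f} {pre_mean:.6f} {post_mean:.6f} worst pre layer: {worst_pre_layer}")
```

The pre-fine-tuning mean is dominated by layer 21 (11.697567) and layer 14 (4.214903), so the per-layer rows are more informative than the mean alone.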

Usage

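import torch
from yat_modernbert import load_model

# Downloads the base model, loads the Phase-2 Yat checkpoints, and swaps in the Yat MLPs.
model, tokenizer = load_model("azettaai/modernbert-base-yat-mlp-swap")

# Tokenize and move the inputs to the model's device.
inputs = tokenizer("Yat blocks replace every ModernBERT MLP.", return_tensors="pt")
inputs = inputs.to(next(model.parameters()).device)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    out = model(**inputs)
print(out.logits.shape)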

Files

phase2/block0.safetensors ... phase2/block21.safetensors: final Yat MLP replacements.
phase1/: random/on-shell warm-start checkpoints.
scripts/: Kaggle scripts used to train and bundle the model.
yat_modernbert.py: loader and patching code.
manifest.json: bundle metadata copied from the Kaggle artifact.
