
ModernBERT-base Yat MLP Swap

This repo contains Yat student replacements for all 22 MLP blocks of answerdotai/ModernBERT-base.

The checkpoints were trained on a Kaggle 2×T4 GPU instance with a two-phase pipeline:

  1. Phase 1: data-free random/on-shell distillation. Random Gaussian probes are projected through each layer's mlp_norm, then each Yat block learns the corresponding ModernBERT MLP delta map.
  2. Phase 2: real-activation fine-tuning. Real mlp_norm activations are collected by running Wikitext through the model, then every Phase-1 Yat block is fine-tuned on its own layer's real activations.
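
The two phases can be sketched on a toy layer. This is a minimal illustration under stated assumptions, not the training code in scripts/: the sizes are toy values, `student` is a plain tanh MLP standing in for a Yat block, the loss is assumed to be mean squared error, and the "real" activations are a random placeholder bank rather than Wikitext activations. The "delta map" is read here as the teacher MLP's output for a normalized input.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden, inner = 32, 64  # toy sizes, not ModernBERT's real dimensions

# Frozen stand-ins for one layer's mlp_norm and its teacher MLP.
mlp_norm = nn.LayerNorm(hidden)
teacher_mlp = nn.Sequential(
    nn.Linear(hidden, inner), nn.GELU(), nn.Linear(inner, hidden)
)
for p in list(mlp_norm.parameters()) + list(teacher_mlp.parameters()):
    p.requires_grad_(False)

# Hypothetical student; a real Yat block would take its place.
student = nn.Sequential(
    nn.Linear(hidden, inner), nn.Tanh(), nn.Linear(inner, hidden)
)


def distill(student, activations_fn, steps, lr=1e-3):
    """Fit the student to the teacher MLP on inputs from activations_fn."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        h = activations_fn()
        with torch.no_grad():
            target = teacher_mlp(h)  # the teacher's delta for this input
        loss = nn.functional.mse_loss(student(h), target)  # assumed loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


def random_probes(batch=256):
    """Phase 1 inputs: Gaussian probes projected through mlp_norm."""
    with torch.no_grad():
        return mlp_norm(torch.randn(batch, hidden))


# Phase 2 inputs: a placeholder bank standing in for collected activations.
with torch.no_grad():
    real_bank = mlp_norm(torch.randn(1024, hidden) * 3.0 + 1.0)


def real_batch(batch=256):
    idx = torch.randint(0, real_bank.shape[0], (batch,))
    return real_bank[idx]


phase1_losses = distill(student, random_probes, steps=200)  # data-free
phase2_losses = distill(student, real_batch, steps=100)     # fine-tune
print(f"phase 1 loss: {phase1_losses[0]:.4f} -> {phase1_losses[-1]:.4f}")
print(f"phase 2 loss: {phase2_losses[0]:.4f} -> {phase2_losses[-1]:.4f}")
```

Both phases share one training loop and differ only in where the inputs come from, which is why Phase 1 can serve as a warm start for Phase 2.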

The model is distributed as a lightweight patch over the base ModernBERT weights. yat_modernbert.py downloads answerdotai/ModernBERT-base, loads the Phase-2 Yat checkpoints, and replaces every model.model.layers[i].mlp module.
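
The patching step can be illustrated on a toy model that mimics the model.model.layers[i].mlp attribute path. This is a sketch under assumptions, not the code in yat_modernbert.py: `ToyModel`, `StudentMLP`, and `swap_mlps` are hypothetical names, the sizes are toy values, and the in-memory `state_dicts` list stands in for the Phase-2 checkpoint files.

```python
import torch
from torch import nn


class ToyLayer(nn.Module):
    """Stand-in for one encoder layer: x + mlp(mlp_norm(x))."""

    def __init__(self, hidden):
        super().__init__()
        self.mlp_norm = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 2 * hidden), nn.GELU(), nn.Linear(2 * hidden, hidden)
        )

    def forward(self, x):
        return x + self.mlp(self.mlp_norm(x))


class ToyBackbone(nn.Module):
    def __init__(self, hidden, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([ToyLayer(hidden) for _ in range(num_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class ToyModel(nn.Module):
    """Exposes the same model.model.layers[i].mlp path as the text describes."""

    def __init__(self, hidden=16, num_layers=22):
        super().__init__()
        self.model = ToyBackbone(hidden, num_layers)

    def forward(self, x):
        return self.model(x)


class StudentMLP(nn.Module):
    """Hypothetical replacement block; a placeholder for a Yat block."""

    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, x):
        return torch.tanh(self.proj(x))


def swap_mlps(model, make_student, state_dicts):
    """Replace every layer's mlp with a student loaded from its state dict."""
    for i, layer in enumerate(model.model.layers):
        student = make_student()
        student.load_state_dict(state_dicts[i])
        layer.mlp = student  # nn.Module re-registers the new submodule
    return model


# In-memory stand-ins for the per-layer Phase-2 checkpoints.
state_dicts = [StudentMLP(16).state_dict() for _ in range(22)]
model = swap_mlps(ToyModel(), lambda: StudentMLP(16), state_dicts)

with torch.no_grad():
    out = model(torch.randn(2, 5, 16))
print(out.shape)
```

Because only the mlp attribute is reassigned, attention weights, norms, and embeddings stay untouched, which is what lets the repo ship as a patch rather than a full checkpoint.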

Results

Phase 1 mean random/on-shell rho: 0.281783

Phase 2 mean real-activation rho: 1.654114 -> 0.074821 (before -> after real-activation fine-tuning)

Layer   Phase 1 rho   Phase 2 rho (before fine-tuning)   Phase 2 rho (after fine-tuning)
0       0.144016      2.261558                           0.044480
1       0.346435      1.351833                           0.105199
2       0.372108      1.215147                           0.080989
3       0.405880      0.692723                           0.059312
4       0.295711      0.024458                           0.001823
5       0.246746      0.765731                           0.077416
6       0.251584      1.105588                           0.109735
7       0.166338      1.832873                           0.144971
8       0.200662      1.940811                           0.127343
9       0.220264      0.022831                           0.000741
10      0.166898      0.664773                           0.032474
11      0.160852      0.381549                           0.000065
12      0.225648      0.597184                           0.047100
13      0.223533      1.688230                           0.125407
14      0.212146      4.214903                           0.113031
15      0.168918      0.264562                           0.000735
16      0.241367      1.651894                           0.116954
17      0.338128      1.422414                           0.066886
18      0.374190      0.588581                           0.061256
19      0.447556      0.883236                           0.091415
20      0.488182      1.122052                           0.143083
21      0.502065      11.697567                          0.095645
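
The summary means above can be re-derived from the per-layer rows. The sketch below copies the table values verbatim and recomputes the three means. It also reports which layers have the highest rho before and after fine-tuning, on the assumption that lower rho means a closer match to the original MLP, which the drop after fine-tuning suggests but the text does not state.

```python
# (layer, phase1_rho, phase2_rho_before, phase2_rho_after), copied from the table.
ROWS = [
    (0, 0.144016, 2.261558, 0.044480),
    (1, 0.346435, 1.351833, 0.105199),
    (2, 0.372108, 1.215147, 0.080989),
    (3, 0.405880, 0.692723, 0.059312),
    (4, 0.295711, 0.024458, 0.001823),
    (5, 0.246746, 0.765731, 0.077416),
    (6, 0.251584, 1.105588, 0.109735),
    (7, 0.166338, 1.832873, 0.144971),
    (8, 0.200662, 1.940811, 0.127343),
    (9, 0.220264, 0.022831, 0.000741),
    (10, 0.166898, 0.664773, 0.032474),
    (11, 0.160852, 0.381549, 0.000065),
    (12, 0.225648, 0.597184, 0.047100),
    (13, 0.223533, 1.688230, 0.125407),
    (14, 0.212146, 4.214903, 0.113031),
    (15, 0.168918, 0.264562, 0.000735),
    (16, 0.241367, 1.651894, 0.116954),
    (17, 0.338128, 1.422414, 0.066886),
    (18, 0.374190, 0.588581, 0.061256),
    (19, 0.447556, 0.883236, 0.091415),
    (20, 0.488182, 1.122052, 0.143083),
    (21, 0.502065, 11.697567, 0.095645),
]


def column_mean(index):
    """Mean of one numeric column across all layers."""
    return sum(row[index] for row in ROWS) / len(ROWS)


mean_phase1 = column_mean(1)
mean_before = column_mean(2)
mean_after = column_mean(3)

# Layers with the highest rho before and after fine-tuning.
worst_before_layer = max(ROWS, key=lambda row: row[2])[0]
worst_after_layer = max(ROWS, key=lambda row: row[3])[0]

# True when fine-tuning lowered rho on every layer.
all_improved = all(row[3] < row[2] for row in ROWS)

print(f"Phase 1 mean rho: {mean_phase1:.6f}")
print(f"Phase 2 mean rho: {mean_before:.6f} -> {mean_after:.6f}")
print(f"Highest rho before: layer {worst_before_layer}, after: layer {worst_after_layer}")
print(f"Every layer improved: {all_improved}")
```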

Usage

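# Requires yat_modernbert.py from this repo on the Python path.
import torch
from yat_modernbert import load_model

# Loads the base model and swaps in the Phase-2 Yat blocks.
model, tokenizer = load_model("azettaai/modernbert-base-yat-mlp-swap")
inputs = tokenizer("Yat blocks replace every ModernBERT MLP.", return_tensors="pt")
# Move the inputs to the device the model lives on.
inputs = inputs.to(next(model.parameters()).device)
with torch.no_grad():  # inference only, no gradients needed
    out = model(**inputs)
print(out.logits.shape)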

Files

  • phase2/block0.safetensors ... phase2/block21.safetensors: final Yat MLP replacements.
  • phase1/: random/on-shell warm-start checkpoints.
  • scripts/: Kaggle scripts used to train and bundle the model.
  • yat_modernbert.py: loader and patching code.
  • manifest.json: bundle metadata copied from the Kaggle artifact.
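
For fetching the Phase-2 checkpoints without the bundled loader, a minimal sketch follows. It assumes the files between block0 and block21 follow the same phase2/block{i}.safetensors naming, one per MLP block. `phase2_filenames` and `download_phase2` are hypothetical helpers, not part of this repo; `hf_hub_download` is the standard huggingface_hub download call and is imported lazily so the filename helper runs without it.

```python
REPO_ID = "azettaai/modernbert-base-yat-mlp-swap"
NUM_BLOCKS = 22  # one Yat replacement per ModernBERT-base MLP block


def phase2_filenames(num_blocks=NUM_BLOCKS):
    """Checkpoint paths inside the repo, assuming the block{i} naming pattern."""
    return [f"phase2/block{i}.safetensors" for i in range(num_blocks)]


def download_phase2(repo_id=REPO_ID):
    """Download every Phase-2 checkpoint and return the local cache paths."""
    from huggingface_hub import hf_hub_download  # needs network access

    return [
        hf_hub_download(repo_id=repo_id, filename=name)
        for name in phase2_filenames()
    ]


names = phase2_filenames()
print(names[0], "...", names[-1])
```

Calling `download_phase2()` returns paths in layer order, so index i in the result corresponds to model.model.layers[i].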