ModernBERT-base Yat MLP Swap
This repo contains Yat student replacements for all 22 MLP blocks of answerdotai/ModernBERT-base.
The checkpoints were trained on a Kaggle 2× T4 instance with a two-phase pipeline:
- Phase 1: data-free random/on-shell distillation. Random Gaussian probes are projected through each layer's `mlp_norm`, then each Yat block learns the corresponding ModernBERT MLP delta map.
- Phase 2: real-activation fine-tuning. Real `mlp_norm` activations are collected from WikiText, then every Phase-1 Yat block is fine-tuned on its layer's real activations.
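The two phases can be illustrated on a toy problem. This is a minimal NumPy sketch under loud assumptions, not the training code in `scripts/`: the teacher is a small, randomly initialized gated-GELU MLP (ModernBERT's MLP is gated, but these sizes and weights are made up), `mlp_norm` is a unit-scale LayerNorm, the student is a plain linear map standing in for a Yat block (whose architecture this card does not describe), "real" activations are simulated with a skewed Gaussian instead of WikiText, and the error is a relative L2 norm that is not necessarily the card's rho.

```python
import numpy as np

rng = np.random.default_rng(0)
D, I, N = 16, 32, 4096  # toy hidden size, intermediate size, sample count (assumptions)


def mlp_norm(x, eps=1e-5):
    # Toy unit-scale, bias-free LayerNorm standing in for a layer's mlp_norm.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


# Toy teacher: gated-GELU MLP (Wi yields input and gate halves, Wo projects back).
Wi = rng.normal(scale=D**-0.5, size=(D, 2 * I))
Wo = rng.normal(scale=I**-0.5, size=(I, D))


def teacher_mlp(h):
    inp, gate = np.split(h @ Wi, 2, axis=-1)
    return (gelu(inp) * gate) @ Wo  # the delta added to the residual stream


def features(h):
    # Hypothetical student = linear map plus bias (a placeholder, NOT a Yat block).
    return np.concatenate([h, np.ones((len(h), 1))], axis=1)


def rel_err(W, h, y):
    # Relative L2 error; an assumed metric, not necessarily the card's rho.
    return np.linalg.norm(features(h) @ W - y) / np.linalg.norm(y)


def fit_gd(W, h, y, steps=200):
    # Full-batch gradient descent on the mean squared error with step size 1/L.
    X = features(h)
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / len(X)).max()
    for _ in range(steps):
        W = W - lr * (X.T @ (X @ W - y)) / len(X)
    return W


# Phase 1 (data-free): Gaussian probes -> mlp_norm -> teacher delta targets.
probes = mlp_norm(rng.normal(size=(N, D)))
W = fit_gd(np.zeros((D + 1, D)), probes, teacher_mlp(probes))
phase1 = rel_err(W, probes, teacher_mlp(probes))

# Phase 2: simulated "real" activations (skewed, offset Gaussian), fine-tuned from Phase-1 W.
A = rng.normal(size=(D, D)) * np.linspace(0.2, 3.0, D)
real = mlp_norm(rng.normal(size=(N, D)) @ A + rng.normal(scale=2.0, size=D))
target = teacher_mlp(real)
pre_real = rel_err(W, real, target)
W = fit_gd(W, real, target)
post_real = rel_err(W, real, target)
print(f"phase1={phase1:.3f}  pre-real={pre_real:.3f}  post-real={post_real:.3f}")
```

Phase 2 starts from the Phase-1 weights rather than from scratch, which mirrors the card's description of fine-tuning every Phase-1 block on its layer's real activations.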
The model is distributed as a lightweight patch over the base ModernBERT weights: `yat_modernbert.py` downloads `answerdotai/ModernBERT-base`, loads the Phase-2 Yat checkpoints, and replaces every `model.model.layers[i].mlp` module.
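The patching step amounts to swapping one submodule per encoder layer. Below is a minimal PyTorch sketch that assumes only the `model.model.layers[i].mlp` attribute path named above; `ToyLayer`, `ToyModel`, `StudentBlock`, and `patch_mlps` are hypothetical names, and `StudentBlock` is a placeholder rather than the actual Yat block defined in `yat_modernbert.py`.

```python
import torch
from torch import nn


class ToyLayer(nn.Module):
    """Hypothetical encoder layer: residual add of mlp(mlp_norm(x))."""

    def __init__(self, d):
        super().__init__()
        self.mlp_norm = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.mlp(self.mlp_norm(x))


class ToyModel(nn.Module):
    """Exposes its layers at model.model.layers, the path the card says is patched."""

    def __init__(self, d=8, n_layers=22):
        super().__init__()
        self.model = nn.Module()
        self.model.layers = nn.ModuleList(ToyLayer(d) for _ in range(n_layers))

    def forward(self, x):
        for layer in self.model.layers:
            x = layer(x)
        return x


class StudentBlock(nn.Module):
    """Placeholder student MLP (NOT the real Yat block)."""

    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)

    def forward(self, x):
        return self.proj(x)


def patch_mlps(model, make_block):
    """Replace every model.model.layers[i].mlp with make_block(i)."""
    for i, layer in enumerate(model.model.layers):
        # nn.Module.__setattr__ drops the old submodule and registers the new one.
        layer.mlp = make_block(i)
    return model


model = patch_mlps(ToyModel(d=8), lambda i: StudentBlock(8))
out = model(torch.randn(2, 5, 8))  # (batch, seq, hidden) shape is preserved
print(out.shape)
```

In the real loader, `make_block(i)` would presumably build a Yat block and load its weights from `phase2/block{i}.safetensors`; the rest of the base model (attention, norms, embeddings) is left untouched.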
Results
- Phase 1 mean random/on-shell rho: 0.281783
- Phase 2 mean real-activation rho: 1.654114 before real-activation fine-tuning, 0.074821 after
| Layer | Phase 1 rho (random/on-shell) | Phase 2 rho, before real-activation fine-tuning | Phase 2 rho, after real-activation fine-tuning |
|---|---|---|---|
| 0 | 0.144016 | 2.261558 | 0.044480 |
| 1 | 0.346435 | 1.351833 | 0.105199 |
| 2 | 0.372108 | 1.215147 | 0.080989 |
| 3 | 0.405880 | 0.692723 | 0.059312 |
| 4 | 0.295711 | 0.024458 | 0.001823 |
| 5 | 0.246746 | 0.765731 | 0.077416 |
| 6 | 0.251584 | 1.105588 | 0.109735 |
| 7 | 0.166338 | 1.832873 | 0.144971 |
| 8 | 0.200662 | 1.940811 | 0.127343 |
| 9 | 0.220264 | 0.022831 | 0.000741 |
| 10 | 0.166898 | 0.664773 | 0.032474 |
| 11 | 0.160852 | 0.381549 | 0.000065 |
| 12 | 0.225648 | 0.597184 | 0.047100 |
| 13 | 0.223533 | 1.688230 | 0.125407 |
| 14 | 0.212146 | 4.214903 | 0.113031 |
| 15 | 0.168918 | 0.264562 | 0.000735 |
| 16 | 0.241367 | 1.651894 | 0.116954 |
| 17 | 0.338128 | 1.422414 | 0.066886 |
| 18 | 0.374190 | 0.588581 | 0.061256 |
| 19 | 0.447556 | 0.883236 | 0.091415 |
| 20 | 0.488182 | 1.122052 | 0.143083 |
| 21 | 0.502065 | 11.697567 | 0.095645 |
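The summary lines are unweighted means over the 22 layers of each column. A quick Python check with the values copied from the table, which also reports the layer with the largest rho per column:

```python
# Per-layer rho values copied from the Results table (layers 0..21, in order).
phase1 = [0.144016, 0.346435, 0.372108, 0.405880, 0.295711, 0.246746, 0.251584, 0.166338,
          0.200662, 0.220264, 0.166898, 0.160852, 0.225648, 0.223533, 0.212146, 0.168918,
          0.241367, 0.338128, 0.374190, 0.447556, 0.488182, 0.502065]
pre_real = [2.261558, 1.351833, 1.215147, 0.692723, 0.024458, 0.765731, 1.105588, 1.832873,
            1.940811, 0.022831, 0.664773, 0.381549, 0.597184, 1.688230, 4.214903, 0.264562,
            1.651894, 1.422414, 0.588581, 0.883236, 1.122052, 11.697567]
post_real = [0.044480, 0.105199, 0.080989, 0.059312, 0.001823, 0.077416, 0.109735, 0.144971,
             0.127343, 0.000741, 0.032474, 0.000065, 0.047100, 0.125407, 0.113031, 0.000735,
             0.116954, 0.066886, 0.061256, 0.091415, 0.143083, 0.095645]


def mean(xs):
    return sum(xs) / len(xs)


for name, col in [("Phase 1", phase1), ("Phase 2 pre-real", pre_real), ("Phase 2 post-real", post_real)]:
    worst = max(range(len(col)), key=col.__getitem__)  # layer with the largest rho
    print(f"{name}: mean={mean(col):.6f}, max at layer {worst} ({col[worst]:.6f})")
# Phase 1: mean=0.281783, max at layer 21 (0.502065)
# Phase 2 pre-real: mean=1.654114, max at layer 21 (11.697567)
# Phase 2 post-real: mean=0.074821, max at layer 7 (0.144971)
```

Because these are plain means, a single outlier layer such as layer 21 in the pre-real column moves the average substantially.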
Usage
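```python
# yat_modernbert.py ships in this repo; make sure it is on your Python path.
import torch
from yat_modernbert import load_model

model, tokenizer = load_model("azettaai/modernbert-base-yat-mlp-swap")
model.eval()

inputs = tokenizer("Yat blocks replace every ModernBERT MLP.", return_tensors="pt")
inputs = inputs.to(next(model.parameters()).device)  # match the model's device
with torch.no_grad():
    out = model(**inputs)
print(out.logits.shape)
```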
Files
- `phase2/block0.safetensors` ... `phase2/block21.safetensors`: final Yat MLP replacements.
- `phase1/`: random/on-shell warm-start checkpoints.
- `scripts/`: Kaggle scripts used to train and bundle the model.
- `yat_modernbert.py`: loader and patching code.
- `manifest.json`: bundle metadata copied from the Kaggle artifact.
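Since the final weights are split into one file per layer, a download can be sanity-checked for completeness before patching. A small sketch; `missing_phase2_blocks` is a hypothetical helper, and the file names follow the `phase2/block{i}.safetensors` pattern listed above:

```python
# Expected Phase-2 checkpoint names: one per ModernBERT-base MLP block (22 layers).
NUM_LAYERS = 22
EXPECTED = [f"phase2/block{i}.safetensors" for i in range(NUM_LAYERS)]


def missing_phase2_blocks(repo_files):
    """Return the expected Phase-2 checkpoint paths absent from repo_files."""
    present = set(repo_files)
    return [path for path in EXPECTED if path not in present]


# Example with a listing that lacks the last block.
files = EXPECTED[:21] + ["yat_modernbert.py", "manifest.json"]
print(missing_phase2_blocks(files))  # ['phase2/block21.safetensors']
```

To check the hosted repo, the file list could come from `huggingface_hub.list_repo_files("azettaai/modernbert-base-yat-mlp-swap")`, which requires network access.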
Base model: answerdotai/ModernBERT-base