modernbert_L4_uniform_distilled (Distilled)
Lightweight sentence encoder created from answerdotai/ModernBERT-base via layer pruning + vocabulary pruning + knowledge distillation.
Model Details
| Property | Value |
|---|---|
| Teacher | answerdotai/ModernBERT-base |
| Architecture | ModernBERT (pruned) |
| Hidden dim | 768 |
| Layers | 4 / 22 |
| Layer indices | [0, 7, 14, 21] |
| Strategy | 4 layers, evenly spaced from ModernBERT (22L) |
| Parameters | 36,107,520 (55,607,040 before vocabulary pruning) |
| Model size (FP32) | 137.7MB |
| Distilled | Yes |
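The layer indices in the table are consistent with picking evenly spaced depths that include both the first and the last teacher layer. The following is a minimal sketch of that selection rule; the function name `evenly_spaced_layers` is hypothetical and the rounding rule is an assumption, since the card does not show the selection code.

```python
def evenly_spaced_layers(num_teacher_layers: int, num_student_layers: int) -> list[int]:
    """Pick teacher layer indices at evenly spaced depths, endpoints included.

    Assumption: this mirrors the "uniform" strategy named in the card; the
    actual selection code is not part of the card.
    """
    if not 1 <= num_student_layers <= num_teacher_layers:
        raise ValueError("student layer count must be between 1 and the teacher layer count")
    if num_student_layers == 1:
        return [num_teacher_layers - 1]
    step = (num_teacher_layers - 1) / (num_student_layers - 1)
    return [round(i * step) for i in range(num_student_layers)]


layer_indices = evenly_spaced_layers(22, 4)
print(layer_indices)  # [0, 7, 14, 21]
```

With 22 teacher layers and 4 student layers the step is exactly 7, so no rounding is involved for this configuration.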
Architecture
==============================================================
  TEACHER: ModernBERT  →  STUDENT: 4L / 24,978 vocab
==============================================================
            TEACHER                             STUDENT
  ┌─────────────────────────┐         ┌─────────────────────────┐
  │      Input Tokens       │         │      Input Tokens       │
  └────────────┬────────────┘         └────────────┬────────────┘
               │                                   │
  ┌────────────┴────────────┐         ┌────────────┴────────────┐
  │       Embeddings        │         │   Embeddings (pruned)   │
  │      vocab: 50,368      │         │      vocab: 24,978      │
  │        dim: 768         │         │        dim: 768         │
  └────────────┬────────────┘         └────────────┬────────────┘
               │                                   │
  ┌─────────────────────────┐         ┌─────────────────────────┐
  │  Layer 0                │   ──►   │  Layer 0  ← L0          │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 1                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 2                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 3                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 4                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 5                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 6                │    ╳    │                         │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 7                │   ──►   │  Layer 1  ← L7          │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 8                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 9                │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 10               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 11               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 12               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 13               │    ╳    │                         │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 14               │   ──►   │  Layer 2  ← L14         │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 15               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 16               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 17               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 18               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 19               │    ╳    │                         │
  ├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤         │                         │
  │  Layer 20               │    ╳    │                         │
  ├─────────────────────────┤         ├─────────────────────────┤
  │  Layer 21               │   ──►   │  Layer 3  ← L21         │
  └────────────┬────────────┘         └────────────┬────────────┘
               │                                   │
  ┌────────────┴────────────┐         ┌────────────┴────────────┐
  │      Mean Pooling       │         │      Mean Pooling       │
  │    → 768d embedding     │         │    → 768d embedding     │
  └─────────────────────────┘         └─────────────────────────┘

  Size:       495.8MB (FP32)  →  137.7MB (FP32)
  Params:     129,980,160  →  36,107,520
  Reduction:  72.2%
==============================================================
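The size and reduction figures above can be checked with a few lines of arithmetic. This sketch assumes 4 bytes per FP32 parameter and that "MB" in the card means 2^20 bytes, which is the reading under which the parameter counts and sizes agree. The variable names are illustrative.

```python
BYTES_PER_FP32 = 4
MIB = 2**20  # assumption: the card's "MB" is 2**20 bytes

teacher_params = 129_980_160
student_params = 36_107_520


def fp32_size_mib(num_params: int) -> float:
    """Size of a parameter set stored in FP32, in MiB."""
    return num_params * BYTES_PER_FP32 / MIB


teacher_size = fp32_size_mib(teacher_params)
student_size = fp32_size_mib(student_params)
reduction = 1 - student_params / teacher_params

# Parameters removed from the embedding matrix by vocabulary pruning:
# (teacher vocab - student vocab) rows, each of hidden dimension 768.
embedding_params_removed = (50_368 - 24_978) * 768

print(f"{teacher_size:.1f} MiB -> {student_size:.1f} MiB")  # 495.8 MiB -> 137.7 MiB
print(f"reduction: {reduction:.1%}")                        # reduction: 72.2%
print(f"embedding parameters removed: {embedding_params_removed:,}")  # 19,499,520
```

The 19,499,520 embedding parameters removed are exactly the gap between 55,607,040 and 36,107,520, which suggests the larger count describes the model after layer pruning but before vocabulary pruning.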
Quick Start
from sentence_transformers import SentenceTransformer

# Hub ID of this model; a path to a local copy of the model directory also works.
model = SentenceTransformer(
    "gomyk/modernbert-student-modernbert_L4_uniform_distilled",
    trust_remote_code=True,
)

sentences = [
    "Hello, how are you?",
    "안녕하세요",
    "Bonjour, comment allez-vous?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)
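Sentence embeddings like these are usually compared with cosine similarity. The sketch below shows that computation in plain Python on small hand-written vectors, which stand in for real 768-dimensional embeddings so that it runs without downloading the model. The helper names are illustrative. In practice, recent sentence-transformers releases provide `model.similarity(embeddings, embeddings)`, which uses cosine similarity unless the model is configured otherwise.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarities; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]


# Toy stand-ins for model.encode(...) output (assumption: not real embeddings).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]
sims = similarity_matrix(toy_embeddings)
for row in sims:
    print([round(value, 2) for value in row])
```

The first two toy vectors point in similar directions and score close to 0.8, while the third is orthogonal to the first and scores 0.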
MTEB Evaluation Results
Overall Average: 57.89%
| Task Group | Average |
|---|---|
| Classification | 57.89% |
Classification
| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 65.15% | en: 68.52%, en-ext: 66.24%, de: 64.72% |
| Banking77Classification | 59.57% | default: 59.57% |
| ImdbClassification | 53.47% | default: 53.47% |
| MTOPDomainClassification | 53.38% | en: 68.3%, es: 58.09%, de: 53.4% |
Distillation Impact
| Task | Before | After | Delta |
|---|---|---|---|
| AmazonCounterfactualClassification | 62.15% | 65.15% | +3.00 pp |
| Banking77Classification | 45.76% | 59.57% | +13.81 pp |
| ImdbClassification | 57.29% | 53.47% | -3.82 pp |
| MTOPDomainClassification | 49.25% | 53.38% | +4.13 pp |
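The deltas in this table are differences in percentage points between the score after distillation and the score before it. The short sketch below recomputes them, along with the unweighted mean across the four tasks, using only the numbers from the table. The dictionary names are illustrative.

```python
before = {
    "AmazonCounterfactualClassification": 62.15,
    "Banking77Classification": 45.76,
    "ImdbClassification": 57.29,
    "MTOPDomainClassification": 49.25,
}
after = {
    "AmazonCounterfactualClassification": 65.15,
    "Banking77Classification": 59.57,
    "ImdbClassification": 53.47,
    "MTOPDomainClassification": 53.38,
}

# Per-task change in percentage points (after minus before).
deltas = {task: round(after[task] - before[task], 2) for task in before}

# Unweighted means over the four tasks.
mean_before = sum(before.values()) / len(before)
mean_after = sum(after.values()) / len(after)

for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f} pp")
print(f"mean before: {mean_before:.2f}%")
print(f"mean after:  {mean_after:.2f}%")  # 57.89%, the reported overall average
```

Three of the four tasks improve and ImdbClassification regresses, so the mean gain hides a per-task trade-off.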
Training
Stage 1: Layer Pruning
- Teacher: answerdotai/ModernBERT-base (22 layers, 768d)
- Selected layers: [0, 7, 14, 21] (4 layers, evenly spaced from ModernBERT (22L))
- Vocabulary pruning applied
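The two pruning steps can be sketched without any deep learning framework: layer pruning keeps a subset of the encoder layers in order, and vocabulary pruning keeps a subset of embedding rows and renumbers the surviving token ids. This is a minimal sketch under stated assumptions. The helper names `prune_layers` and `prune_vocab` are hypothetical, the toy data stands in for real weights, and the card does not say how the tokens to keep were chosen, so the kept ids are simply taken as input.

```python
def prune_layers(layers: list, keep_indices: list[int]) -> list:
    """Keep only the selected encoder layers, preserving their order."""
    return [layers[i] for i in keep_indices]


def prune_vocab(embedding_rows: list[list[float]], keep_token_ids: list[int]):
    """Keep only the embedding rows of retained tokens.

    Returns the pruned rows plus an old-id -> new-id map, which the
    tokenizer would also need so that token ids stay consistent.
    """
    kept = sorted(set(keep_token_ids))
    old_to_new = {old_id: new_id for new_id, old_id in enumerate(kept)}
    pruned_rows = [embedding_rows[old_id] for old_id in kept]
    return pruned_rows, old_to_new


# Toy teacher: 22 named layers and a 10-token vocabulary of dimension 4.
teacher_layers = [f"layer_{i}" for i in range(22)]
student_layers = prune_layers(teacher_layers, [0, 7, 14, 21])

toy_embeddings = [[float(token_id)] * 4 for token_id in range(10)]
pruned_embeddings, old_to_new = prune_vocab(toy_embeddings, [0, 3, 7])

print(student_layers)          # ['layer_0', 'layer_7', 'layer_14', 'layer_21']
print(old_to_new)              # {0: 0, 3: 1, 7: 2}
print(len(pruned_embeddings))  # 3
```

In a real model the same row selection would be applied to the embedding weight matrix, and the tokenizer vocabulary would be rewritten with the new ids.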
Stage 2: Knowledge Distillation
- Method: MSE + Cosine Similarity loss
- Data: MTEB Classification/Clustering/STS task datasets
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Schedule: Cosine annealing over 3 epochs
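The loss and schedule listed above can be illustrated in plain Python. This is a sketch under several assumptions that the card does not confirm: the two loss terms are summed with equal weight, the cosine term is written as one minus the cosine similarity, the learning rate decays to zero, and there is no warmup. The function names are illustrative. Teacher and student embeddings are both 768-dimensional, so they can be compared directly without a projection.

```python
import math


def mse(student: list[float], teacher: list[float]) -> float:
    """Mean squared error between two embeddings of equal length."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_distance(student: list[float], teacher: list[float]) -> float:
    """One minus cosine similarity: 0 when the directions match."""
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / (norm_s * norm_t)


def distillation_loss(student, teacher, mse_weight=1.0, cos_weight=1.0) -> float:
    """Weighted sum of the MSE and cosine terms (weights are assumptions)."""
    return mse_weight * mse(student, teacher) + cos_weight * cosine_distance(student, teacher)


def cosine_annealed_lr(step: int, total_steps: int, base_lr: float = 2e-5,
                       min_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr at step 0 down to min_lr at total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


teacher_embedding = [1.0, 2.0, 2.0]
print(distillation_loss([1.0, 2.0, 2.0], teacher_embedding))     # identical vectors
print(distillation_loss([-1.0, -2.0, -2.0], teacher_embedding))  # opposite vectors

total_steps = 300  # hypothetical: 3 epochs of 100 steps each
for step in (0, 150, 300):
    print(step, cosine_annealed_lr(step, total_steps))
```

The MSE term penalizes differences in magnitude and the cosine term penalizes differences in direction, which is one common motivation for combining them when the student should reproduce the teacher's embedding space.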
Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl