modernbert_L6_uniform_distilled (Distilled)
A lightweight sentence encoder derived from answerdotai/ModernBERT-base through layer pruning, vocabulary pruning, and knowledge distillation.
Model Details
| Property | Value |
|---|---|
| Teacher | answerdotai/ModernBERT-base |
| Architecture | ModernBERT (pruned) |
| Hidden dim | 768 |
| Layers | 6 / 22 |
| Layer indices | [0, 4, 8, 13, 17, 21] |
| Strategy | 6 layers, evenly spaced from ModernBERT (22L) |
| Parameters | 46,138,368 (63,870,720 before vocabulary pruning) |
| Model size (FP32) | 176.0MB |
| Distilled | Yes |
Architecture
==============================================================
  TEACHER: ModernBERT  →  STUDENT: 6L / 27,279 vocab
==============================================================
          TEACHER                           STUDENT
┌─────────────────────────┐       ┌─────────────────────────┐
│      Input Tokens       │       │      Input Tokens       │
└────────────┬────────────┘       └────────────┬────────────┘
             │                                 │
┌────────────┴────────────┐       ┌────────────┴────────────┐
│       Embeddings        │       │   Embeddings (pruned)   │
│      vocab: 50,368      │       │      vocab: 27,279      │
│        dim: 768         │       │        dim: 768         │
└────────────┬────────────┘       └────────────┬────────────┘
             │                                 │
┌─────────────────────────┐       ┌─────────────────────────┐
│ Layer 0                 │  ──►  │ Layer 0  ← L0           │
├─────────────────────────┤       ├─────────────────────────┤
│ Layers 1-3           ╳  │       │                         │
├─────────────────────────┤       ├─────────────────────────┤
│ Layer 4                 │  ──►  │ Layer 1  ← L4           │
├─────────────────────────┤       ├─────────────────────────┤
│ Layers 5-7           ╳  │       │                         │
├─────────────────────────┤       ├─────────────────────────┤
│ Layer 8                 │  ──►  │ Layer 2  ← L8           │
├─────────────────────────┤       ├─────────────────────────┤
│ Layers 9-12          ╳  │       │                         │
├─────────────────────────┤       ├─────────────────────────┤
│ Layer 13                │  ──►  │ Layer 3  ← L13          │
├─────────────────────────┤       ├─────────────────────────┤
│ Layers 14-16         ╳  │       │                         │
├─────────────────────────┤       ├─────────────────────────┤
│ Layer 17                │  ──►  │ Layer 4  ← L17          │
├─────────────────────────┤       ├─────────────────────────┤
│ Layers 18-20         ╳  │       │                         │
├─────────────────────────┤       ├─────────────────────────┤
│ Layer 21                │  ──►  │ Layer 5  ← L21          │
└────────────┬────────────┘       └────────────┬────────────┘
             │                                 │
┌────────────┴────────────┐       ┌────────────┴────────────┐
│      Mean Pooling       │       │      Mean Pooling       │
│    → 768d embedding     │       │    → 768d embedding     │
└─────────────────────────┘       └─────────────────────────┘
  Size:      495.8MB (FP32)  →  176.0MB (FP32)
  Params:    129,980,160  →  46,138,368
  Reduction: 64.5%
==============================================================
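The size and reduction figures in the diagram can be checked with a few lines of arithmetic. The sketch below assumes that "MB" here means 2^20 bytes and that every parameter is stored as a 4-byte FP32 value; neither is stated in the card, but the numbers line up under those assumptions. It also shows that adding the removed embedding rows back to the student gives 63,870,720, the figure that appears under Model Details, which suggests that number was counted before vocabulary pruning.

```python
# Assumptions (not stated in the card): 4 bytes per FP32 parameter,
# and "MB" meaning 2**20 bytes.
BYTES_PER_FP32 = 4
MIB = 1024 ** 2

teacher_params = 129_980_160
student_params = 46_138_368
hidden_dim = 768
teacher_vocab = 50_368
student_vocab = 27_279


def fp32_size_mib(n_params: int) -> float:
    """Size of n_params FP32 weights, in units of 2**20 bytes."""
    return n_params * BYTES_PER_FP32 / MIB


teacher_mib = fp32_size_mib(teacher_params)
student_mib = fp32_size_mib(student_params)
reduction_pct = 100 * (1 - student_params / teacher_params)

# Embedding rows dropped by vocabulary pruning, each of width hidden_dim.
embedding_params_removed = (teacher_vocab - student_vocab) * hidden_dim
params_before_vocab_pruning = student_params + embedding_params_removed

print(f"teacher size: {teacher_mib:.1f}")
print(f"student size: {student_mib:.1f}")
print(f"reduction:    {reduction_pct:.1f}%")
print(f"embedding parameters removed: {embedding_params_removed:,}")
print(f"student before vocab pruning: {params_before_vocab_pruning:,}")
```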
Quick Start
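from sentence_transformers import SentenceTransformer

# Load the distilled student model from the Hugging Face Hub
model = SentenceTransformer("gomyk/modernbert-student-modernbert_L6_uniform_distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요",
    "Bonjour, comment allez-vous?",
]

# Each sentence is mean-pooled into a single 768-dimensional vector
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)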
MTEB Evaluation Results
Overall Average: 40.35%
| Task Group | Average |
|---|---|
| Classification | 50.57% |
| Clustering | 29.91% |
| STS | 40.56% |
Classification
| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 66.11% | en: 69.96%, en-ext: 68.7%, de: 63.61% |
| Banking77Classification | 61.69% | default: 61.69% |
| ImdbClassification | 53.73% | default: 53.73% |
| MTOPDomainClassification | 54.47% | en: 71.35%, es: 57.77%, de: 54.3% |
| MassiveIntentClassification | 31.3% | en: 53.06%, zh-CN: 47.36%, zh-TW: 41.65% |
| MassiveScenarioClassification | 32.3% | en: 57.38%, zh-CN: 50.25%, zh-TW: 43.81% |
| ToxicConversationsClassification | 57.77% | default: 57.77% |
| TweetSentimentExtractionClassification | 47.19% | default: 47.19% |
Clustering
| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 49.8% | default: 49.8% |
| ArXivHierarchicalClusteringS2S | 48.45% | default: 48.45% |
| BiorxivClusteringP2P.v2 | 11.05% | default: 11.05% |
| MedrxivClusteringP2P.v2 | 21.71% | default: 21.71% |
| MedrxivClusteringS2S.v2 | 21.75% | default: 21.75% |
| StackExchangeClustering.v2 | 42.7% | default: 42.7% |
| StackExchangeClusteringP2P.v2 | 32.54% | default: 32.54% |
| TwentyNewsgroupsClustering.v2 | 11.24% | default: 11.24% |
STS
| Task | Average | Details |
|---|---|---|
| BIOSSES | 42.43% | default: 42.43% |
| SICK-R | 53.89% | default: 53.89% |
| STS12 | 43.95% | default: 43.95% |
| STS13 | 42.51% | default: 42.51% |
| STS14 | 40.74% | default: 40.74% |
| STS15 | 53.89% | default: 53.89% |
| STS17 | 27.51% | en-en: 60.2%, es-es: 58.08%, ko-ko: 44.39% |
| STS22.v2 | 18.53% | zh: 47.25%, es: 39.83%, fr: 35.19% |
| STSBenchmark | 41.58% | default: 41.58% |
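The summary averages can be re-derived from the per-task scores above. This is a small sketch that assumes each task group average is an unweighted mean over its tasks and that the overall figure is an unweighted mean over all 25 tasks; the card does not say how the numbers were aggregated, so treat both as assumptions.

```python
from statistics import mean

# Per-task averages copied from the Classification, Clustering, and STS tables.
scores = {
    "Classification": [66.11, 61.69, 53.73, 54.47, 31.30, 32.30, 57.77, 47.19],
    "Clustering": [49.80, 48.45, 11.05, 21.71, 21.75, 42.70, 32.54, 11.24],
    "STS": [42.43, 53.89, 43.95, 42.51, 40.74, 53.89, 27.51, 18.53, 41.58],
}

# Assumption: unweighted mean within each task group.
group_avgs = {group: mean(values) for group, values in scores.items()}

# Assumption: the overall figure is the unweighted mean over every task.
all_scores = [s for values in scores.values() for s in values]
overall = mean(all_scores)

for group, avg in group_avgs.items():
    print(f"{group}: {avg:.2f}%")
print(f"Overall ({len(all_scores)} tasks): {overall:.2f}%")
```

Under these assumptions the results agree with the reported group averages and the 40.35% overall figure to within rounding.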
Distillation Impact
| Task | Before distillation | After distillation | Delta (percentage points) |
|---|---|---|---|
| AmazonCounterfactualClassification | 59.33% | 66.11% | +6.78%p |
| ArXivHierarchicalClusteringP2P | 50.19% | 49.8% | -0.39%p |
| ArXivHierarchicalClusteringS2S | 46.96% | 48.45% | +1.49%p |
| Banking77Classification | 35.01% | 61.69% | +26.68%p |
| BiorxivClusteringP2P.v2 | 12.62% | 11.05% | -1.57%p |
| BIOSSES | 33.84% | 42.43% | +8.59%p |
| ImdbClassification | 55.05% | 53.73% | -1.32%p |
| MassiveIntentClassification | 25.86% | 31.3% | +5.44%p |
| MassiveScenarioClassification | 26.28% | 32.3% | +6.02%p |
| MedrxivClusteringP2P.v2 | 22.13% | 21.71% | -0.42%p |
| MedrxivClusteringS2S.v2 | 19.43% | 21.75% | +2.32%p |
| MTOPDomainClassification | 43.24% | 54.47% | +11.23%p |
| SICK-R | 46.99% | 53.89% | +6.9%p |
| StackExchangeClustering.v2 | 34.26% | 42.7% | +8.44%p |
| StackExchangeClusteringP2P.v2 | 31.01% | 32.54% | +1.53%p |
| STS12 | 35.32% | 43.95% | +8.63%p |
| STS13 | 33.7% | 42.51% | +8.81%p |
| STS14 | 37.07% | 40.74% | +3.67%p |
| STS15 | 49.85% | 53.89% | +4.04%p |
| STS17 | 23.34% | 27.51% | +4.17%p |
| STS22.v2 | 24.05% | 18.53% | -5.52%p |
| STSBenchmark | 39.82% | 41.58% | +1.76%p |
| ToxicConversationsClassification | 52.6% | 57.77% | +5.17%p |
| TweetSentimentExtractionClassification | 38.42% | 47.19% | +8.77%p |
| TwentyNewsgroupsClustering.v2 | 9.11% | 11.24% | +2.13%p |
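The Delta column is an absolute difference in percentage points (`%p`), not a relative change. The sketch below recomputes it for three rows of the table and prints the relative change next to it for contrast; the helper names are illustrative and not part of any evaluation tooling.

```python
# (before, after) scores in percent, copied from three rows of the table.
rows = {
    "Banking77Classification": (35.01, 61.69),
    "SICK-R": (46.99, 53.89),
    "STS22.v2": (24.05, 18.53),
}


def delta_pp(before: float, after: float) -> float:
    """Absolute change in percentage points, as in the Delta column."""
    return round(after - before, 2)


def relative_change_pct(before: float, after: float) -> float:
    """Relative change in percent of the pre-distillation score."""
    return 100 * (after - before) / before


for task, (before, after) in rows.items():
    print(
        f"{task}: {delta_pp(before, after):+.2f}%p "
        f"({relative_change_pct(before, after):+.1f}% relative)"
    )
```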
Training
Stage 1: Layer Pruning
- Teacher: `answerdotai/ModernBERT-base` (22 layers, 768d)
- Selected layers: `[0, 4, 8, 13, 17, 21]` (6 layers, evenly spaced from ModernBERT (22L))
- Vocabulary pruning applied
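Both pruning steps can be illustrated on toy data. The card does not give the exact selection formula or say which tokens were kept, so the following is only a sketch: rounding evenly spaced positions happens to reproduce the listed layer indices, and `prune_vocab` is a hypothetical helper that shows the general idea of keeping a subset of embedding rows and remapping token ids.

```python
def evenly_spaced_layers(n_teacher: int, n_student: int) -> list[int]:
    """Pick n_student layer indices spread evenly over n_teacher layers.

    Assumption: positions are rounded to the nearest integer. This
    reproduces the indices listed in the card but is not confirmed
    to be the method actually used.
    """
    if n_student == 1:
        return [0]
    step = (n_teacher - 1) / (n_student - 1)
    return [round(i * step) for i in range(n_student)]


def prune_vocab(embeddings, keep_ids):
    """Keep only the embedding rows in keep_ids (hypothetical helper).

    Returns the smaller embedding table and a mapping from old token
    id to new token id, which the tokenizer would also need.
    """
    kept = sorted(set(keep_ids))
    old_to_new = {old: new for new, old in enumerate(kept)}
    pruned = [embeddings[old] for old in kept]
    return pruned, old_to_new


layer_indices = evenly_spaced_layers(n_teacher=22, n_student=6)
print(layer_indices)

# Toy 4-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
pruned, old_to_new = prune_vocab(toy_embeddings, keep_ids=[3, 1])
print(pruned, old_to_new)
```

In a real model the kept rows would be copied into a new embedding matrix and the tokenizer rebuilt around the remapped ids.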
Stage 2: Knowledge Distillation
- Method: MSE + Cosine Similarity loss
- Data: MTEB Classification/Clustering/STS task datasets
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Schedule: Cosine annealing over 3 epochs
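The loss and schedule listed above can be sketched in plain Python on single embedding vectors. The card does not state how the two loss terms are weighted or whether a warmup or minimum learning rate was used, so the equal weights, the `1 - cosine` form of the similarity term, and the `min_lr=0.0` default are all assumptions. A real training loop would use a tensor library and batches.

```python
import math


def mse(student, teacher):
    """Mean squared error between two embedding vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distillation_loss(student, teacher, mse_weight=1.0, cos_weight=1.0):
    """MSE plus (1 - cosine similarity); the weights are assumed, not from the card."""
    cos_term = 1.0 - cosine_similarity(student, teacher)
    return mse_weight * mse(student, teacher) + cos_weight * cos_term


def cosine_annealed_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine annealing from base_lr down to min_lr over total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


teacher_vec = [0.2, -0.5, 0.8]
student_vec = [0.1, -0.4, 0.9]
print(distillation_loss(student_vec, teacher_vec))

total_steps = 300  # hypothetical: 3 epochs of 100 steps each
for step in (0, 150, 300):
    print(step, cosine_annealed_lr(step, total_steps))
```

The student embedding is pulled toward the teacher's both in magnitude (MSE) and in direction (cosine), while the learning rate decays smoothly from 2e-5 toward the minimum by the end of training.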
Supported Languages (18)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl