jina_v5_h256_distilled (Distilled)
Compact multilingual sentence encoder compressed from jinaai/jina-embeddings-v5-text-nano (12x compression).
Model Details
| Property | Value |
|---|---|
| Base model | jinaai/jina-embeddings-v5-text-nano |
| Architecture | eurobert (encoder) |
| Hidden dim | 256 (from 768) |
| Layers | 6 (from 12) |
| Intermediate | 1024 |
| Attention heads | 4 |
| KV heads | 4 |
| Vocab size | 41,778 (from 128,256) |
| Parameters | ~17.0M |
| Model size (FP32) | 64.8MB |
| Compression | 12x |
| Distilled | Yes |
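As a rough sanity check, the parameter count and FP32 size in the table can be approximated from the listed dimensions. The sketch below assumes a Llama-style block layout (bias-free query/key/value/output projections, a gated MLP with three weight matrices, two norm weight vectors per layer) and an embedding table with no separate output head. None of these layout details are stated in this card, so treat the result as an estimate only.

```python
# Back-of-the-envelope parameter count from the Model Details table.
# ASSUMPTION: the per-layer layout below is a guess, not confirmed by the card.
VOCAB, HIDDEN, LAYERS, INTERMEDIATE = 41_778, 256, 6, 1024

embedding = VOCAB * HIDDEN
attention = 4 * HIDDEN * HIDDEN      # q, k, v, o (KV heads == attention heads)
mlp = 3 * HIDDEN * INTERMEDIATE      # gate, up, down projections (assumed gated MLP)
norms = 2 * HIDDEN                   # two norm weight vectors per layer (assumed)
per_layer = attention + mlp + norms

total_params = embedding + LAYERS * per_layer + HIDDEN  # + final norm (assumed)
size_mib = total_params * 4 / 2**20  # FP32 stores 4 bytes per parameter

print(f"{total_params / 1e6:.2f}M parameters, {size_mib:.1f} MiB in FP32")
```

Under these assumptions the estimate lands close to the ~17.0M parameters reported above, and the size matches 64.8 only when expressed in binary mebibytes, which suggests (but does not prove) that the "MB" figure in the table is a MiB value.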
Quick Start
```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True is required because the repository ships custom model code.
model = SentenceTransformer("gomyk/jina-v5-h256-distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、元気ですか?",
    "你好,你好吗?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 256)
```
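Sentence embeddings like these are usually compared with cosine similarity. The sketch below uses only the standard library and toy 3-dimensional vectors so that it runs without downloading the model. `cosine_similarity_matrix` is a hypothetical helper written for illustration; in practice the rows would be the output of `model.encode(sentences)`.

```python
import math

def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]

# Toy stand-ins for embeddings (real ones from this model have 256 dimensions).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],  # same direction as the first vector
    [0.0, 3.0, 0.0],  # orthogonal to the first two
]
similarities = cosine_similarity_matrix(toy_embeddings)
for row in similarities:
    print([round(s, 2) for s in row])
```

With sentence-transformers installed, `model.similarity(embeddings, embeddings)` should give the equivalent matrix directly on the encoded sentences.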
MTEB Evaluation Results
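Overall Average: 50.77% (the mean of the three task-group averages below). The Details column in the per-task tables lists at most five subsets, so a task average can fall below every listed score when it also covers subsets that are not shown (for example, MassiveIntentClassification).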
| Task Group | Average |
|---|---|
| Classification | 56.35% |
| Clustering | 32.67% |
| STS | 63.3% |
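The group averages and the overall figure appear to be plain unweighted means. The sketch below recomputes them from the per-task averages listed in the tables of this card; it assumes no task weighting, which is consistent with the reported numbers to within rounding. Because each group contains eight tasks, the mean of the three group averages equals the mean over all 24 tasks.

```python
from statistics import mean

# Per-task averages (%) copied from the per-task tables in this card.
scores = {
    "Classification": [65.07, 76.16, 67.53, 65.06, 27.25, 34.0, 58.82, 56.93],
    "Clustering": [51.55, 48.1, 17.44, 25.15, 21.6, 44.4, 34.83, 18.27],
    "STS": [59.18, 71.03, 64.24, 70.9, 67.0, 75.87, 25.43, 72.71],
}

# Unweighted mean per group, then unweighted mean of the group means.
group_averages = {group: mean(values) for group, values in scores.items()}
overall = mean(group_averages.values())

for group, avg in group_averages.items():
    print(f"{group}: {avg:.2f}%")
print(f"Overall: {overall:.2f}%")
```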
Classification
| Task | Average | Details |
|---|---|---|
| AmazonCounterfactualClassification | 65.07% | de: 68.66%, en: 66.66%, en-ext: 66.57%, ja: 58.39% |
| Banking77Classification | 76.16% | default: 76.16% |
| ImdbClassification | 67.53% | default: 67.53% |
| MTOPDomainClassification | 65.06% | en: 80.9%, es: 75.71%, fr: 70.84%, de: 70.05%, th: 46.93% |
| MassiveIntentClassification | 27.25% | zh-CN: 67.17%, en: 66.34%, ja: 63.51%, fr: 61.69%, ko: 60.2% |
| MassiveScenarioClassification | 34.0% | zh-CN: 73.44%, en: 73.16%, de: 70.43%, fr: 68.91%, ja: 68.77% |
| ToxicConversationsClassification | 58.82% | default: 58.82% |
| TweetSentimentExtractionClassification | 56.93% | default: 56.93% |
Clustering
| Task | Average | Details |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 51.55% | default: 51.55% |
| ArXivHierarchicalClusteringS2S | 48.1% | default: 48.1% |
| BiorxivClusteringP2P.v2 | 17.44% | default: 17.44% |
| MedrxivClusteringP2P.v2 | 25.15% | default: 25.15% |
| MedrxivClusteringS2S.v2 | 21.6% | default: 21.6% |
| StackExchangeClustering.v2 | 44.4% | default: 44.4% |
| StackExchangeClusteringP2P.v2 | 34.83% | default: 34.83% |
| TwentyNewsgroupsClustering.v2 | 18.27% | default: 18.27% |
STS
| Task | Average | Details |
|---|---|---|
| BIOSSES | 59.18% | default: 59.18% |
| SICK-R | 71.03% | default: 71.03% |
| STS12 | 64.24% | default: 64.24% |
| STS13 | 70.9% | default: 70.9% |
| STS14 | 67.0% | default: 67.0% |
| STS15 | 75.87% | default: 75.87% |
| STS17 | 25.43% | en-en: 71.54%, es-es: 68.16%, ko-ko: 58.07%, ar-ar: 52.76%, fr-en: 19.09% |
| STSBenchmark | 72.71% | default: 72.71% |
Distillation Impact
| Task | Before | After | Delta (percentage points) |
|---|---|---|---|
| AmazonCounterfactualClassification | 61.0% | 65.07% | +4.07%p |
| ArXivHierarchicalClusteringP2P | 47.44% | 51.55% | +4.11%p |
| ArXivHierarchicalClusteringS2S | 47.23% | 48.1% | +0.87%p |
| BIOSSES | 46.49% | 59.18% | +12.69%p |
| Banking77Classification | 40.63% | 76.16% | +35.53%p |
| BiorxivClusteringP2P.v2 | 12.75% | 17.44% | +4.69%p |
| ImdbClassification | 53.68% | 67.53% | +13.85%p |
| MTOPDomainClassification | 42.82% | 65.06% | +22.24%p |
| MassiveIntentClassification | 25.56% | 27.25% | +1.69%p |
| MassiveScenarioClassification | 26.49% | 34.0% | +7.51%p |
| MedrxivClusteringP2P.v2 | 22.35% | 25.15% | +2.8%p |
| MedrxivClusteringS2S.v2 | 19.66% | 21.6% | +1.94%p |
| SICK-R | 51.25% | 71.03% | +19.78%p |
| STS12 | 32.58% | 64.24% | +31.66%p |
| STS13 | 40.72% | 70.9% | +30.18%p |
| STS14 | 40.39% | 67.0% | +26.61%p |
| STS15 | 54.56% | 75.87% | +21.31%p |
| STS17 | 21.6% | 25.43% | +3.83%p |
| STSBenchmark | 33.56% | 72.71% | +39.15%p |
| StackExchangeClustering.v2 | 38.95% | 44.4% | +5.45%p |
| StackExchangeClusteringP2P.v2 | 32.89% | 34.83% | +1.94%p |
| ToxicConversationsClassification | 53.11% | 58.82% | +5.71%p |
| TweetSentimentExtractionClassification | 37.0% | 56.93% | +19.93%p |
| TwentyNewsgroupsClustering.v2 | 9.32% | 18.27% | +8.95%p |
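The Delta column is an absolute difference in percentage points (%p), not a relative change. A minimal sketch that reproduces it for three rows of the table above, where `delta_pp` is a hypothetical helper name:

```python
# (before, after) scores in % for a few rows of the Distillation Impact table.
rows = {
    "Banking77Classification": (40.63, 76.16),
    "STSBenchmark": (33.56, 72.71),
    "ArXivHierarchicalClusteringS2S": (47.23, 48.1),
}

def delta_pp(before, after):
    """Absolute change in percentage points, rounded to two decimals."""
    return round(after - before, 2)

deltas = {task: delta_pp(before, after) for task, (before, after) in rows.items()}
for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f}%p")
```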
Training
Stage 1: Model Compression
- Teacher: jinaai/jina-embeddings-v5-text-nano (12L, 768d)
- Compression: Layer pruning + Vocab pruning
- Result: 6L / 256d / 41,778 vocab
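The two pruning steps can be illustrated on toy data. This is a sketch under stated assumptions, not the procedure used for this model: the card does not say which layers or tokens were kept, so the choices below (every second layer, an arbitrary token subset) are hypothetical, as are the helper names `prune_layers` and `prune_vocab`. The reduction of the hidden width from 768 to 256 is not covered here because the card does not describe how it was done.

```python
def prune_layers(layers, keep_indices):
    """Keep only the layers at the given indices, preserving their order."""
    return [layers[i] for i in sorted(keep_indices)]

def prune_vocab(embedding_rows, keep_token_ids):
    """Drop unused embedding rows; return (new_rows, old_id -> new_id map)."""
    kept = sorted(set(keep_token_ids))
    id_map = {old: new for new, old in enumerate(kept)}
    new_rows = [embedding_rows[old] for old in kept]
    return new_rows, id_map

# Toy teacher: 12 "layers" and a 10-token vocabulary with 2-dim embeddings.
teacher_layers = [f"layer_{i}" for i in range(12)]
teacher_embeddings = [[float(i), float(i) * 0.5] for i in range(10)]

# HYPOTHETICAL choices: every second layer and an arbitrary token subset.
student_layers = prune_layers(teacher_layers, keep_indices=range(0, 12, 2))
student_embeddings, id_map = prune_vocab(
    teacher_embeddings, keep_token_ids=[0, 3, 4, 9]
)

print(student_layers)
print(id_map)
```

After vocabulary pruning the tokenizer has to emit the new ids, so the `id_map` (or an equivalent rebuilt tokenizer) must ship with the pruned model.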
Stage 2: Knowledge Distillation
- Method: MSE + Cosine Similarity loss
- Data: MTEB Classification/Clustering/STS task datasets
- Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
- Schedule: Cosine annealing over 3 epochs
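The loss and schedule listed above can be sketched in plain Python. Several details are assumptions, since the card does not state them: the two loss terms are weighted equally, the cosine term is taken as 1 minus cosine similarity, the teacher targets already have the same dimensionality as the student output (the card does not explain how 768-dimensional teacher vectors are matched to 256 dimensions), and the learning rate anneals to zero. The function names are hypothetical, and the AdamW update itself is not shown.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distillation_loss(student, teacher, mse_weight=1.0, cos_weight=1.0):
    """MSE plus (1 - cosine similarity); equal weights are an assumption."""
    cos_term = 1.0 - cosine_similarity(student, teacher)
    return mse_weight * mse(student, teacher) + cos_weight * cos_term

def cosine_annealed_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine annealing from base_lr to min_lr (min_lr=0.0 is an assumption)."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Toy vectors standing in for student and teacher sentence embeddings.
teacher_vec = [0.2, -0.1, 0.4]
student_vec = [0.1, -0.1, 0.5]
print(distillation_loss(student_vec, teacher_vec))

# Learning rate at the start, middle, and end of a hypothetical 300-step run.
print([cosine_annealed_lr(s, 300) for s in (0, 150, 300)])
```

The loss is zero only when the student reproduces the teacher vector exactly; the MSE term penalizes differences in magnitude that the cosine term alone would ignore.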
License
This model is a derivative of Jina AI's jina-embeddings-v5-text-nano. The original model is provided under the CC BY-NC 4.0 license; see jina-embeddings-v5-text-nano for details.
Supported Languages (16)
ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, pl