jina_v5_h256_distilled (Distilled)

Compact multilingual sentence encoder compressed from jinaai/jina-embeddings-v5-text-nano (12x compression).

Model Details

| Property | Value |
|---|---|
| Base model | jinaai/jina-embeddings-v5-text-nano |
| Architecture | eurobert (decoder) |
| Hidden dim | 256 (from 768) |
| Layers | 6 (from 12) |
| Intermediate | 1024 |
| Attention heads | 4 |
| KV heads | 4 |
| Vocab size | 41,778 (from 128,256) |
| Parameters | ~17.0M |
| Model size (FP32) | 64.8 MiB |
| Compression | 12x |
| Distilled | Yes |
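
The FP32 size can be sanity-checked from the parameter count. The sketch below assumes exactly 4 bytes per parameter and uses the rounded ~17.0M figure, so the result is approximate; the variable names are illustrative only. Under these assumptions the reported 64.8 figure matches binary megabytes (MiB) rather than decimal megabytes.

```python
# Rough FP32 size check, assuming 4 bytes per parameter and the
# approximate parameter count from the table (~17.0M).
PARAMS = 17.0e6          # approximate, as reported in Model Details
BYTES_PER_PARAM = 4      # FP32

size_bytes = PARAMS * BYTES_PER_PARAM
size_mb = size_bytes / 1e6        # decimal megabytes
size_mib = size_bytes / 2**20     # binary megabytes (MiB)

print(f"{size_mb:.1f} MB, {size_mib:.1f} MiB")  # 68.0 MB, 64.8 MiB
```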

Quick Start

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jina_v5_h256_distilled", trust_remote_code=True)

sentences = [
    "Hello, how are you?",
    "안녕하세요, 잘 지내세요?",
    "こんにちは、元気ですか?",
    "你好,你好吗?",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 256)
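
A common next step is to compare the encoded sentences with cosine similarity. The helper below is a minimal sketch, not part of this model's API: `cosine_similarity_matrix` is a hypothetical name, and random vectors stand in for real embeddings so the snippet runs without downloading the model.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity for an (n, d) array of embeddings."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Stand-in vectors (assumption); in practice pass the `embeddings`
# array returned by model.encode(sentences).
rng = np.random.default_rng(0)
demo_embeddings = rng.normal(size=(4, 256))

similarity = cosine_similarity_matrix(demo_embeddings)
print(similarity.shape)  # (4, 4)
```

With real embeddings, row `i` of `similarity` gives the similarity of sentence `i` to every other sentence, so translations of the same greeting would be expected to score higher with each other than with unrelated text.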

MTEB Evaluation Results

Overall Average: 50.77%

| Task Group | Average |
|---|---|
| Classification | 56.35% |
| Clustering | 32.67% |
| STS | 63.3% |
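
The overall figure appears to be the unweighted mean of the three group averages rather than a mean over all 24 tasks. A quick check, using only the numbers reported above:

```python
# Group averages as reported in the table above.
group_averages = {"Classification": 56.35, "Clustering": 32.67, "STS": 63.3}

# Unweighted mean over groups (assumption about how the overall score is formed).
overall = sum(group_averages.values()) / len(group_averages)
print(f"{overall:.2f}%")  # 50.77%
```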

Classification

| Task | Average | Top subsets (up to 5) |
|---|---|---|
| AmazonCounterfactualClassification | 65.07% | de: 68.66%, en: 66.66%, en-ext: 66.57%, ja: 58.39% |
| Banking77Classification | 76.16% | default: 76.16% |
| ImdbClassification | 67.53% | default: 67.53% |
| MTOPDomainClassification | 65.06% | en: 80.9%, es: 75.71%, fr: 70.84%, de: 70.05%, th: 46.93% |
| MassiveIntentClassification | 27.25% | zh-CN: 67.17%, en: 66.34%, ja: 63.51%, fr: 61.69%, ko: 60.2% |
| MassiveScenarioClassification | 34.0% | zh-CN: 73.44%, en: 73.16%, de: 70.43%, fr: 68.91%, ja: 68.77% |
| ToxicConversationsClassification | 58.82% | default: 58.82% |
| TweetSentimentExtractionClassification | 56.93% | default: 56.93% |
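
The per-task average seems to cover every language subset of a task, while the last column lists at most five of them, highest first. That would explain why a task such as MassiveIntentClassification averages far below the subsets shown. The check below illustrates this with two rows from the table; the interpretation is an inference from the numbers, not something the card states.

```python
def mean(values):
    return sum(values) / len(values)

# All four listed subsets: their mean reproduces the reported average.
amazon_counterfactual = {"de": 68.66, "en": 66.66, "en-ext": 66.57, "ja": 58.39}
amazon_mean = mean(list(amazon_counterfactual.values()))
print(f"{amazon_mean:.2f}")  # 65.07, matches the reported 65.07%

# Five listed subsets: their mean is higher than the reported 65.06%,
# which suggests at least one lower-scoring subset is not listed.
mtop_domain_listed = {"en": 80.9, "es": 75.71, "fr": 70.84, "de": 70.05, "th": 46.93}
mtop_listed_mean = mean(list(mtop_domain_listed.values()))
print(f"{mtop_listed_mean:.2f}")  # 68.89
```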

Clustering

| Task | Average | Top subsets (up to 5) |
|---|---|---|
| ArXivHierarchicalClusteringP2P | 51.55% | default: 51.55% |
| ArXivHierarchicalClusteringS2S | 48.1% | default: 48.1% |
| BiorxivClusteringP2P.v2 | 17.44% | default: 17.44% |
| MedrxivClusteringP2P.v2 | 25.15% | default: 25.15% |
| MedrxivClusteringS2S.v2 | 21.6% | default: 21.6% |
| StackExchangeClustering.v2 | 44.4% | default: 44.4% |
| StackExchangeClusteringP2P.v2 | 34.83% | default: 34.83% |
| TwentyNewsgroupsClustering.v2 | 18.27% | default: 18.27% |

STS

| Task | Average | Top subsets (up to 5) |
|---|---|---|
| BIOSSES | 59.18% | default: 59.18% |
| SICK-R | 71.03% | default: 71.03% |
| STS12 | 64.24% | default: 64.24% |
| STS13 | 70.9% | default: 70.9% |
| STS14 | 67.0% | default: 67.0% |
| STS15 | 75.87% | default: 75.87% |
| STS17 | 25.43% | en-en: 71.54%, es-es: 68.16%, ko-ko: 58.07%, ar-ar: 52.76%, fr-en: 19.09% |
| STSBenchmark | 72.71% | default: 72.71% |

Distillation Impact

| Task | Before | After | Delta (pp) |
|---|---|---|---|
| AmazonCounterfactualClassification | 61.0% | 65.07% | +4.07 |
| ArXivHierarchicalClusteringP2P | 47.44% | 51.55% | +4.11 |
| ArXivHierarchicalClusteringS2S | 47.23% | 48.1% | +0.87 |
| BIOSSES | 46.49% | 59.18% | +12.69 |
| Banking77Classification | 40.63% | 76.16% | +35.53 |
| BiorxivClusteringP2P.v2 | 12.75% | 17.44% | +4.69 |
| ImdbClassification | 53.68% | 67.53% | +13.85 |
| MTOPDomainClassification | 42.82% | 65.06% | +22.24 |
| MassiveIntentClassification | 25.56% | 27.25% | +1.69 |
| MassiveScenarioClassification | 26.49% | 34.0% | +7.51 |
| MedrxivClusteringP2P.v2 | 22.35% | 25.15% | +2.8 |
| MedrxivClusteringS2S.v2 | 19.66% | 21.6% | +1.94 |
| SICK-R | 51.25% | 71.03% | +19.78 |
| STS12 | 32.58% | 64.24% | +31.66 |
| STS13 | 40.72% | 70.9% | +30.18 |
| STS14 | 40.39% | 67.0% | +26.61 |
| STS15 | 54.56% | 75.87% | +21.31 |
| STS17 | 21.6% | 25.43% | +3.83 |
| STSBenchmark | 33.56% | 72.71% | +39.15 |
| StackExchangeClustering.v2 | 38.95% | 44.4% | +5.45 |
| StackExchangeClusteringP2P.v2 | 32.89% | 34.83% | +1.94 |
| ToxicConversationsClassification | 53.11% | 58.82% | +5.71 |
| TweetSentimentExtractionClassification | 37.0% | 56.93% | +19.93 |
| TwentyNewsgroupsClustering.v2 | 9.32% | 18.27% | +8.95 |
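
The delta column is an absolute difference in percentage points (after minus before), not a relative improvement. A short sketch on three rows copied from the table makes the distinction concrete; the `scores` dictionary is only an illustrative subset.

```python
# (before, after) scores in percent for a few tasks from the table above.
scores = {
    "Banking77Classification": (40.63, 76.16),
    "STSBenchmark": (33.56, 72.71),
    "ArXivHierarchicalClusteringS2S": (47.23, 48.1),
}

# Absolute change in percentage points.
deltas = {task: round(after - before, 2) for task, (before, after) in scores.items()}

for task, delta in deltas.items():
    print(f"{task}: {delta:+.2f} pp")
```

For comparison, the relative gain for Banking77Classification would be 35.53 / 40.63, or roughly 87%, which is a different quantity from the +35.53 reported.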

Training

Stage 1: Model Compression

  • Teacher: jinaai/jina-embeddings-v5-text-nano (12L, 768d)
  • Compression: Layer pruning + Vocab pruning
  • Result: 6L / 256d / 41,778 vocab
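
Vocabulary pruning in general means keeping only the embedding rows for tokens that are actually needed and remapping token ids accordingly. The sketch below is a generic, framework-free illustration on a toy matrix. It is not the procedure used for this model, and `prune_vocab` and the toy data are hypothetical. The final lines compute the embedding size after pruning, assuming a single token-embedding matrix of shape vocab x hidden.

```python
def prune_vocab(embedding_rows, kept_token_ids):
    """Keep only the rows for `kept_token_ids`.

    Returns the pruned matrix and an old-id -> new-id mapping.
    """
    kept = sorted(set(kept_token_ids))
    old_to_new = {old: new for new, old in enumerate(kept)}
    pruned = [embedding_rows[old] for old in kept]
    return pruned, old_to_new

# Toy setup: 10-token vocabulary with 2-dimensional embeddings.
embedding_rows = [[float(i), float(i) + 0.5] for i in range(10)]
corpus_token_ids = [[0, 3, 3, 7], [7, 9, 0]]

# Keep every token id that occurs in the (toy) corpus.
kept_ids = {tid for seq in corpus_token_ids for tid in seq}
pruned, old_to_new = prune_vocab(embedding_rows, kept_ids)

# Re-express the corpus in the new, smaller id space.
remapped = [[old_to_new[t] for t in seq] for seq in corpus_token_ids]
print(len(pruned), remapped)  # 4 [[0, 1, 1, 2], [2, 3, 0]]

# Sizes reported in this card.
VOCAB_BEFORE, VOCAB_AFTER, HIDDEN = 128_256, 41_778, 256
rows_removed = VOCAB_BEFORE - VOCAB_AFTER        # 86,478 tokens dropped
embedding_params_after = VOCAB_AFTER * HIDDEN    # 10,695,168 parameters
```

Under that single-matrix assumption, token embeddings would account for roughly 10.7M of the ~17.0M parameters, which is why vocabulary size matters so much for a model this small.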

Stage 2: Knowledge Distillation

  • Method: MSE + Cosine Similarity loss
  • Data: MTEB Classification/Clustering/STS task datasets
  • Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
  • Schedule: Cosine annealing over 3 epochs
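
The loss and schedule listed above can be sketched without any framework. Several details are assumptions, since the card does not specify them: the two terms are summed with equal weights, the cosine term is taken as 1 minus cosine similarity, student and teacher vectors are already in the same dimension (how the 768-d teacher output is mapped to 256-d is not stated), and the learning rate anneals per step down to a minimum of 0. All function names are hypothetical; a real training loop would use a tensor library and its AdamW implementation.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def distillation_loss(student, teacher, mse_weight=1.0, cos_weight=1.0):
    """MSE plus (1 - cosine similarity); the weights are assumed, not from the card."""
    cos_term = 1.0 - cosine_similarity(student, teacher)
    return mse_weight * mse(student, teacher) + cos_weight * cos_term

def cosine_annealed_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Standard cosine annealing from base_lr down to min_lr."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

student_vec = [0.1, 0.3, -0.2, 0.5]
teacher_vec = [0.2, 0.25, -0.1, 0.45]
loss = distillation_loss(student_vec, teacher_vec)

total_steps = 300  # hypothetical: 3 epochs x 100 steps per epoch
lrs = [cosine_annealed_lr(s, total_steps) for s in (0, 150, 300)]
print(loss, lrs)
```

The MSE term penalizes differences in magnitude and direction, while the cosine term focuses on direction alone, so combining them pushes the student to match the teacher both in raw values and in the angular structure that similarity search relies on.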

License

This model is a derivative of Jina AI's jina-embeddings-v5-text-nano. The original model is released under the CC BY-NC 4.0 license, which restricts use to non-commercial purposes. See the jina-embeddings-v5-text-nano model card for details.

Supported Languages (16)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, pl
