Instructions for using jinaai/jina-embeddings-v5-text-nano with libraries and notebooks:
- Libraries
- Transformers
How to use jinaai/jina-embeddings-v5-text-nano with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="jinaai/jina-embeddings-v5-text-nano", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("jinaai/jina-embeddings-v5-text-nano", trust_remote_code=True, dtype="auto")
```
- sentence-transformers
How to use jinaai/jina-embeddings-v5-text-nano with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v5-text-nano", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
- Notebooks
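The `[3, 3]` shape comes from comparing every sentence with every other sentence. By default, `model.similarity` in sentence-transformers uses cosine similarity, unless the model's config selects a different similarity function. The dependency-free sketch below shows what a cosine similarity matrix is. It assumes cosine is the function in use. The 3-dimensional vectors are made-up stand-ins for `model.encode(sentences)` and are not real model output.

```python
import math


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between every row of a and every row of b."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    result = []
    for u in a:
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            row.append(dot / (norm(u) * norm(v)))
        result.append(row)
    return result


# Hypothetical toy embeddings standing in for model.encode(sentences)
embeddings = [
    [0.9, 0.1, 0.0],  # "The weather is lovely today."
    [0.8, 0.2, 0.1],  # "It's so sunny outside!"
    [0.0, 0.1, 0.9],  # "He drove to the stadium."
]
sims = cosine_similarity_matrix(embeddings, embeddings)
print(len(sims), len(sims[0]))  # 3 3
```

The diagonal is 1 because each sentence is compared with itself. In these toy vectors, the two weather sentences score much closer to each other than either does to the stadium sentence.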
- Google Colab
- Kaggle
Can this model be layer pruned?
I want to run this model on my phone, but when I apply layer pruning followed by distillation, the embedding quality drops dramatically. Is this related to the model's use of RoPE?
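For reference, here is a minimal sketch of one common layer-pruning-plus-distillation recipe. It is not necessarily what the poster did. The idea is to keep an evenly spaced subset of transformer layers, then train the pruned student so that its embeddings match the unpruned teacher's, using a cosine distance loss. The helper names and the 12-layer example are hypothetical and do not describe this model's actual depth. In a real setup, the kept indices would be used to slice the model's layer list, whose attribute name depends on the architecture.

```python
import math


def evenly_spaced_layers(num_layers, num_keep):
    """Hypothetical helper: indices of layers to keep when pruning num_layers
    down to num_keep, always keeping the first and last layer."""
    if not 1 <= num_keep <= num_layers:
        raise ValueError("num_keep must be between 1 and num_layers")
    if num_keep == 1:
        return [num_layers - 1]
    step = (num_layers - 1) / (num_keep - 1)
    return [round(i * step) for i in range(num_keep)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def cosine_distillation_loss(student_embs, teacher_embs):
    """Mean (1 - cosine) between paired student and teacher embeddings.
    0 means the student reproduces the teacher's embedding directions exactly."""
    pairs = list(zip(student_embs, teacher_embs))
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(pairs)


print(evenly_spaced_layers(12, 6))  # [0, 2, 4, 7, 9, 11]
```

Pruning by half in one step, as in the example, removes a large share of an already small model's capacity. That is one reason the distillation loss may stay high however the positional encoding works.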
Hey, I don't think it's related to RoPE. It could be that the nano model is already too compact for layer pruning, depending on how aggressively you pruned, or that the distillation setup needs some tuning. Either way, if your goal is on-device inference, I'd suggest trying the quantized GGUF versions instead. Q4_K_M (157 MB, down from 424 MB) is probably the best tradeoff between size and quality.
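Here is what the two quoted file sizes work out to. This is simple arithmetic on the numbers in the reply. It takes 424 MB to be the unquantized model file, as "down from" implies. `size_reduction` is a hypothetical helper written for this illustration.

```python
def size_reduction(full_mb: float, quant_mb: float) -> tuple[float, float, float]:
    """Return (fraction of original size, fractional reduction, compression factor)."""
    fraction = quant_mb / full_mb
    return fraction, 1.0 - fraction, full_mb / quant_mb


# Sizes quoted in the reply: full model 424 MB, Q4_K_M 157 MB
fraction, reduction, factor = size_reduction(424, 157)
print(f"{fraction:.0%} of original size, {reduction:.0%} smaller, {factor:.1f}x compression")
# 37% of original size, 63% smaller, 2.7x compression
```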
Are there any metrics for the performance of Q4_K_M?
We don't have a detailed evaluation, but @hanxiao ran a quick one (https://www.linkedin.com/posts/hxiao87_low-quant-weights-make-the-embedding-model-activity-7449362879687880704-k25C?utm_source=share&utm_medium=member_desktop&rcm=ACoAACC3cG0Be5GVkDYvYPegcoac1w5VakM7_t8), which suggested that Q4_K_M is probably a good tradeoff.
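If you want numbers for your own data, one simple self-check is to embed the same texts with the full-precision model and the Q4_K_M file and compare the results. The sketch below is not the evaluation from the linked post. It assumes you have already produced both sets of embeddings, for example with sentence-transformers for the full model and a GGUF runtime for the quantized one. The toy vectors and helper names are hypothetical. It reports two things: how closely each quantized embedding matches its full-precision counterpart, and how often the best-matching document for a query stays the same.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mean_paired_cosine(full_embs, quant_embs):
    """Average cosine between full-precision and quantized embeddings of the same texts."""
    return sum(cosine(f, q) for f, q in zip(full_embs, quant_embs)) / len(full_embs)


def top1_agreement(full_queries, full_docs, quant_queries, quant_docs):
    """Fraction of queries whose best-matching document is the same under both models."""
    def best(query, docs):
        return max(range(len(docs)), key=lambda i: cosine(query, docs[i]))

    hits = sum(
        best(fq, full_docs) == best(qq, quant_docs)
        for fq, qq in zip(full_queries, quant_queries)
    )
    return hits / len(full_queries)


# Hypothetical toy embeddings; replace with real outputs from both models.
full_docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quant_docs = [[0.98, 0.05, 0.0], [0.03, 0.97, 0.02], [0.0, 0.04, 0.99]]
full_queries = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9]]
quant_queries = [[0.88, 0.12, 0.01], [0.12, 0.18, 0.91]]

print(round(mean_paired_cosine(full_docs, quant_docs), 4))
print(top1_agreement(full_queries, full_docs, quant_queries, quant_docs))  # 1.0
```

A paired cosine close to 1 and high top-1 agreement on your own queries indicate that quantization has not changed retrieval behavior much for your use case. Neither number replaces a full benchmark.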