---
language: ["ko", "en", "ja", "zh", "es", "fr", "de", "pt", "it", "ru", "ar", "hi", "th", "vi", "id", "tr", "nl", "pl"]
tags:
- sentence-transformers
- intent-classification
- multilingual
- layer-pruning
- vocab-pruning
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
---

# L6_uniform

Lightweight multilingual sentence encoder optimized for intent classification. Created from `paraphrase-multilingual-MiniLM-L12-v2` via layer pruning + corpus-based vocabulary pruning.

## Model Details

| Property | Value |
|----------|-------|
| Teacher | paraphrase-multilingual-MiniLM-L12-v2 |
| Architecture | XLM-RoBERTa (pruned) |
| Hidden dim | 384 |
| Layers | 6 / 12 |
| Layer indices | [0, 2, 4, 7, 9, 11] |
| Strategy | 6 layers, evenly spaced (general-purpose) |
| Vocab size | ~38,330 (pruned from 250K) |
| Parameters | 26,184,576 |
| Safetensors size | 98.1MB |
| Distilled | No |

## Supported Languages (18)

ko, en, ja, zh, es, fr, de, pt, it, ru, ar, hi, th, vi, id, tr, nl, pl

## Quick Start

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("L6_uniform")

sentences = [
    "예약 좀 해줘",  # Korean
    "What did I order?",  # English
    "今日はいい天気ですね",  # Japanese
    "Reserva una mesa",  # Spanish
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (4, 384)
```

## MTEB Evaluation Results

**Overall Average: 55.55%**

### MassiveIntentClassification

**Average: 52.9%**

| Language | Score |
|----------|-------|
| ar | 42.79% |
| en | 61.83% |
| es | 52.89% |
| ko | 54.08% |

### MassiveScenarioClassification

**Average: 58.2%**

| Language | Score |
|----------|-------|
| ar | 46.87% |
| en | 67.91% |
| es | 59.42% |
| ko | 58.62% |

## Training

This model was created via **layer pruning + vocabulary pruning**:

1. **Teacher**: `paraphrase-multilingual-MiniLM-L12-v2` (12 layers, 384 hidden dim)
2. **Layer selection**: `[0, 2, 4, 7, 9, 11]` - 6 layers, evenly spaced (general-purpose)
3. **Vocab pruning**: 250K -> ~38K tokens (corpus-based filtering for 18 target languages)
4. **No additional training** - weights are directly copied from the teacher

A distilled version of this model is also available with improved performance.

## Compression Summary

| Stage | Vocab | Layers | Size |
|-------|-------|--------|------|
| Teacher (original) | 250,002 | 12 | ~480MB |
| + Layer pruning | 250,002 | 6 | ~407MB |
| + Vocab pruning | ~38,330 | 6 | ~98MB |

## Limitations

- Vocabulary pruning restricts the model to the 18 target languages
- Designed for short dialogue utterances, not long documents
- Layer pruning may reduce performance on complex semantic tasks
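
## Example: Layer Selection Sketch

The layer selection step listed under Training keeps six of the teacher's twelve transformer blocks and copies their weights unchanged. The snippet below is a minimal sketch of that index selection only, not the script used to build this model. `select_layers` and the `teacher_layer_*` placeholder names are hypothetical; the index list is the one from the Model Details table.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Layer indices listed in the Model Details table.
KEEP_LAYERS = [0, 2, 4, 7, 9, 11]


def select_layers(layers: Sequence[T], keep: Sequence[int]) -> List[T]:
    """Return the entries of `layers` at the indices in `keep`, preserving order.

    Hypothetical helper: it only picks items; it does not touch any weights.
    """
    if sorted(set(keep)) != list(keep):
        raise ValueError("keep must be strictly increasing with no duplicates")
    if keep and (keep[0] < 0 or keep[-1] >= len(layers)):
        raise IndexError("layer index out of range")
    return [layers[i] for i in keep]


# Placeholder strings stand in for the teacher's 12 transformer blocks.
teacher_layers = [f"teacher_layer_{i}" for i in range(12)]
student_layers = select_layers(teacher_layers, KEEP_LAYERS)

print(len(student_layers))  # 6
print(student_layers[3])    # teacher_layer_7
```

With a BERT-style Hugging Face model, the same selection would typically be applied to the block list at `model.encoder.layer` (an `nn.ModuleList`), followed by setting `config.num_hidden_layers` to the new count. Whether this model was built exactly that way is an assumption.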
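
## Example: Intent Classification Sketch

The card describes the encoder as optimized for intent classification but only shows how to obtain embeddings. One common way to turn sentence embeddings into intent predictions is nearest-prototype matching: average the embeddings of a few example utterances per intent, then assign a new utterance to the intent whose prototype has the highest cosine similarity. The sketch below illustrates that idea with toy 3-dimensional vectors in place of real 384-dimensional embeddings. The intent labels and helper names are hypothetical, and this is not necessarily how the MTEB scores above were produced.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query: Vector, prototypes: Dict[str, Vector]) -> str:
    """Return the label whose prototype is most similar to `query`."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy vectors standing in for model.encode(...) outputs (hypothetical data).
examples = {
    "make_reservation": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "order_status": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
prototypes = {label: centroid(vecs) for label, vecs in examples.items()}

print(classify([0.7, 0.3, 0.0], prototypes))  # make_reservation
```

In practice, the toy vectors would be replaced by rows of the array returned by `model.encode(...)` from the Quick Start; the helpers accept any sequence of floats.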