---
language: ar
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sparse-encoder
- splade
- arabic
- retrieval
datasets:
- oddadmix/arabic-triplets-large
base_model: distilbert-base-multilingual-cased
metrics:
- ndcg@10
- mrr@10
---

# Arabic SPLADE — Phase 3

An efficient symmetric SPLADE model built on multilingual DistilBERT for faster inference.

## Architecture

Symmetric: queries and documents share a single encoder, an `MLMTransformer` followed by `SpladePooling`, applied sequentially.

**Base model:** distilbert-base-multilingual-cased
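
To make the pooling step concrete, here is a minimal pure-Python sketch of what a SPLADE pooling layer computes on top of the masked-language-model logits. It assumes the common variant, a max over tokens of `log(1 + ReLU(logit))`; this card does not state which pooling strategy was configured. The function name `splade_pool` and the toy logits are illustrative only.

```python
import math

def splade_pool(token_logits):
    """Collapse per-token vocabulary logits into one sparse vector.

    token_logits: one row per input token, each row of vocabulary size.
    Returns, for each vocabulary entry, the max over tokens of
    log(1 + relu(logit)). Assumed variant, not confirmed by this card.
    """
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for row in token_logits:
        for j, logit in enumerate(row):
            activation = math.log1p(max(logit, 0.0))
            if activation > pooled[j]:
                pooled[j] = activation
    return pooled

# Toy input: 2 tokens, vocabulary of 4 entries (values are made up).
logits = [
    [2.0, -1.0, 0.0, 0.5],
    [-3.0, -0.5, 1.0, 0.0],
]
vector = splade_pool(logits)

# Only vocabulary entries with a positive logit on some token stay non-zero.
nonzero = {j: round(w, 4) for j, w in enumerate(vector) if w > 0}
print(nonzero)  # {0: 1.0986, 2: 0.6931, 3: 0.4055}
```

Because negative logits are clipped to zero, most vocabulary entries end up with zero weight, which is what makes the output usable as a sparse vector.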

## Training

- **Dataset:** `oddadmix/arabic-triplets-large` (104K triplets, 92K unique passages)
- **Loss:** `SpladeLoss(SparseMultipleNegativesRankingLoss, q_reg=5e-5, d_reg=3e-5)`
- **Batch size:** 16 per GPU, with 4 gradient accumulation steps
- **Learning rate:** 2e-5
- **Epochs:** 1
- **Mixed precision (AMP):** fp16
- **Sampler:** `NO_DUPLICATES`
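
As a rough illustration of how the loss above combines its terms, the sketch below adds a FLOPS-style sparsity regularizer on query and document vectors to a ranking loss, using the `q_reg` and `d_reg` weights listed. It is a pure-Python sketch and not the library implementation; `flops`, `splade_total`, the toy vectors, and the ranking-loss value `0.7` are all hypothetical.

```python
def flops(batch):
    """FLOPS regularizer: sum over vocabulary entries of the squared
    mean activation across the batch."""
    n = len(batch)
    vocab_size = len(batch[0])
    return sum((sum(vec[j] for vec in batch) / n) ** 2 for j in range(vocab_size))

def splade_total(ranking_loss, queries, docs, q_reg=5e-5, d_reg=3e-5):
    """Ranking loss plus weighted sparsity penalties on queries and documents."""
    return ranking_loss + q_reg * flops(queries) + d_reg * flops(docs)

# Toy sparse vectors over a 3-entry vocabulary (values are made up).
queries = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
docs = [[0.0, 2.0, 2.0], [0.0, 0.0, 2.0]]

# 0.7 stands in for the value of the ranking loss on this batch.
total = splade_total(0.7, queries, docs)
print(flops(queries), flops(docs))  # 2.0 5.0
```

A larger weight pushes the corresponding side toward sparser vectors, so the values listed above penalize query activations slightly more than document activations.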

## Evaluation on Arabic NanoBEIR (13 datasets)

| Metric | Score |
|--------|-------|
| NDCG@10 | 0.2528 |
| MRR@10 | 0.3052 |

For reference, BM25 scores 0.3824 NDCG@10 and 0.4483 MRR@10 on the same benchmark.
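
For readers unfamiliar with the two metrics, the following sketch shows how NDCG@10 and MRR@10 are typically computed for a single query. It is not the evaluation code behind the numbers above; the function names, document IDs, and relevance labels are hypothetical, and benchmark scores are normally averages of these per-query values.

```python
import math

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG with linear gain; `relevance` maps doc_id to a graded label."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(i + 1)
        for i, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy ranking for one query: relevant documents appear at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2"]
relevance = {"d1": 1, "d2": 1}

mrr = mrr_at_k(ranked, set(relevance))
ndcg = ndcg_at_k(ranked, relevance)
print(mrr)  # 0.5
print(round(ndcg, 4))
```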

## Training Details

The base model is multilingual DistilBERT (6 layers, 119K-token vocabulary), roughly 2x faster than AraBERT.

### Hardware

- 2× NVIDIA TITAN RTX (23.5 GB each)
- DDP via `torchrun`

## Usage

```python
from sentence_transformers.sparse_encoder import SparseEncoder

# Load the sparse encoder from the Hugging Face Hub
model = SparseEncoder("Waqf-AI/arabic-splade-efficient")

# Encode a query and a passage into sparse, vocabulary-sized vectors
embeddings = model.encode([
    "ما هي عاصمة مصر؟",  # "What is the capital of Egypt?"
    "القاهرة هي عاصمة مصر وأكبر مدنها.",  # "Cairo is the capital of Egypt and its largest city."
])
print(embeddings.shape)

# Decode top tokens
decoded = model.decode(embeddings, top_k=10)
for d in decoded:
    print(d)
```
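
Once texts are encoded, retrieval with sparse vectors comes down to a dot product over the tokens that query and document share. The sketch below illustrates that scoring step on token-to-weight dictionaries, similar in spirit to the decoded output above. The weights and document names are made up for illustration, and real tokens from this model would be multilingual WordPiece pieces, not whole words.

```python
def sparse_dot(query_weights, doc_weights):
    """Dot product over the tokens present in both sparse vectors."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )

# Hypothetical token weights; not produced by the model.
query = {"عاصمة": 1.2, "مصر": 1.5}
docs = {
    "doc_a": {"القاهرة": 1.8, "عاصمة": 1.0, "مصر": 1.4},
    "doc_b": {"الطقس": 1.6, "مصر": 0.5},
}

# Rank documents by descending score against the query.
scores = {name: sparse_dot(query, weights) for name, weights in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['doc_a', 'doc_b']
```

Since only overlapping tokens contribute, this scoring can be served from an inverted index, as with BM25.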