| language: | |
| - ar | |
| library_name: sentence-transformers | |
| tags: | |
| - mteb | |
| - sentence-transformers | |
| - sentence-similarity | |
| - feature-extraction | |
| - generated_from_trainer | |
| - dataset_size:557850 | |
| - loss:MatryoshkaLoss | |
| - loss:MultipleNegativesRankingLoss | |
| base_model: UBC-NLP/MARBERTv2 | |
| datasets: | |
| - Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| metrics: | |
| - pearson_cosine | |
| - spearman_cosine | |
| - pearson_manhattan | |
| - spearman_manhattan | |
| - pearson_euclidean | |
| - spearman_euclidean | |
| - pearson_dot | |
| - spearman_dot | |
| - pearson_max | |
| - spearman_max | |
| widget: | |
| - source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط النظيفة | |
| sentences: | |
| - رجل يقدم عرضاً | |
| - هناك رجل بالخارج قرب الشاطئ | |
| - رجل يجلس على أريكه | |
| - source_sentence: رجل يقفز إلى سريره القذر | |
| sentences: | |
| - السرير قذر. | |
| - رجل يضحك أثناء غسيل الملابس | |
| - الرجل على القمر | |
| - source_sentence: الفتيات بالخارج | |
| sentences: | |
| - امرأة تلف الخيط إلى كرات بجانب كومة من الكرات | |
| - فتيان يركبان في جولة متعة | |
| - >- | |
| ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة | |
| تتحدث إليهن | |
| - source_sentence: الرجل يرتدي قميصاً أزرق. | |
| sentences: | |
| - >- | |
| رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة | |
| حمراء مع الماء في الخلفية. | |
| - كتاب القصص مفتوح | |
| - رجل يرتدي قميص أسود يعزف على الجيتار. | |
| - source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة. | |
| sentences: | |
| - ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه | |
| - رجل يستلقي على وجهه على مقعد في الحديقة. | |
| - الشاب نائم بينما الأم تقود ابنتها إلى الحديقة | |
| pipeline_tag: sentence-similarity | |
| model-index: | |
| - name: Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka | |
| results: | |
| - dataset: | |
| config: ar | |
| name: MTEB MintakaRetrieval (ar) | |
| revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e | |
| split: test | |
| type: mintaka/mmteb-mintaka | |
| metrics: | |
| - type: main_score | |
| value: 16.058 | |
| - type: map_at_1 | |
| value: 8.398 | |
| - type: map_at_3 | |
| value: 11.681 | |
| - type: map_at_5 | |
| value: 12.616 | |
| - type: map_at_10 | |
| value: 13.281 | |
| - type: ndcg_at_1 | |
| value: 8.398 | |
| - type: ndcg_at_3 | |
| value: 12.75 | |
| - type: ndcg_at_5 | |
| value: 14.453 | |
| - type: ndcg_at_10 | |
| value: 16.058 | |
| - type: recall_at_1 | |
| value: 8.398 | |
| - type: recall_at_3 | |
| value: 15.842 | |
| - type: recall_at_5 | |
| value: 20.018 | |
| - type: recall_at_10 | |
| value: 24.966 | |
| - type: precision_at_1 | |
| value: 8.398 | |
| - type: precision_at_3 | |
| value: 5.281 | |
| - type: precision_at_5 | |
| value: 4.004 | |
| - type: precision_at_10 | |
| value: 2.497 | |
| - type: mrr_at_1 | |
| value: 8.3976 | |
| - type: mrr_at_3 | |
| value: 11.681 | |
| - type: mrr_at_5 | |
| value: 12.6161 | |
| - type: mrr_at_10 | |
| value: 13.2812 | |
| task: | |
| type: Retrieval | |
| - dataset: | |
| config: ar | |
| name: MTEB MIRACLRetrievalHardNegatives (ar) | |
| revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb | |
| split: dev | |
| type: miracl/mmteb-miracl-hardnegatives | |
| metrics: | |
| - type: main_score | |
| value: 15.853 | |
| - type: map_at_1 | |
| value: 5.867 | |
| - type: map_at_3 | |
| value: 9.003 | |
| - type: map_at_5 | |
| value: 10.068 | |
| - type: map_at_10 | |
| value: 11.294 | |
| - type: ndcg_at_1 | |
| value: 9.0 | |
| - type: ndcg_at_3 | |
| value: 11.363 | |
| - type: ndcg_at_5 | |
| value: 12.986 | |
| - type: ndcg_at_10 | |
| value: 15.853 | |
| - type: recall_at_1 | |
| value: 5.867 | |
| - type: recall_at_3 | |
| value: 12.639 | |
| - type: recall_at_5 | |
| value: 16.649 | |
| - type: recall_at_10 | |
| value: 24.422 | |
| - type: precision_at_1 | |
| value: 9.0 | |
| - type: precision_at_3 | |
| value: 7.1 | |
| - type: precision_at_5 | |
| value: 5.82 | |
| - type: precision_at_10 | |
| value: 4.38 | |
| - type: mrr_at_1 | |
| value: 9.0 | |
| - type: mrr_at_3 | |
| value: 13.4667 | |
| - type: mrr_at_5 | |
| value: 14.6367 | |
| - type: mrr_at_10 | |
| value: 16.0177 | |
| task: | |
| type: Retrieval | |
| - dataset: | |
| config: ar | |
| name: MTEB MLQARetrieval (ar) | |
| revision: 397ed406c1a7902140303e7faf60fff35b58d285 | |
| split: validation | |
| type: mlqa/mmteb-mlqa | |
| metrics: | |
| - type: main_score | |
| value: 58.919 | |
| - type: map_at_1 | |
| value: 44.874 | |
| - type: map_at_3 | |
| value: 51.902 | |
| - type: map_at_5 | |
| value: 53.198 | |
| - type: map_at_10 | |
| value: 54.181 | |
| - type: ndcg_at_1 | |
| value: 44.874 | |
| - type: ndcg_at_3 | |
| value: 54.218 | |
| - type: ndcg_at_5 | |
| value: 56.541 | |
| - type: ndcg_at_10 | |
| value: 58.919 | |
| - type: recall_at_1 | |
| value: 44.874 | |
| - type: recall_at_3 | |
| value: 60.928 | |
| - type: recall_at_5 | |
| value: 66.538 | |
| - type: recall_at_10 | |
| value: 73.888 | |
| - type: precision_at_1 | |
| value: 44.874 | |
| - type: precision_at_3 | |
| value: 20.309 | |
| - type: precision_at_5 | |
| value: 13.308 | |
| - type: precision_at_10 | |
| value: 7.389 | |
| - type: mrr_at_1 | |
| value: 44.8743 | |
| - type: mrr_at_3 | |
| value: 51.902 | |
| - type: mrr_at_5 | |
| value: 53.1979 | |
| - type: mrr_at_10 | |
| value: 54.1809 | |
| task: | |
| type: Retrieval | |
| - dataset: | |
| config: default | |
| name: MTEB SadeemQuestionRetrieval (ar) | |
| revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9 | |
| split: test | |
| type: sadeem/mmteb-sadeem | |
| metrics: | |
| - type: main_score | |
| value: 57.068 | |
| - type: map_at_1 | |
| value: 24.414 | |
| - type: map_at_3 | |
| value: 45.333 | |
| - type: map_at_5 | |
| value: 46.695 | |
| - type: map_at_10 | |
| value: 47.429 | |
| - type: ndcg_at_1 | |
| value: 24.414 | |
| - type: ndcg_at_3 | |
| value: 52.828 | |
| - type: ndcg_at_5 | |
| value: 55.288 | |
| - type: ndcg_at_10 | |
| value: 57.068 | |
| - type: recall_at_1 | |
| value: 24.414 | |
| - type: recall_at_3 | |
| value: 74.725 | |
| - type: recall_at_5 | |
| value: 80.708 | |
| - type: recall_at_10 | |
| value: 86.213 | |
| - type: precision_at_1 | |
| value: 24.414 | |
| - type: precision_at_3 | |
| value: 24.908 | |
| - type: precision_at_5 | |
| value: 16.142 | |
| - type: precision_at_10 | |
| value: 8.621 | |
| - type: mrr_at_1 | |
| value: 25.2753 | |
| - type: mrr_at_3 | |
| value: 45.58 | |
| - type: mrr_at_5 | |
| value: 46.8581 | |
| - type: mrr_at_10 | |
| value: 47.6414 | |
| task: | |
| type: Retrieval | |
| - dataset: | |
| config: default | |
| name: MTEB BIOSSES (default) | |
| revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
| split: test | |
| type: mteb/biosses-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 49.25240527202211 | |
| - type: cosine_spearman | |
| value: 51.87708566904703 | |
| - type: euclidean_pearson | |
| value: 49.790877425774696 | |
| - type: euclidean_spearman | |
| value: 51.725274981021855 | |
| - type: main_score | |
| value: 51.87708566904703 | |
| - type: manhattan_pearson | |
| value: 52.31560776967401 | |
| - type: manhattan_spearman | |
| value: 54.28979124658997 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB SICK-R (default) | |
| revision: 20a6d6f312dd54037fe07a32d58e5e168867909d | |
| split: test | |
| type: mteb/sickr-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 65.81089479351829 | |
| - type: cosine_spearman | |
| value: 65.80163441928238 | |
| - type: euclidean_pearson | |
| value: 65.2718874370746 | |
| - type: euclidean_spearman | |
| value: 65.92429031695988 | |
| - type: main_score | |
| value: 65.80163441928238 | |
| - type: manhattan_pearson | |
| value: 65.28701419332383 | |
| - type: manhattan_spearman | |
| value: 65.94229793651319 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STS12 (default) | |
| revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
| split: test | |
| type: mteb/sts12-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 65.11346939995998 | |
| - type: cosine_spearman | |
| value: 63.00297824477175 | |
| - type: euclidean_pearson | |
| value: 63.85320097970942 | |
| - type: euclidean_spearman | |
| value: 63.25151047701848 | |
| - type: main_score | |
| value: 63.00297824477175 | |
| - type: manhattan_pearson | |
| value: 64.40291990853984 | |
| - type: manhattan_spearman | |
| value: 63.63497232399945 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STS13 (default) | |
| revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
| split: test | |
| type: mteb/sts13-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 52.2735823521702 | |
| - type: cosine_spearman | |
| value: 52.23198766098021 | |
| - type: euclidean_pearson | |
| value: 54.12467577456837 | |
| - type: euclidean_spearman | |
| value: 52.40014028261351 | |
| - type: main_score | |
| value: 52.23198766098021 | |
| - type: manhattan_pearson | |
| value: 54.38052509834607 | |
| - type: manhattan_spearman | |
| value: 52.70836595958237 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STS14 (default) | |
| revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
| split: test | |
| type: mteb/sts14-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 58.55307076840419 | |
| - type: cosine_spearman | |
| value: 59.2261024017655 | |
| - type: euclidean_pearson | |
| value: 59.55734715751804 | |
| - type: euclidean_spearman | |
| value: 60.135899681574834 | |
| - type: main_score | |
| value: 59.2261024017655 | |
| - type: manhattan_pearson | |
| value: 59.99274396356966 | |
| - type: manhattan_spearman | |
| value: 60.44325356503041 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STS15 (default) | |
| revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
| split: test | |
| type: mteb/sts15-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 68.94418532602707 | |
| - type: cosine_spearman | |
| value: 70.01912156519296 | |
| - type: euclidean_pearson | |
| value: 71.67028435860581 | |
| - type: euclidean_spearman | |
| value: 71.48252471922122 | |
| - type: main_score | |
| value: 70.01912156519296 | |
| - type: manhattan_pearson | |
| value: 71.9587452337792 | |
| - type: manhattan_spearman | |
| value: 71.69160519065173 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STS16 (default) | |
| revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
| split: test | |
| type: mteb/sts16-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 62.81619254162203 | |
| - type: cosine_spearman | |
| value: 64.98814526698425 | |
| - type: euclidean_pearson | |
| value: 66.43531796610995 | |
| - type: euclidean_spearman | |
| value: 66.53768451143964 | |
| - type: main_score | |
| value: 64.98814526698425 | |
| - type: manhattan_pearson | |
| value: 66.57822125651369 | |
| - type: manhattan_spearman | |
| value: 66.71830390508079 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: ar-ar | |
| name: MTEB STS17 (ar-ar) | |
| revision: faeb762787bd10488a50c8b5be4a3b82e411949c | |
| split: test | |
| type: mteb/sts17-crosslingual-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 81.68055610903552 | |
| - type: cosine_spearman | |
| value: 82.18125783448961 | |
| - type: euclidean_pearson | |
| value: 80.5422740473486 | |
| - type: euclidean_spearman | |
| value: 81.79456727036232 | |
| - type: main_score | |
| value: 82.18125783448961 | |
| - type: manhattan_pearson | |
| value: 80.43564733654793 | |
| - type: manhattan_spearman | |
| value: 81.76103816207625 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: ar | |
| name: MTEB STS22 (ar) | |
| revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 | |
| split: test | |
| type: mteb/sts22-crosslingual-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 51.33460593849487 | |
| - type: cosine_spearman | |
| value: 58.07741072443786 | |
| - type: euclidean_pearson | |
| value: 54.26430308336828 | |
| - type: euclidean_spearman | |
| value: 58.8384539429318 | |
| - type: main_score | |
| value: 58.07741072443786 | |
| - type: manhattan_pearson | |
| value: 54.41587176266624 | |
| - type: manhattan_spearman | |
| value: 58.831993325957086 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB STSBenchmark (default) | |
| revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
| split: test | |
| type: mteb/stsbenchmark-sts | |
| metrics: | |
| - type: cosine_pearson | |
| value: 61.11956207522431 | |
| - type: cosine_spearman | |
| value: 61.16768766134144 | |
| - type: euclidean_pearson | |
| value: 64.44141934993837 | |
| - type: euclidean_spearman | |
| value: 63.450379593077066 | |
| - type: main_score | |
| value: 61.16768766134144 | |
| - type: manhattan_pearson | |
| value: 64.43852352892529 | |
| - type: manhattan_spearman | |
| value: 63.57630045107761 | |
| task: | |
| type: STS | |
| - dataset: | |
| config: default | |
| name: MTEB SummEval (default) | |
| revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
| split: test | |
| type: mteb/summeval | |
| metrics: | |
| - type: cosine_pearson | |
| value: 29.583566160417668 | |
| - type: cosine_spearman | |
| value: 29.534419950502212 | |
| - type: dot_pearson | |
| value: 28.13970643170574 | |
| - type: dot_spearman | |
| value: 28.907762267009073 | |
| - type: main_score | |
| value: 29.534419950502212 | |
| - type: pearson | |
| value: 29.583566160417668 | |
| - type: spearman | |
| value: 29.534419950502212 | |
| task: | |
| type: Summarization | |
| - name: SentenceTransformer based on UBC-NLP/MARBERTv2 | |
| results: | |
| - task: | |
| type: semantic-similarity | |
| name: Semantic Similarity | |
| dataset: | |
| name: sts test 768 | |
| type: sts-test-768 | |
| metrics: | |
| - type: pearson_cosine | |
| value: 0.611168498883907 | |
| name: Pearson Cosine | |
| - type: spearman_cosine | |
| value: 0.6116733587939157 | |
| name: Spearman Cosine | |
| - type: pearson_manhattan | |
| value: 0.6443687886661206 | |
| name: Pearson Manhattan | |
| - type: spearman_manhattan | |
| value: 0.6358107360369792 | |
| name: Spearman Manhattan | |
| - type: pearson_euclidean | |
| value: 0.644404066642609 | |
| name: Pearson Euclidean | |
| - type: spearman_euclidean | |
| value: 0.6345893921062774 | |
| name: Spearman Euclidean | |
| - type: pearson_dot | |
| value: 0.4723643245352202 | |
| name: Pearson Dot | |
| - type: spearman_dot | |
| value: 0.44844519905410135 | |
| name: Spearman Dot | |
| - type: pearson_max | |
| value: 0.644404066642609 | |
| name: Pearson Max | |
| - type: spearman_max | |
| value: 0.6358107360369792 | |
| name: Spearman Max | |
| - task: | |
| type: semantic-similarity | |
| name: Semantic Similarity | |
| dataset: | |
| name: sts test 512 | |
| type: sts-test-512 | |
| metrics: | |
| - type: pearson_cosine | |
| value: 0.6664570291720014 | |
| name: Pearson Cosine | |
| - type: spearman_cosine | |
| value: 0.6647687532159875 | |
| name: Spearman Cosine | |
| - type: pearson_manhattan | |
| value: 0.6429976947418544 | |
| name: Pearson Manhattan | |
| - type: spearman_manhattan | |
| value: 0.6334753432753939 | |
| name: Spearman Manhattan | |
| - type: pearson_euclidean | |
| value: 0.6466249455585532 | |
| name: Pearson Euclidean | |
| - type: spearman_euclidean | |
| value: 0.6373181315122213 | |
| name: Spearman Euclidean | |
| - type: pearson_dot | |
| value: 0.5370129457359227 | |
| name: Pearson Dot | |
| - type: spearman_dot | |
| value: 0.5241649973373772 | |
| name: Spearman Dot | |
| - type: pearson_max | |
| value: 0.6664570291720014 | |
| name: Pearson Max | |
| - type: spearman_max | |
| value: 0.6647687532159875 | |
| name: Spearman Max | |
| - task: | |
| type: semantic-similarity | |
| name: Semantic Similarity | |
| dataset: | |
| name: sts test 256 | |
| type: sts-test-256 | |
| metrics: | |
| - type: pearson_cosine | |
| value: 0.6601248277308522 | |
| name: Pearson Cosine | |
| - type: spearman_cosine | |
| value: 0.6592739654246011 | |
| name: Spearman Cosine | |
| - type: pearson_manhattan | |
| value: 0.6361644543165994 | |
| name: Pearson Manhattan | |
| - type: spearman_manhattan | |
| value: 0.6250621947417249 | |
| name: Spearman Manhattan | |
| - type: pearson_euclidean | |
| value: 0.6408426652431157 | |
| name: Pearson Euclidean | |
| - type: spearman_euclidean | |
| value: 0.6300109524350457 | |
| name: Spearman Euclidean | |
| - type: pearson_dot | |
| value: 0.5250513197384045 | |
| name: Pearson Dot | |
| - type: spearman_dot | |
| value: 0.5154779060125071 | |
| name: Spearman Dot | |
| - type: pearson_max | |
| value: 0.6601248277308522 | |
| name: Pearson Max | |
| - type: spearman_max | |
| value: 0.6592739654246011 | |
| name: Spearman Max | |
| - task: | |
| type: semantic-similarity | |
| name: Semantic Similarity | |
| dataset: | |
| name: sts test 128 | |
| type: sts-test-128 | |
| metrics: | |
| - type: pearson_cosine | |
| value: 0.6549481034721005 | |
| name: Pearson Cosine | |
| - type: spearman_cosine | |
| value: 0.6523201621940143 | |
| name: Spearman Cosine | |
| - type: pearson_manhattan | |
| value: 0.6342700090917214 | |
| name: Pearson Manhattan | |
| - type: spearman_manhattan | |
| value: 0.6226791710099966 | |
| name: Spearman Manhattan | |
| - type: pearson_euclidean | |
| value: 0.6397224689512541 | |
| name: Pearson Euclidean | |
| - type: spearman_euclidean | |
| value: 0.6280973341704362 | |
| name: Spearman Euclidean | |
| - type: pearson_dot | |
| value: 0.47240889358810917 | |
| name: Pearson Dot | |
| - type: spearman_dot | |
| value: 0.4633669926372942 | |
| name: Spearman Dot | |
| - type: pearson_max | |
| value: 0.6549481034721005 | |
| name: Pearson Max | |
| - type: spearman_max | |
| value: 0.6523201621940143 | |
| name: Spearman Max | |
| - task: | |
| type: semantic-similarity | |
| name: Semantic Similarity | |
| dataset: | |
| name: sts test 64 | |
| type: sts-test-64 | |
| metrics: | |
| - type: pearson_cosine | |
| value: 0.6367217585211098 | |
| name: Pearson Cosine | |
| - type: spearman_cosine | |
| value: 0.6370191671711296 | |
| name: Spearman Cosine | |
| - type: pearson_manhattan | |
| value: 0.6263730801254332 | |
| name: Pearson Manhattan | |
| - type: spearman_manhattan | |
| value: 0.6118927366012856 | |
| name: Spearman Manhattan | |
| - type: pearson_euclidean | |
| value: 0.6327699647617465 | |
| name: Pearson Euclidean | |
| - type: spearman_euclidean | |
| value: 0.6180184829867724 | |
| name: Spearman Euclidean | |
| - type: pearson_dot | |
| value: 0.41169381399943167 | |
| name: Pearson Dot | |
| - type: spearman_dot | |
| value: 0.40444222536491986 | |
| name: Spearman Dot | |
| - type: pearson_max | |
| value: 0.6367217585211098 | |
| name: Pearson Max | |
| - type: spearman_max | |
| value: 0.6370191671711296 | |
| name: Spearman Max | |
| license: apache-2.0 | |
| # SentenceTransformer based on UBC-NLP/MARBERTv2 | |
| This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, and is evaluated at each of those sizes below. | |
| ## Model Details | |
| ### Model Description | |
| - **Model Type:** Sentence Transformer | |
| - **Base model:** [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) <!-- at revision fe88db9db8ccdb0c4e1627495f405c44a5f89066 --> | |
| - **Maximum Sequence Length:** 512 tokens | |
| - **Output Dimensionality:** 768 dimensions | |
| - **Similarity Function:** Cosine Similarity | |
| - **Training Dataset:** | |
| - Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| <!-- - **Language:** Unknown --> | |
| <!-- - **License:** Unknown --> | |
| ### Model Sources | |
| - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) | |
| - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) | |
| - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) | |
| ### Full Model Architecture | |
| ``` | |
| SentenceTransformer( | |
| (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel | |
| (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) | |
| ) | |
| ``` | |
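The Pooling module above enables only `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors, with padding positions left out. The following is a minimal pure-Python sketch of that idea on toy data; `mean_pool` and its inputs are illustrative names, not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0]
```

In the real model the token vectors are 768-dimensional BERT outputs and the mask comes from the tokenizer.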
| ## Usage | |
| ### Direct Usage (Sentence Transformers) | |
| First install the Sentence Transformers library: | |
| ```bash | |
| pip install -U sentence-transformers | |
| ``` | |
| Then you can load this model and run inference. | |
| ```python | |
| from sentence_transformers import SentenceTransformer | |
| # Download from the 🤗 Hub | |
| model = SentenceTransformer("Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka") | |
| # Run inference | |
| sentences = [ | |
| 'يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة.', | |
| 'ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه', | |
| 'الشاب نائم بينما الأم تقود ابنتها إلى الحديقة', | |
| ] | |
| embeddings = model.encode(sentences) | |
| print(embeddings.shape) | |
| # (3, 768) | |
| # Get the similarity scores for the embeddings | |
| similarities = model.similarity(embeddings, embeddings) | |
| print(similarities.shape) | |
| # torch.Size([3, 3]) | |
| ``` | |
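Because the model was trained with MatryoshkaLoss, the leading components of each embedding are intended to be usable on their own at 512, 256, 128, or 64 dimensions. The sketch below shows the usual recipe (keep the first `dim` values, then re-normalize) on toy 8-dimensional vectors rather than real model outputs. The helper names `truncate_and_normalize` and `cosine` are illustrative. Recent releases of sentence-transformers also accept a `truncate_dim` argument when constructing `SentenceTransformer`; check the version you have installed.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for two sentence embeddings (hypothetical values).
full_a = [0.9, 0.1, 0.3, -0.2, 0.05, 0.4, -0.1, 0.2]
full_b = [0.8, 0.2, 0.1, -0.1, 0.30, 0.1, 0.2, -0.3]

# Compare the pair at the full size and at two truncated sizes.
for dim in (8, 4, 2):
    a = truncate_and_normalize(full_a, dim)
    b = truncate_and_normalize(full_b, dim)
    print(dim, round(cosine(a, b), 4))
```

Smaller sizes cut storage and search cost; the evaluation tables below report how the STS scores change at each size.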
| ## Evaluation | |
| ### Metrics | |
| #### Semantic Similarity | |
| * Dataset: `sts-test-768` | |
| * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | pearson_cosine | 0.6112 | | |
| | **spearman_cosine** | **0.6117** | | |
| | pearson_manhattan | 0.6444 | | |
| | spearman_manhattan | 0.6358 | | |
| | pearson_euclidean | 0.6444 | | |
| | spearman_euclidean | 0.6346 | | |
| | pearson_dot | 0.4724 | | |
| | spearman_dot | 0.4484 | | |
| | pearson_max | 0.6444 | | |
| | spearman_max | 0.6358 | | |
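The rows above are correlations between the model's similarity scores and human similarity labels. As a rough illustration of what the Pearson and Spearman rows measure, here is a pure-Python sketch on made-up scores; the helper names and the toy data are assumptions, and the actual evaluator may differ in details. The last lines check that the `pearson_max` row matches the largest of the four Pearson values in the table, which is how the numbers appear to be derived.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical gold labels and model similarities for five sentence pairs.
gold = [5.0, 4.2, 3.0, 1.5, 0.0]
predicted = [0.95, 0.60, 0.70, 0.20, 0.10]
print(round(pearson(gold, predicted), 4))
print(round(spearman(gold, predicted), 4))  # 0.9

# Pearson values copied from the sts-test-768 table above.
pearson_rows = {"cosine": 0.6112, "manhattan": 0.6444, "euclidean": 0.6444, "dot": 0.4724}
print(max(pearson_rows.values()))  # 0.6444
```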
| #### Semantic Similarity | |
| * Dataset: `sts-test-512` | |
| * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | pearson_cosine | 0.6665 | | |
| | **spearman_cosine** | **0.6648** | | |
| | pearson_manhattan | 0.643 | | |
| | spearman_manhattan | 0.6335 | | |
| | pearson_euclidean | 0.6466 | | |
| | spearman_euclidean | 0.6373 | | |
| | pearson_dot | 0.537 | | |
| | spearman_dot | 0.5242 | | |
| | pearson_max | 0.6665 | | |
| | spearman_max | 0.6648 | | |
| #### Semantic Similarity | |
| * Dataset: `sts-test-256` | |
| * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | pearson_cosine | 0.6601 | | |
| | **spearman_cosine** | **0.6593** | | |
| | pearson_manhattan | 0.6362 | | |
| | spearman_manhattan | 0.6251 | | |
| | pearson_euclidean | 0.6408 | | |
| | spearman_euclidean | 0.63 | | |
| | pearson_dot | 0.5251 | | |
| | spearman_dot | 0.5155 | | |
| | pearson_max | 0.6601 | | |
| | spearman_max | 0.6593 | | |
| #### Semantic Similarity | |
| * Dataset: `sts-test-128` | |
| * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) | |
| | Metric | Value | | |
| |:--------------------|:-----------| | |
| | pearson_cosine | 0.6549 | | |
| | **spearman_cosine** | **0.6523** | | |
| | pearson_manhattan | 0.6343 | | |
| | spearman_manhattan | 0.6227 | | |
| | pearson_euclidean | 0.6397 | | |
| | spearman_euclidean | 0.6281 | | |
| | pearson_dot | 0.4724 | | |
| | spearman_dot | 0.4634 | | |
| | pearson_max | 0.6549 | | |
| | spearman_max | 0.6523 | | |
| #### Semantic Similarity | |
| * Dataset: `sts-test-64` | |
| * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) | |
| | Metric | Value | | |
| |:--------------------|:----------| | |
| | pearson_cosine | 0.6367 | | |
| | **spearman_cosine** | **0.637** | | |
| | pearson_manhattan | 0.6264 | | |
| | spearman_manhattan | 0.6119 | | |
| | pearson_euclidean | 0.6328 | | |
| | spearman_euclidean | 0.618 | | |
| | pearson_dot | 0.4117 | | |
| | spearman_dot | 0.4044 | | |
| | pearson_max | 0.6367 | | |
| | spearman_max | 0.637 | | |
| ## Training Details | |
| ### Training Dataset | |
| #### Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| * Dataset: Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| * Size: 557,850 training samples | |
| * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code> | |
| * Approximate statistics based on the first 1000 samples: | |
| | | anchor | positive | negative | | |
| |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------------------------| | |
| | type | string | string | string | | |
| | details | <ul><li>min: 4 tokens</li><li>mean: 7.68 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.66 tokens</li><li>max: 35 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 10.47 tokens</li><li>max: 40 tokens</li></ul> | | |
| * Samples: | |
| | anchor | positive | negative | | |
| |:------------------------------------------------------------|:--------------------------------------------|:------------------------------------| | |
| | <code>شخص على حصان يقفز فوق طائرة معطلة</code> | <code>شخص في الهواء الطلق، على حصان.</code> | <code>شخص في مطعم، يطلب عجة.</code> | | |
| | <code>أطفال يبتسمون و يلوحون للكاميرا</code> | <code>هناك أطفال حاضرون</code> | <code>الاطفال يتجهمون</code> | | |
| | <code>صبي يقفز على لوح التزلج في منتصف الجسر الأحمر.</code> | <code>الفتى يقوم بخدعة التزلج</code> | <code>الصبي يتزلج على الرصيف</code> | | |
| * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: | |
| ```json | |
| { | |
| "loss": "MultipleNegativesRankingLoss", | |
| "matryoshka_dims": [ | |
| 768, | |
| 512, | |
| 256, | |
| 128, | |
| 64 | |
| ], | |
| "matryoshka_weights": [ | |
| 1, | |
| 1, | |
| 1, | |
| 1, | |
| 1 | |
| ], | |
| "n_dims_per_step": -1 | |
| } | |
| ``` | |
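The configuration above wraps MultipleNegativesRankingLoss in MatryoshkaLoss: the ranking loss is computed on embeddings truncated to each listed size, and the results are combined with the listed weights. The following pure-Python sketch illustrates that combination on toy 4-dimensional triplets. It is not the library's implementation; the helper names are illustrative, and the similarity scale of 20 is an assumption, since the card does not state it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def ranking_loss(anchors, positives, negatives, scale=20.0):
    """Cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, c) for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


def truncate(batch, dim):
    """Keep the first `dim` components of every vector in the batch."""
    return [vector[:dim] for vector in batch]


def matryoshka_loss(anchors, positives, negatives, dims, weights):
    """Weighted sum of the ranking loss over several truncated sizes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        total += weight * ranking_loss(
            truncate(anchors, dim), truncate(positives, dim), truncate(negatives, dim)
        )
    return total


# Toy batch of two (anchor, positive, negative) triplets (hypothetical values).
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
negatives = [[-0.5, 0.7, 0.1, 0.2], [0.6, -0.4, 0.3, 0.1]]

print(matryoshka_loss(anchors, positives, negatives, dims=[4, 2], weights=[1, 1]))
```

With equal weights of 1, as in the configuration above, every size contributes equally, so the model is pushed to keep useful information in the leading dimensions.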
| ### Evaluation Dataset | |
| #### Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| * Dataset: Omartificial-Intelligence-Space/Arabic-NLi-Triplet | |
| * Size: 6,584 evaluation samples | |
| * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code> | |
| * Approximate statistics based on the first 1000 samples: | |
| | | anchor | positive | negative | | |
| |:--------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------| | |
| | type | string | string | string | | |
| | details | <ul><li>min: 4 tokens</li><li>mean: 14.78 tokens</li><li>max: 70 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.41 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 7.95 tokens</li><li>max: 21 tokens</li></ul> | | |
| * Samples: | |
| | anchor | positive | negative | | |
| |:-----------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------|:---------------------------------------------------| | |
| | <code>امرأتان يتعانقان بينما يحملان حزمة</code> | <code>إمرأتان يحملان حزمة</code> | <code>الرجال يتشاجرون خارج مطعم</code> | | |
| | <code>طفلين صغيرين يرتديان قميصاً أزرق، أحدهما يرتدي الرقم 9 والآخر يرتدي الرقم 2 يقفان على خطوات خشبية في الحمام ويغسلان أيديهما في المغسلة.</code> | <code>طفلين يرتديان قميصاً مرقماً يغسلون أيديهم</code> | <code>طفلين يرتديان سترة يذهبان إلى المدرسة</code> | | |
| | <code>رجل يبيع الدونات لعميل خلال معرض عالمي أقيم في مدينة أنجليس</code> | <code>رجل يبيع الدونات لعميل</code> | <code>امرأة تشرب قهوتها في مقهى صغير</code> | | |
| * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: | |
| ```json | |
| { | |
| "loss": "MultipleNegativesRankingLoss", | |
| "matryoshka_dims": [ | |
| 768, | |
| 512, | |
| 256, | |
| 128, | |
| 64 | |
| ], | |
| "matryoshka_weights": [ | |
| 1, | |
| 1, | |
| 1, | |
| 1, | |
| 1 | |
| ], | |
| "n_dims_per_step": -1 | |
| } | |
| ``` | |
| ### Training Hyperparameters | |
| #### Non-Default Hyperparameters | |
| - `per_device_train_batch_size`: 64 | |
| - `per_device_eval_batch_size`: 64 | |
| - `num_train_epochs`: 1 | |
| - `warmup_ratio`: 0.1 | |
| - `fp16`: True | |
| - `batch_sampler`: no_duplicates | |
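To see what these settings imply, the sketch below estimates the number of optimizer steps and the warmup length from the 557,850 training samples, and models the linear schedule with the card's `learning_rate` of 5e-05. It assumes a single device and no gradient accumulation; the `no_duplicates` batch sampler may yield a slightly different batch count, so treat the numbers as approximate. `linear_lr` is an illustrative helper, not a library function.

```python
import math

train_samples = 557_850
batch_size = 64      # per_device_train_batch_size, one device assumed
epochs = 1           # num_train_epochs
warmup_ratio = 0.1
peak_lr = 5e-5       # learning_rate from the full hyperparameter list

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)


def linear_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


print(total_steps, warmup_steps)  # 8717 872
print(linear_lr(warmup_steps // 2), linear_lr(warmup_steps), linear_lr(total_steps))
```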
| #### All Hyperparameters | |
| <details><summary>Click to expand</summary> | |
| - `overwrite_output_dir`: False | |
| - `do_predict`: False | |
| - `prediction_loss_only`: True | |
| - `per_device_train_batch_size`: 64 | |
| - `per_device_eval_batch_size`: 64 | |
| - `per_gpu_train_batch_size`: None | |
| - `per_gpu_eval_batch_size`: None | |
| - `gradient_accumulation_steps`: 1 | |
| - `eval_accumulation_steps`: None | |
| - `learning_rate`: 5e-05 | |
| - `weight_decay`: 0.0 | |
| - `adam_beta1`: 0.9 | |
| - `adam_beta2`: 0.999 | |
| - `adam_epsilon`: 1e-08 | |
| - `max_grad_norm`: 1.0 | |
| - `num_train_epochs`: 1 | |
| - `max_steps`: -1 | |
| - `lr_scheduler_type`: linear | |
| - `lr_scheduler_kwargs`: {} | |
| - `warmup_ratio`: 0.1 | |
| - `warmup_steps`: 0 | |
| - `log_level`: passive | |
| - `log_level_replica`: warning | |
| - `log_on_each_node`: True | |
| - `logging_nan_inf_filter`: True | |
| - `save_safetensors`: True | |
| - `save_on_each_node`: False | |
| - `save_only_model`: False | |
| - `no_cuda`: False | |
| - `use_cpu`: False | |
| - `use_mps_device`: False | |
| - `seed`: 42 | |
| - `data_seed`: None | |
| - `jit_mode_eval`: False | |
| - `use_ipex`: False | |
| - `bf16`: False | |
| - `fp16`: True | |
| - `fp16_opt_level`: O1 | |
| - `half_precision_backend`: auto | |
| - `bf16_full_eval`: False | |
| - `fp16_full_eval`: False | |
| - `tf32`: None | |
| - `local_rank`: 0 | |
| - `ddp_backend`: None | |
| - `tpu_num_cores`: None | |
| - `tpu_metrics_debug`: False | |
| - `debug`: [] | |
| - `dataloader_drop_last`: False | |
| - `dataloader_num_workers`: 0 | |
| - `dataloader_prefetch_factor`: None | |
| - `past_index`: -1 | |
| - `disable_tqdm`: False | |
| - `remove_unused_columns`: True | |
| - `label_names`: None | |
| - `load_best_model_at_end`: False | |
| - `ignore_data_skip`: False | |
| - `fsdp`: [] | |
| - `fsdp_min_num_params`: 0 | |
| - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} | |
| - `fsdp_transformer_layer_cls_to_wrap`: None | |
| - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'gradient_accumulation_kwargs': None} | |
| - `deepspeed`: None | |
| - `label_smoothing_factor`: 0.0 | |
| - `optim`: adamw_torch | |
| - `optim_args`: None | |
| - `adafactor`: False | |
| - `group_by_length`: False | |
| - `length_column_name`: length | |
| - `ddp_find_unused_parameters`: None | |
| - `ddp_bucket_cap_mb`: None | |
| - `ddp_broadcast_buffers`: False | |
| - `dataloader_pin_memory`: True | |
| - `dataloader_persistent_workers`: False | |
| - `skip_memory_metrics`: True | |
| - `use_legacy_prediction_loop`: False | |
| - `push_to_hub`: False | |
| - `resume_from_checkpoint`: None | |
| - `hub_model_id`: None | |
| - `hub_strategy`: every_save | |
| - `hub_private_repo`: False | |
| - `hub_always_push`: False | |
| - `gradient_checkpointing`: False | |
| - `gradient_checkpointing_kwargs`: None | |
| - `include_inputs_for_metrics`: False | |
| - `eval_do_concat_batches`: True | |
| - `fp16_backend`: auto | |
| - `push_to_hub_model_id`: None | |
| - `push_to_hub_organization`: None | |
| - `mp_parameters`: | |
| - `auto_find_batch_size`: False | |
| - `full_determinism`: False | |
| - `torchdynamo`: None | |
| - `ray_scope`: last | |
| - `ddp_timeout`: 1800 | |
| - `torch_compile`: False | |
| - `torch_compile_backend`: None | |
| - `torch_compile_mode`: None | |
| - `dispatch_batches`: None | |
| - `split_batches`: None | |
| - `include_tokens_per_second`: False | |
| - `include_num_input_tokens_seen`: False | |
| - `neftune_noise_alpha`: None | |
| - `optim_target_modules`: None | |
| - `batch_sampler`: no_duplicates | |
| - `multi_dataset_batch_sampler`: proportional | |
| </details> | |
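The `batch_sampler: no_duplicates` setting is relevant to the ranking loss used here: other texts in the same batch serve as negatives, so a text that appears twice in a batch would be treated as a negative for its own match. Below is a minimal greedy sketch of the idea. It is not the library's sampler, and the function name and English toy triplets are hypothetical.

```python
def no_duplicate_batches(samples, batch_size):
    """Greedily build batches in which no text appears twice.

    `samples` is a list of tuples of strings, e.g. (anchor, positive, negative).
    A sample that would repeat a text already in the batch is deferred.
    """
    remaining = list(samples)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for sample in remaining:
            texts = set(sample)
            if len(batch) < batch_size and not (texts & seen):
                batch.append(sample)
                seen |= texts
            else:
                deferred.append(sample)
        batches.append(batch)
        remaining = deferred
    return batches

# Hypothetical toy triplets; the first and third share an anchor,
# the first and fourth share a negative.
triplets = [
    ("a man sells donuts", "a man sells food", "a woman drinks coffee"),
    ("two women hug", "two women carry a package", "men fight outside"),
    ("a man sells donuts", "someone sells donuts", "a child sleeps"),
    ("a dog runs", "an animal moves", "a woman drinks coffee"),
]

batches = no_duplicate_batches(triplets, batch_size=3)
for batch in batches:
    print(len(batch), [sample[0] for sample in batch])
```

A production sampler would typically also shuffle the data and decide what to do with undersized batches; this sketch leaves both out.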
| ### Training Logs | |
| | Epoch | Step | Training Loss | sts-test-128_spearman_cosine | sts-test-256_spearman_cosine | sts-test-512_spearman_cosine | sts-test-64_spearman_cosine | sts-test-768_spearman_cosine | | |
| |:------:|:----:|:-------------:|:----------------------------:|:----------------------------:|:----------------------------:|:---------------------------:|:----------------------------:| | |
| | 0.0229 | 200 | 25.0771 | - | - | - | - | - | | |
| | 0.0459 | 400 | 9.1435 | - | - | - | - | - | | |
| | 0.0688 | 600 | 8.0492 | - | - | - | - | - | | |
| | 0.0918 | 800 | 7.1378 | - | - | - | - | - | | |
| | 0.1147 | 1000 | 7.6249 | - | - | - | - | - | | |
| | 0.1377 | 1200 | 7.3604 | - | - | - | - | - | | |
| | 0.1606 | 1400 | 6.5783 | - | - | - | - | - | | |
| | 0.1835 | 1600 | 6.4145 | - | - | - | - | - | | |
| | 0.2065 | 1800 | 6.1781 | - | - | - | - | - | | |
| | 0.2294 | 2000 | 6.2375 | - | - | - | - | - | | |
| | 0.2524 | 2200 | 6.2587 | - | - | - | - | - | | |
| | 0.2753 | 2400 | 6.0826 | - | - | - | - | - | | |
| | 0.2983 | 2600 | 6.1514 | - | - | - | - | - | | |
| | 0.3212 | 2800 | 5.6949 | - | - | - | - | - | | |
| | 0.3442 | 3000 | 6.0062 | - | - | - | - | - | | |
| | 0.3671 | 3200 | 5.7551 | - | - | - | - | - | | |
| | 0.3900 | 3400 | 5.658 | - | - | - | - | - | | |
| | 0.4130 | 3600 | 5.7135 | - | - | - | - | - | | |
| | 0.4359 | 3800 | 5.3909 | - | - | - | - | - | | |
| | 0.4589 | 4000 | 5.5068 | - | - | - | - | - | | |
| | 0.4818 | 4200 | 5.2261 | - | - | - | - | - | | |
| | 0.5048 | 4400 | 5.1674 | - | - | - | - | - | | |
| | 0.5277 | 4600 | 5.0427 | - | - | - | - | - | | |
| | 0.5506 | 4800 | 5.3824 | - | - | - | - | - | | |
| | 0.5736 | 5000 | 5.3063 | - | - | - | - | - | | |
| | 0.5965 | 5200 | 5.2174 | - | - | - | - | - | | |
| | 0.6195 | 5400 | 5.2116 | - | - | - | - | - | | |
| | 0.6424 | 5600 | 5.2226 | - | - | - | - | - | | |
| | 0.6654 | 5800 | 5.2051 | - | - | - | - | - | | |
| | 0.6883 | 6000 | 5.204 | - | - | - | - | - | | |
| | 0.7113 | 6200 | 5.154 | - | - | - | - | - | | |
| | 0.7342 | 6400 | 5.0236 | - | - | - | - | - | | |
| | 0.7571 | 6600 | 4.9476 | - | - | - | - | - | | |
| | 0.7801 | 6800 | 4.0164 | - | - | - | - | - | | |
| | 0.8030 | 7000 | 3.5707 | - | - | - | - | - | | |
| | 0.8260 | 7200 | 3.3586 | - | - | - | - | - | | |
| | 0.8489 | 7400 | 3.2376 | - | - | - | - | - | | |
| | 0.8719 | 7600 | 3.0282 | - | - | - | - | - | | |
| | 0.8948 | 7800 | 2.901 | - | - | - | - | - | | |
| | 0.9177 | 8000 | 2.9371 | - | - | - | - | - | | |
| | 0.9407 | 8200 | 2.8362 | - | - | - | - | - | | |
| | 0.9636 | 8400 | 2.8121 | - | - | - | - | - | | |
| | 0.9866 | 8600 | 2.7105 | - | - | - | - | - | | |
| | 1.0 | 8717 | - | 0.6523 | 0.6593 | 0.6648 | 0.6370 | 0.6117 | | |
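The column names follow the pattern `sts-test-{dim}_spearman_cosine`: embeddings are truncated to their first `{dim}` components, sentence pairs are scored by cosine similarity, and those scores are compared with gold similarity labels by Spearman rank correlation. The sketch below illustrates that computation in plain Python. The toy 4-dimensional vectors and gold scores are hypothetical, and this is not the evaluator used to produce the table.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_cosine_at_dim(pairs, gold, dim):
    """Spearman correlation between truncated-cosine scores and gold labels."""
    sims = [cosine(u[:dim], v[:dim]) for u, v in pairs]
    return spearman(sims, gold)

# Hypothetical sentence-pair embeddings and gold similarity scores.
pairs = [
    ([1.0, 0.0, 0.5, 0.1], [0.9, 0.1, 0.4, 0.2]),
    ([1.0, 0.0, 0.0, 0.3], [0.5, 0.5, 0.2, 0.1]),
    ([1.0, 0.2, 0.1, 0.0], [-0.2, 1.0, 0.3, 0.4]),
]
gold = [4.8, 2.5, 0.6]

scores = {dim: spearman_cosine_at_dim(pairs, gold, dim) for dim in (4, 2)}
print(scores)
```

Since Spearman correlation depends only on the ordering of the scores, a truncated embedding can score as well as the full one whenever it ranks the pairs the same way.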
| ### Framework Versions | |
| - Python: 3.9.18 | |
| - Sentence Transformers: 3.0.1 | |
| - Transformers: 4.40.0 | |
| - PyTorch: 2.2.2+cu121 | |
| - Accelerate: 0.26.1 | |
| - Datasets: 2.19.0 | |
| - Tokenizers: 0.19.1 | |
| ## Citation | |
| ### BibTeX | |
| #### Sentence Transformers | |
| ```bibtex | |
| @inproceedings{reimers-2019-sentence-bert, | |
| title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", | |
| author = "Reimers, Nils and Gurevych, Iryna", | |
| booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", | |
| month = "11", | |
| year = "2019", | |
| publisher = "Association for Computational Linguistics", | |
| url = "https://arxiv.org/abs/1908.10084", | |
| } | |
| ``` | |
| #### MatryoshkaLoss | |
| ```bibtex | |
| @misc{kusupati2024matryoshka, | |
| title={Matryoshka Representation Learning}, | |
| author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, | |
| year={2024}, | |
| eprint={2205.13147}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.LG} | |
| } | |
| ``` | |
| #### MultipleNegativesRankingLoss | |
| ```bibtex | |
| @misc{henderson2017efficient, | |
| title={Efficient Natural Language Response Suggestion for Smart Reply}, | |
| author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, | |
| year={2017}, | |
| eprint={1705.00652}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL} | |
| } | |
| ``` | |
| ## <span style="color:blue">Acknowledgments</span> | |
| The author would like to thank Prince Sultan University for their invaluable support in this project. Their contributions and resources have been instrumental in the development and fine-tuning of these models. | |
| ## Citation | |
| If you use the Arabic Matryoshka Embeddings Model, please cite it as follows: | |
| ```bibtex | |
| @misc{nacar2024enhancingsemanticsimilarityunderstanding, | |
| title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning}, | |
| author={Omer Nacar and Anis Koubaa}, | |
| year={2024}, | |
| eprint={2407.21139}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2407.21139}, | |
| } |