---
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets:
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet
language:
- ar
library_name: sentence-transformers
license: apache-2.0
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: feature-extraction
tags:
- mteb
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:557850
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط النظيفة
  sentences:
  - رجل يقدم عرضاً
  - هناك رجل بالخارج قرب الشاطئ
  - رجل يجلس على أريكه
- source_sentence: رجل يقفز إلى سريره القذر
  sentences:
  - السرير قذر.
  - رجل يضحك أثناء غسيل الملابس
  - الرجل على القمر
- source_sentence: الفتيات بالخارج
  sentences:
  - امرأة تلف الخيط إلى كرات بجانب كومة من الكرات
  - فتيان يركبان في جولة متعة
  - ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة تتحدث إليهن
- source_sentence: الرجل يرتدي قميصاً أزرق.
  sentences:
  - رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة حمراء مع الماء في الخلفية.
  - كتاب القصص مفتوح
  - رجل يرتدي قميص أسود يعزف على الجيتار.
- source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة.
  sentences:
  - ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه
  - رجل يستلقي على وجهه على مقعد في الحديقة.
  - الشاب نائم بينما الأم تقود ابنتها إلى الحديقة
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
  results:
  - task:
      type: Retrieval
    dataset:
      name: MTEB MintakaRetrieval (ar)
      type: mintaka/mmteb-mintaka
      config: ar
      split: test
      revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
    metrics:
    - type: main_score
      value: 12.493
    - type: map_at_1
      value: 5.719
    - type: map_at_3
      value: 8.269
    - type: map_at_5
      value: 9.172
    - type: map_at_10
      value: 9.894
    - type: ndcg_at_1
      value: 5.719
    - type: ndcg_at_3
      value: 9.128
    - type: ndcg_at_5
      value: 10.745
    - type: ndcg_at_10
      value: 12.493
    - type: recall_at_1
      value: 5.719
    - type: recall_at_3
      value: 11.621
    - type: recall_at_5
      value: 15.524
    - type: recall_at_10
      value: 20.926
    - type: precision_at_1
      value: 5.719
    - type: precision_at_3
      value: 3.874
    - type: precision_at_5
      value: 3.105
    - type: precision_at_10
      value: 2.093
    - type: mrr_at_1
      value: 5.7195
    - type: mrr_at_3
      value: 8.269
    - type: mrr_at_5
      value: 9.1723
    - type: mrr_at_10
      value: 9.8942
  - task:
      type: Retrieval
    dataset:
      name: MTEB MIRACLRetrievalHardNegatives (ar)
      type: miracl/mmteb-miracl-hardnegatives
      config: ar
      split: dev
      revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
    metrics:
    - type: main_score
      value: 22.396
    - type: map_at_1
      value: 8.866
    - type: map_at_3
      value: 13.905
    - type: map_at_5
      value: 15.326
    - type: map_at_10
      value: 16.851
    - type: ndcg_at_1
      value: 13.9
    - type: ndcg_at_3
      value: 17.309
    - type: ndcg_at_5
      value: 19.174
    - type: ndcg_at_10
      value: 22.396
    - type: recall_at_1
      value: 8.866
    - type: recall_at_3
      value: 19.177
    - type: recall_at_5
      value: 23.999
    - type: recall_at_10
      value: 32.421
    - type: precision_at_1
      value: 13.9
    - type: precision_at_3
      value: 10.933
    - type: precision_at_5
      value: 8.5
    - type: precision_at_10
      value: 5.96
    - type: mrr_at_1
      value: 13.9
    - type: mrr_at_3
      value: 20.0667
    - type: mrr_at_5
      value: 21.3617
    - type: mrr_at_10
      value: 22.7531
  - task:
      type: Retrieval
    dataset:
      name: MTEB MLQARetrieval (ar)
      type: mlqa/mmteb-mlqa
      config: ar
      split: validation
      revision: 397ed406c1a7902140303e7faf60fff35b58d285
    metrics:
    - type: main_score
      value: 57.312
    - type: map_at_1
      value: 44.487
    - type: map_at_3
      value: 50.516
    - type: map_at_5
      value: 51.715
    - type: map_at_10
      value: 52.778
    - type: ndcg_at_1
      value: 44.487
    - type: ndcg_at_3
      value: 52.586
    - type: ndcg_at_5
      value: 54.742
    - type: ndcg_at_10
      value: 57.312
    - type: recall_at_1
      value: 44.487
    - type: recall_at_3
      value: 58.607
    - type: recall_at_5
      value: 63.83
    - type: recall_at_10
      value: 71.76
    - type: precision_at_1
      value: 44.487
    - type: precision_at_3
      value: 19.536
    - type: precision_at_5
      value: 12.766
    - type: precision_at_10
      value: 7.176
    - type: mrr_at_1
      value: 44.4874
    - type: mrr_at_3
      value: 50.5158
    - type: mrr_at_5
      value: 51.715
    - type: mrr_at_10
      value: 52.7782
  - task:
      type: Retrieval
    dataset:
      name: MTEB SadeemQuestionRetrieval (ar)
      type: sadeem/mmteb-sadeem
      config: default
      split: test
      revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
    metrics:
    - type: main_score
      value: 52.976
    - type: map_at_1
      value: 22.307
    - type: map_at_3
      value: 41.727
    - type: map_at_5
      value: 43.052
    - type: map_at_10
      value: 43.844
    - type: ndcg_at_1
      value: 22.307
    - type: ndcg_at_3
      value: 48.7
    - type: ndcg_at_5
      value: 51.057
    - type: ndcg_at_10
      value: 52.976
    - type: recall_at_1
      value: 22.307
    - type: recall_at_3
      value: 69.076
    - type: recall_at_5
      value: 74.725
    - type: recall_at_10
      value: 80.661
    - type: precision_at_1
      value: 22.307
    - type: precision_at_3
      value: 23.025
    - type: precision_at_5
      value: 14.945
    - type: precision_at_10
      value: 8.066
    - type: mrr_at_1
      value: 21.0148
    - type: mrr_at_3
      value: 40.8808
    - type: mrr_at_5
      value: 42.1254
    - type: mrr_at_10
      value: 42.9125
  - task:
      type: STS
    dataset:
      name: MTEB BIOSSES (default)
      type: mteb/biosses-sts
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cosine_pearson
      value: 72.5081840952171
    - type: cosine_spearman
      value: 69.41362982941537
    - type: euclidean_pearson
      value: 67.45121490183709
    - type: euclidean_spearman
      value: 67.15273493989758
    - type: main_score
      value: 69.41362982941537
    - type: manhattan_pearson
      value: 67.6119022794479
    - type: manhattan_spearman
      value: 67.51659865246586
  - task:
      type: STS
    dataset:
      name: MTEB SICK-R (default)
      type: mteb/sickr-sts
      config: default
      split: test
      revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
    metrics:
    - type: cosine_pearson
      value: 83.61591268324493
    - type: cosine_spearman
      value: 79.61914245705792
    - type: euclidean_pearson
      value: 81.32044881859483
    - type: euclidean_spearman
      value: 79.04866675279919
    - type: main_score
      value: 79.61914245705792
    - type: manhattan_pearson
      value: 81.09220518201322
    - type: manhattan_spearman
      value: 78.87590523907905
  - task:
      type: STS
    dataset:
      name: MTEB STS12 (default)
      type: mteb/sts12-sts
      config: default
      split: test
      revision: a0d554a64d88156834ff5ae9920b964011b16384
    metrics:
    - type: cosine_pearson
      value: 84.59807803376341
    - type: cosine_spearman
      value: 77.38689922564416
    - type: euclidean_pearson
      value: 83.92034850646732
    - type: euclidean_spearman
      value: 76.75857193093438
    - type: main_score
      value: 77.38689922564416
    - type: manhattan_pearson
      value: 83.97191863964667
    - type: manhattan_spearman
      value: 76.89790070725708
  - task:
      type: STS
    dataset:
      name: MTEB STS13 (default)
      type: mteb/sts13-sts
      config: default
      split: test
      revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
    metrics:
    - type: cosine_pearson
      value: 78.18664268536664
    - type: cosine_spearman
      value: 79.58989311630421
    - type: euclidean_pearson
      value: 79.25259731614729
    - type: euclidean_spearman
      value: 80.1701122827397
    - type: main_score
      value: 79.58989311630421
    - type: manhattan_pearson
      value: 79.12601451996869
    - type: manhattan_spearman
      value: 79.98999436073663
  - task:
      type: STS
    dataset:
      name: MTEB STS14 (default)
      type: mteb/sts14-sts
      config: default
      split: test
      revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
    metrics:
    - type: cosine_pearson
      value: 80.97541876658141
    - type: cosine_spearman
      value: 79.78614320477877
    - type: euclidean_pearson
      value: 81.01514505747167
    - type: euclidean_spearman
      value: 80.73664735567839
    - type: main_score
      value: 79.78614320477877
    - type: manhattan_pearson
      value: 80.8746560526314
    - type: manhattan_spearman
      value: 80.67025673179079
  - task:
      type: STS
    dataset:
      name: MTEB STS15 (default)
      type: mteb/sts15-sts
      config: default
      split: test
      revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
    metrics:
    - type: cosine_pearson
      value: 85.23661155813113
    - type: cosine_spearman
      value: 86.21134464371615
    - type: euclidean_pearson
      value: 85.82518684522182
    - type: euclidean_spearman
      value: 86.43600784349509
    - type: main_score
      value: 86.21134464371615
    - type: manhattan_pearson
      value: 85.83101152371589
    - type: manhattan_spearman
      value: 86.42228695679498
  - task:
      type: STS
    dataset:
      name: MTEB STS16 (default)
      type: mteb/sts16-sts
      config: default
      split: test
      revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
    metrics:
    - type: cosine_pearson
      value: 79.20106689077852
    - type: cosine_spearman
      value: 81.39570893867825
    - type: euclidean_pearson
      value: 80.39578888768929
    - type: euclidean_spearman
      value: 81.19950443340412
    - type: main_score
      value: 81.39570893867825
    - type: manhattan_pearson
      value: 80.2226679341839
    - type: manhattan_spearman
      value: 80.99142422593823
  - task:
      type: STS
    dataset:
      name: MTEB STS17 (ar-ar)
      type: mteb/sts17-crosslingual-sts
      config: ar-ar
      split: test
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
    metrics:
    - type: cosine_pearson
      value: 81.05294851623468
    - type: cosine_spearman
      value: 81.10570655134113
    - type: euclidean_pearson
      value: 79.22292773537778
    - type: euclidean_spearman
      value: 78.84204232638425
    - type: main_score
      value: 81.10570655134113
    - type: manhattan_pearson
      value: 79.43750460320484
    - type: manhattan_spearman
      value: 79.33713593557482
  - task:
      type: STS
    dataset:
      name: MTEB STS22 (ar)
      type: mteb/sts22-crosslingual-sts
      config: ar
      split: test
      revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
    metrics:
    - type: cosine_pearson
      value: 45.96875498680092
    - type: cosine_spearman
      value: 52.405509117149904
    - type: euclidean_pearson
      value: 42.097450896728226
    - type: euclidean_spearman
      value: 50.89022884113707
    - type: main_score
      value: 52.405509117149904
    - type: manhattan_pearson
      value: 42.22827727075534
    - type: manhattan_spearman
      value: 50.912841055442634
  - task:
      type: STS
    dataset:
      name: MTEB STSBenchmark (default)
      type: mteb/stsbenchmark-sts
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cosine_pearson
      value: 83.13261516884116
    - type: cosine_spearman
      value: 84.3492527221498
    - type: euclidean_pearson
      value: 82.691603178401
    - type: euclidean_spearman
      value: 83.0499566200785
    - type: main_score
      value: 84.3492527221498
    - type: manhattan_pearson
      value: 82.68307441014618
    - type: manhattan_spearman
      value: 83.01315787964519
  - task:
      type: Summarization
    dataset:
      name: MTEB SummEval (default)
      type: mteb/summeval
      config: default
      split: test
      revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
    metrics:
    - type: cosine_pearson
      value: 31.149232235402845
    - type: cosine_spearman
      value: 30.685504130606255
    - type: dot_pearson
      value: 27.466307571160375
    - type: dot_spearman
      value: 28.93064261485915
    - type: main_score
      value: 30.685504130606255
    - type: pearson
      value: 31.149232235402845
    - type: spearman
      value: 30.685504130606255
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 256
      type: sts-test-256
    metrics:
    - type: pearson_cosine
      value: 0.8264447022356382
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8386403752382455
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8219134931449013
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.825509659109493
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8223094468630248
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8260503151751462
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.6375226884845725
      name: Pearson Dot
    - type: spearman_dot
      value: 0.6287228614640888
      name: Spearman Dot
    - type: pearson_max
      value: 0.8264447022356382
      name: Pearson Max
    - type: spearman_max
      value: 0.8386403752382455
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 128
      type: sts-test-128
    metrics:
    - type: pearson_cosine
      value: 0.8209661910768973
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8347149482673766
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8082811559854036
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8148314269262763
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8093138512113149
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8156468458613929
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.5795109620454884
      name: Pearson Dot
    - type: spearman_dot
      value: 0.5760223026552876
      name: Spearman Dot
    - type: pearson_max
      value: 0.8209661910768973
      name: Pearson Max
    - type: spearman_max
      value: 0.8347149482673766
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test 64
      type: sts-test-64
    metrics:
    - type: pearson_cosine
      value: 0.808708530451336
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8217532539767914
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.7876121380998453
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.7969092304137347
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.7902997966909958
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.7987635968785215
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.495047136234386
      name: Pearson Dot
    - type: spearman_dot
      value: 0.49287000679901516
      name: Spearman Dot
    - type: pearson_max
      value: 0.808708530451336
      name: Pearson Max
    - type: spearman_max
      value: 0.8217532539767914
      name: Spearman Max
---

# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from
[sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

This model is part of the [Arabic Matryoshka Embedding Models collection](https://huggingface.co/collections/Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e). It was presented in the paper [GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training](https://huggingface.co/papers/2505.24581).

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - Omartificial-Intelligence-Space/Arabic-NLi-Triplet

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging