base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets:
  - Omartificial-Intelligence-Space/Arabic-NLi-Triplet
language:
  - ar
library_name: sentence-transformers
license: apache-2.0
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: feature-extraction
tags:
  - mteb
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:557850
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط النظيفة
    sentences:
      - رجل يقدم عرضاً
      - هناك رجل بالخارج قرب الشاطئ
      - رجل يجلس على أريكه
  - source_sentence: رجل يقفز إلى سريره القذر
    sentences:
      - السرير قذر.
      - رجل يضحك أثناء غسيل الملابس
      - الرجل على القمر
  - source_sentence: الفتيات بالخارج
    sentences:
      - امرأة تلف الخيط إلى كرات بجانب كومة من الكرات
      - فتيان يركبان في جولة متعة
      - >-
        ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط
        والثالثة تتحدث إليهن
  - source_sentence: الرجل يرتدي قميصاً أزرق.
    sentences:
      - >-
        رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة
        حمراء مع الماء في الخلفية.
      - كتاب القصص مفتوح
      - رجل يرتدي قميص أسود يعزف على الجيتار.
  - source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة.
    sentences:
      - ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه
      - رجل يستلقي على وجهه على مقعد في الحديقة.
      - الشاب نائم بينما الأم تقود ابنتها إلى الحديقة
model-index:
  - name: >-
      SentenceTransformer based on
      sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
    results:
      - task:
          type: Retrieval
        dataset:
          name: MTEB MintakaRetrieval (ar)
          type: mintaka/mmteb-mintaka
          config: ar
          split: test
          revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
        metrics:
          - type: main_score
            value: 12.493
          - type: map_at_1
            value: 5.719
          - type: map_at_3
            value: 8.269
          - type: map_at_5
            value: 9.172
          - type: map_at_10
            value: 9.894
          - type: ndcg_at_1
            value: 5.719
          - type: ndcg_at_3
            value: 9.128
          - type: ndcg_at_5
            value: 10.745
          - type: ndcg_at_10
            value: 12.493
          - type: recall_at_1
            value: 5.719
          - type: recall_at_3
            value: 11.621
          - type: recall_at_5
            value: 15.524
          - type: recall_at_10
            value: 20.926
          - type: precision_at_1
            value: 5.719
          - type: precision_at_3
            value: 3.874
          - type: precision_at_5
            value: 3.105
          - type: precision_at_10
            value: 2.093
          - type: mrr_at_1
            value: 5.7195
          - type: mrr_at_3
            value: 8.269
          - type: mrr_at_5
            value: 9.1723
          - type: mrr_at_10
            value: 9.8942
      - task:
          type: Retrieval
        dataset:
          name: MTEB MIRACLRetrievalHardNegatives (ar)
          type: miracl/mmteb-miracl-hardnegatives
          config: ar
          split: dev
          revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
        metrics:
          - type: main_score
            value: 22.396
          - type: map_at_1
            value: 8.866
          - type: map_at_3
            value: 13.905
          - type: map_at_5
            value: 15.326
          - type: map_at_10
            value: 16.851
          - type: ndcg_at_1
            value: 13.9
          - type: ndcg_at_3
            value: 17.309
          - type: ndcg_at_5
            value: 19.174
          - type: ndcg_at_10
            value: 22.396
          - type: recall_at_1
            value: 8.866
          - type: recall_at_3
            value: 19.177
          - type: recall_at_5
            value: 23.999
          - type: recall_at_10
            value: 32.421
          - type: precision_at_1
            value: 13.9
          - type: precision_at_3
            value: 10.933
          - type: precision_at_5
            value: 8.5
          - type: precision_at_10
            value: 5.96
          - type: mrr_at_1
            value: 13.9
          - type: mrr_at_3
            value: 20.0667
          - type: mrr_at_5
            value: 21.3617
          - type: mrr_at_10
            value: 22.7531
      - task:
          type: Retrieval
        dataset:
          name: MTEB MLQARetrieval (ar)
          type: mlqa/mmteb-mlqa
          config: ar
          split: validation
          revision: 397ed406c1a7902140303e7faf60fff35b58d285
        metrics:
          - type: main_score
            value: 57.312
          - type: map_at_1
            value: 44.487
          - type: map_at_3
            value: 50.516
          - type: map_at_5
            value: 51.715
          - type: map_at_10
            value: 52.778
          - type: ndcg_at_1
            value: 44.487
          - type: ndcg_at_3
            value: 52.586
          - type: ndcg_at_5
            value: 54.742
          - type: ndcg_at_10
            value: 57.312
          - type: recall_at_1
            value: 44.487
          - type: recall_at_3
            value: 58.607
          - type: recall_at_5
            value: 63.83
          - type: recall_at_10
            value: 71.76
          - type: precision_at_1
            value: 44.487
          - type: precision_at_3
            value: 19.536
          - type: precision_at_5
            value: 12.766
          - type: precision_at_10
            value: 7.176
          - type: mrr_at_1
            value: 44.4874
          - type: mrr_at_3
            value: 50.5158
          - type: mrr_at_5
            value: 51.715
          - type: mrr_at_10
            value: 52.7782
      - task:
          type: Retrieval
        dataset:
          name: MTEB SadeemQuestionRetrieval (ar)
          type: sadeem/mmteb-sadeem
          config: default
          split: test
          revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
        metrics:
          - type: main_score
            value: 52.976
          - type: map_at_1
            value: 22.307
          - type: map_at_3
            value: 41.727
          - type: map_at_5
            value: 43.052
          - type: map_at_10
            value: 43.844
          - type: ndcg_at_1
            value: 22.307
          - type: ndcg_at_3
            value: 48.7
          - type: ndcg_at_5
            value: 51.057
          - type: ndcg_at_10
            value: 52.976
          - type: recall_at_1
            value: 22.307
          - type: recall_at_3
            value: 69.076
          - type: recall_at_5
            value: 74.725
          - type: recall_at_10
            value: 80.661
          - type: precision_at_1
            value: 22.307
          - type: precision_at_3
            value: 23.025
          - type: precision_at_5
            value: 14.945
          - type: precision_at_10
            value: 8.066
          - type: mrr_at_1
            value: 21.0148
          - type: mrr_at_3
            value: 40.8808
          - type: mrr_at_5
            value: 42.1254
          - type: mrr_at_10
            value: 42.9125
      - task:
          type: STS
        dataset:
          name: MTEB BIOSSES (default)
          type: mteb/biosses-sts
          config: default
          split: test
          revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
        metrics:
          - type: cosine_pearson
            value: 72.5081840952171
          - type: cosine_spearman
            value: 69.41362982941537
          - type: euclidean_pearson
            value: 67.45121490183709
          - type: euclidean_spearman
            value: 67.15273493989758
          - type: main_score
            value: 69.41362982941537
          - type: manhattan_pearson
            value: 67.6119022794479
          - type: manhattan_spearman
            value: 67.51659865246586
      - task:
          type: STS
        dataset:
          name: MTEB SICK-R (default)
          type: mteb/sickr-sts
          config: default
          split: test
          revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
        metrics:
          - type: cosine_pearson
            value: 83.61591268324493
          - type: cosine_spearman
            value: 79.61914245705792
          - type: euclidean_pearson
            value: 81.32044881859483
          - type: euclidean_spearman
            value: 79.04866675279919
          - type: main_score
            value: 79.61914245705792
          - type: manhattan_pearson
            value: 81.09220518201322
          - type: manhattan_spearman
            value: 78.87590523907905
      - task:
          type: STS
        dataset:
          name: MTEB STS12 (default)
          type: mteb/sts12-sts
          config: default
          split: test
          revision: a0d554a64d88156834ff5ae9920b964011b16384
        metrics:
          - type: cosine_pearson
            value: 84.59807803376341
          - type: cosine_spearman
            value: 77.38689922564416
          - type: euclidean_pearson
            value: 83.92034850646732
          - type: euclidean_spearman
            value: 76.75857193093438
          - type: main_score
            value: 77.38689922564416
          - type: manhattan_pearson
            value: 83.97191863964667
          - type: manhattan_spearman
            value: 76.89790070725708
      - task:
          type: STS
        dataset:
          name: MTEB STS13 (default)
          type: mteb/sts13-sts
          config: default
          split: test
          revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
        metrics:
          - type: cosine_pearson
            value: 78.18664268536664
          - type: cosine_spearman
            value: 79.58989311630421
          - type: euclidean_pearson
            value: 79.25259731614729
          - type: euclidean_spearman
            value: 80.1701122827397
          - type: main_score
            value: 79.58989311630421
          - type: manhattan_pearson
            value: 79.12601451996869
          - type: manhattan_spearman
            value: 79.98999436073663
      - task:
          type: STS
        dataset:
          name: MTEB STS14 (default)
          type: mteb/sts14-sts
          config: default
          split: test
          revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
        metrics:
          - type: cosine_pearson
            value: 80.97541876658141
          - type: cosine_spearman
            value: 79.78614320477877
          - type: euclidean_pearson
            value: 81.01514505747167
          - type: euclidean_spearman
            value: 80.73664735567839
          - type: main_score
            value: 79.78614320477877
          - type: manhattan_pearson
            value: 80.8746560526314
          - type: manhattan_spearman
            value: 80.67025673179079
      - task:
          type: STS
        dataset:
          name: MTEB STS15 (default)
          type: mteb/sts15-sts
          config: default
          split: test
          revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
        metrics:
          - type: cosine_pearson
            value: 85.23661155813113
          - type: cosine_spearman
            value: 86.21134464371615
          - type: euclidean_pearson
            value: 85.82518684522182
          - type: euclidean_spearman
            value: 86.43600784349509
          - type: main_score
            value: 86.21134464371615
          - type: manhattan_pearson
            value: 85.83101152371589
          - type: manhattan_spearman
            value: 86.42228695679498
      - task:
          type: STS
        dataset:
          name: MTEB STS16 (default)
          type: mteb/sts16-sts
          config: default
          split: test
          revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
        metrics:
          - type: cosine_pearson
            value: 79.20106689077852
          - type: cosine_spearman
            value: 81.39570893867825
          - type: euclidean_pearson
            value: 80.39578888768929
          - type: euclidean_spearman
            value: 81.19950443340412
          - type: main_score
            value: 81.39570893867825
          - type: manhattan_pearson
            value: 80.2226679341839
          - type: manhattan_spearman
            value: 80.99142422593823
      - task:
          type: STS
        dataset:
          name: MTEB STS17 (ar-ar)
          type: mteb/sts17-crosslingual-sts
          config: ar-ar
          split: test
          revision: faeb762787bd10488a50c8b5be4a3b82e411949c
        metrics:
          - type: cosine_pearson
            value: 81.05294851623468
          - type: cosine_spearman
            value: 81.10570655134113
          - type: euclidean_pearson
            value: 79.22292773537778
          - type: euclidean_spearman
            value: 78.84204232638425
          - type: main_score
            value: 81.10570655134113
          - type: manhattan_pearson
            value: 79.43750460320484
          - type: manhattan_spearman
            value: 79.33713593557482
      - task:
          type: STS
        dataset:
          name: MTEB STS22 (ar)
          type: mteb/sts22-crosslingual-sts
          config: ar
          split: test
          revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
        metrics:
          - type: cosine_pearson
            value: 45.96875498680092
          - type: cosine_spearman
            value: 52.405509117149904
          - type: euclidean_pearson
            value: 42.097450896728226
          - type: euclidean_spearman
            value: 50.89022884113707
          - type: main_score
            value: 52.405509117149904
          - type: manhattan_pearson
            value: 42.22827727075534
          - type: manhattan_spearman
            value: 50.912841055442634
      - task:
          type: STS
        dataset:
          name: MTEB STSBenchmark (default)
          type: mteb/stsbenchmark-sts
          config: default
          split: test
          revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
        metrics:
          - type: cosine_pearson
            value: 83.13261516884116
          - type: cosine_spearman
            value: 84.3492527221498
          - type: euclidean_pearson
            value: 82.691603178401
          - type: euclidean_spearman
            value: 83.0499566200785
          - type: main_score
            value: 84.3492527221498
          - type: manhattan_pearson
            value: 82.68307441014618
          - type: manhattan_spearman
            value: 83.01315787964519
      - task:
          type: Summarization
        dataset:
          name: MTEB SummEval (default)
          type: mteb/summeval
          config: default
          split: test
          revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
        metrics:
          - type: cosine_pearson
            value: 31.149232235402845
          - type: cosine_spearman
            value: 30.685504130606255
          - type: dot_pearson
            value: 27.466307571160375
          - type: dot_spearman
            value: 28.93064261485915
          - type: main_score
            value: 30.685504130606255
          - type: pearson
            value: 31.149232235402845
          - type: spearman
            value: 30.685504130606255
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 256
          type: sts-test-256
        metrics:
          - type: pearson_cosine
            value: 0.8264447022356382
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8386403752382455
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8219134931449013
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.825509659109493
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8223094468630248
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8260503151751462
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.6375226884845725
            name: Pearson Dot
          - type: spearman_dot
            value: 0.6287228614640888
            name: Spearman Dot
          - type: pearson_max
            value: 0.8264447022356382
            name: Pearson Max
          - type: spearman_max
            value: 0.8386403752382455
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 128
          type: sts-test-128
        metrics:
          - type: pearson_cosine
            value: 0.8209661910768973
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8347149482673766
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8082811559854036
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8148314269262763
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.8093138512113149
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8156468458613929
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.5795109620454884
            name: Pearson Dot
          - type: spearman_dot
            value: 0.5760223026552876
            name: Spearman Dot
          - type: pearson_max
            value: 0.8209661910768973
            name: Pearson Max
          - type: spearman_max
            value: 0.8347149482673766
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test 64
          type: sts-test-64
        metrics:
          - type: pearson_cosine
            value: 0.808708530451336
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8217532539767914
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.7876121380998453
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.7969092304137347
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.7902997966909958
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.7987635968785215
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.495047136234386
            name: Pearson Dot
          - type: spearman_dot
            value: 0.49287000679901516
            name: Spearman Dot
          - type: pearson_max
            value: 0.808708530451336
            name: Pearson Max
          - type: spearman_max
            value: 0.8217532539767914
            name: Spearman Max
SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 on the Omartificial-Intelligence-Space/arabic-n_li-triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. The model is part of the Arabic Matryoshka Embedding Models collection and was presented in the paper GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training.
Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Maximum Sequence Length: 128 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- Omartificial-Intelligence-Space/arabic-n_li-triplet
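
The card lists cosine similarity as the similarity function, and its metadata reports a Matryoshka loss along with STS results at 256, 128, and 64 dimensions. The sketch below illustrates how embeddings of this kind are commonly compared at a reduced size: keep the first `dim` components, re-normalize, and rank candidates by cosine similarity. It is a minimal pure-Python illustration and does not use the sentence-transformers API. The function names are hypothetical, and the 4-dimensional toy vectors are made-up stand-ins for real 384-dimensional model output.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, dim):
    """Return (candidate index, score) pairs, highest cosine similarity first."""
    q = truncate_and_normalize(query, dim)
    scored = [
        (i, cosine_similarity(q, truncate_and_normalize(c, dim)))
        for i, c in enumerate(candidates)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption: illustrative values, not produced by the model).
query = [1.0, 0.0, 0.5, 0.5]
candidates = [
    [0.9, 0.1, 0.4, 0.6],   # close to the query
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [-1.0, 0.0, 0.5, 0.5],  # points away from the query in the first component
]

# Rank once with the truncated vectors and once with the full vectors.
ranking = rank_by_similarity(query, candidates, dim=2)
full_ranking = rank_by_similarity(query, candidates, dim=4)
```

With real embeddings, the same comparison would run on vectors returned by the model, and the truncation size would be chosen to trade some accuracy for storage and speed.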
Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
**Hugging