How to use Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet")

# Arabic example sentences to embed
sentences = [
    "ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط النظيفة",
    "رجل يقدم عرضاً",
    "هناك رجل بالخارج قرب الشاطئ",
    "رجل يجلس على أريكه",
]

# Encode the sentences, then compute the pairwise similarity matrix
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
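Once a pairwise similarity matrix is available, a common next step is to rank the other sentences against one query sentence. The following is a minimal, dependency-free sketch of that step. The helper name `rank_candidates` is hypothetical, and the score values are made up for illustration; they are not outputs of this model.

```python
def rank_candidates(scores, query_index):
    """Return (candidate_index, score) pairs sorted from most to least similar.

    `scores` is a square similarity matrix given as a list of lists;
    the query's similarity to itself is skipped.
    """
    row = scores[query_index]
    pairs = [(i, s) for i, s in enumerate(row) if i != query_index]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical 4x4 similarity matrix (illustrative values only, not model output).
example_scores = [
    [1.00, 0.21, 0.63, 0.05],
    [0.21, 1.00, 0.18, 0.12],
    [0.63, 0.18, 1.00, 0.09],
    [0.05, 0.12, 0.09, 1.00],
]

ranking = rank_candidates(example_scores, query_index=0)
print(ranking)  # [(2, 0.63), (1, 0.21), (3, 0.05)]
```

With the usage snippet above, the tensor returned by `model.similarity` can be converted with `.tolist()` to obtain a matrix in this list-of-lists form.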
metadata
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets:
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet
language:
- ar
library_name: sentence-transformers
license: apache-2.0
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: feature-extraction
tags:
- mteb
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:557850
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط النظيفة
sentences:
- رجل يقدم عرضاً
- هناك رجل بالخارج قرب الشاطئ
- رجل يجلس على أريكه
- source_sentence: رجل يقفز إلى سريره القذر
sentences:
- السرير قذر.
- رجل يضحك أثناء غسيل الملابس
- الرجل على القمر
- source_sentence: الفتيات بالخارج
sentences:
- امرأة تلف الخيط إلى كرات بجانب كومة من الكرات
- فتيان يركبان في جولة متعة
- ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة تتحدث إليهن
- source_sentence: الرجل يرتدي قميصاً أزرق.
sentences:
- رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة حمراء مع الماء في الخلفية.
- كتاب القصص مفتوح
- رجل يرتدي قميص أسود يعزف على الجيتار.
- source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة.
sentences:
- ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه
- رجل يستلقي على وجهه على مقعد في الحديقة.
- الشاب نائم بينما الأم تقود ابنتها إلى الحديقة
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
results:
- task:
type: Retrieval
dataset:
name: MTEB MintakaRetrieval (ar)
type: mintaka/mmteb-mintaka
config: ar
split: test
revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
metrics:
- type: main_score
value: 12.493
- type: map_at_1
value: 5.719
- type: map_at_3
value: 8.269
- type: map_at_5
value: 9.172
- type: map_at_10
value: 9.894
- type: ndcg_at_1
value: 5.719
- type: ndcg_at_3
value: 9.128
- type: ndcg_at_5
value: 10.745
- type: ndcg_at_10
value: 12.493
- type: recall_at_1
value: 5.719
- type: recall_at_3
value: 11.621
- type: recall_at_5
value: 15.524
- type: recall_at_10
value: 20.926
- type: precision_at_1
value: 5.719
- type: precision_at_3
value: 3.874
- type: precision_at_5
value: 3.105
- type: precision_at_10
value: 2.093
- type: mrr_at_1
value: 5.7195
- type: mrr_at_3
value: 8.269
- type: mrr_at_5
value: 9.1723
- type: mrr_at_10
value: 9.8942
- task:
type: Retrieval
dataset:
name: MTEB MIRACLRetrievalHardNegatives (ar)
type: miracl/mmteb-miracl-hardnegatives
config: ar
split: dev
revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
metrics:
- type: main_score
value: 22.396
- type: map_at_1
value: 8.866
- type: map_at_3
value: 13.905
- type: map_at_5
value: 15.326
- type: map_at_10
value: 16.851
- type: ndcg_at_1
value: 13.9
- type: ndcg_at_3
value: 17.309
- type: ndcg_at_5
value: 19.174
- type: ndcg_at_10
value: 22.396
- type: recall_at_1
value: 8.866
- type: recall_at_3
value: 19.177
- type: recall_at_5
value: 23.999
- type: recall_at_10
value: 32.421
- type: precision_at_1
value: 13.9
- type: precision_at_3
value: 10.933
- type: precision_at_5
value: 8.5
- type: precision_at_10
value: 5.96
- type: mrr_at_1
value: 13.9
- type: mrr_at_3
value: 20.0667
- type: mrr_at_5
value: 21.3617
- type: mrr_at_10
value: 22.7531
- task:
type: Retrieval
dataset:
name: MTEB MLQARetrieval (ar)
type: mlqa/mmteb-mlqa
config: ar
split: validation
revision: 397ed406c1a7902140303e7faf60fff35b58d285
metrics:
- type: main_score
value: 57.312
- type: map_at_1
value: 44.487
- type: map_at_3
value: 50.516
- type: map_at_5
value: 51.715
- type: map_at_10
value: 52.778
- type: ndcg_at_1
value: 44.487
- type: ndcg_at_3
value: 52.586
- type: ndcg_at_5
value: 54.742
- type: ndcg_at_10
value: 57.312
- type: recall_at_1
value: 44.487
- type: recall_at_3
value: 58.607
- type: recall_at_5
value: 63.83
- type: recall_at_10
value: 71.76
- type: precision_at_1
value: 44.487
- type: precision_at_3
value: 19.536
- type: precision_at_5
value: 12.766
- type: precision_at_10
value: 7.176
- type: mrr_at_1
value: 44.4874
- type: mrr_at_3
value: 50.5158
- type: mrr_at_5
value: 51.715
- type: mrr_at_10
value: 52.7782
- task:
type: Retrieval
dataset:
name: MTEB SadeemQuestionRetrieval (ar)
type: sadeem/mmteb-sadeem
config: default
split: test
revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
metrics:
- type: main_score
value: 52.976
- type: map_at_1
value: 22.307
- type: map_at_3
value: 41.727
- type: map_at_5
value: 43.052
- type: map_at_10
value: 43.844
- type: ndcg_at_1
value: 22.307
- type: ndcg_at_3
value: 48.7
- type: ndcg_at_5
value: 51.057
- type: ndcg_at_10
value: 52.976
- type: recall_at_1
value: 22.307
- type: recall_at_3
value: 69.076
- type: recall_at_5
value: 74.725
- type: recall_at_10
value: 80.661
- type: precision_at_1
value: 22.307
- type: precision_at_3
value: 23.025
- type: precision_at_5
value: 14.945
- type: precision_at_10
value: 8.066
- type: mrr_at_1
value: 21.0148
- type: mrr_at_3
value: 40.8808
- type: mrr_at_5
value: 42.1254
- type: mrr_at_10
value: 42.9125
- task:
type: STS
dataset:
name: MTEB BIOSSES (default)
type: mteb/biosses-sts
config: default
split: test
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
metrics:
- type: cosine_pearson
value: 72.5081840952171
- type: cosine_spearman
value: 69.41362982941537
- type: euclidean_pearson
value: 67.45121490183709
- type: euclidean_spearman
value: 67.15273493989758
- type: main_score
value: 69.41362982941537
- type: manhattan_pearson
value: 67.6119022794479
- type: manhattan_spearman
value: 67.51659865246586
- task:
type: STS
dataset:
name: MTEB SICK-R (default)
type: mteb/sickr-sts
config: default
split: test
revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
metrics:
- type: cosine_pearson
value: 83.61591268324493
- type: cosine_spearman
value: 79.61914245705792
- type: euclidean_pearson
value: 81.32044881859483
- type: euclidean_spearman
value: 79.04866675279919
- type: main_score
value: 79.61914245705792
- type: manhattan_pearson
value: 81.09220518201322
- type: manhattan_spearman
value: 78.87590523907905
- task:
type: STS
dataset:
name: MTEB STS12 (default)
type: mteb/sts12-sts
config: default
split: test
revision: a0d554a64d88156834ff5ae9920b964011b16384
metrics:
- type: cosine_pearson
value: 84.59807803376341
- type: cosine_spearman
value: 77.38689922564416
- type: euclidean_pearson
value: 83.92034850646732
- type: euclidean_spearman
value: 76.75857193093438
- type: main_score
value: 77.38689922564416
- type: manhattan_pearson
value: 83.97191863964667
- type: manhattan_spearman
value: 76.89790070725708
- task:
type: STS
dataset:
name: MTEB STS13 (default)
type: mteb/sts13-sts
config: default
split: test
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
metrics:
- type: cosine_pearson
value: 78.18664268536664
- type: cosine_spearman
value: 79.58989311630421
- type: euclidean_pearson
value: 79.25259731614729
- type: euclidean_spearman
value: 80.1701122827397
- type: main_score
value: 79.58989311630421
- type: manhattan_pearson
value: 79.12601451996869
- type: manhattan_spearman
value: 79.98999436073663
- task:
type: STS
dataset:
name: MTEB STS14 (default)
type: mteb/sts14-sts
config: default
split: test
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
metrics:
- type: cosine_pearson
value: 80.97541876658141
- type: cosine_spearman
value: 79.78614320477877
- type: euclidean_pearson
value: 81.01514505747167
- type: euclidean_spearman
value: 80.73664735567839
- type: main_score
value: 79.78614320477877
- type: manhattan_pearson
value: 80.8746560526314
- type: manhattan_spearman
value: 80.67025673179079
- task:
type: STS
dataset:
name: MTEB STS15 (default)
type: mteb/sts15-sts
config: default
split: test
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
metrics:
- type: cosine_pearson
value: 85.23661155813113
- type: cosine_spearman
value: 86.21134464371615
- type: euclidean_pearson
value: 85.82518684522182
- type: euclidean_spearman
value: 86.43600784349509
- type: main_score
value: 86.21134464371615
- type: manhattan_pearson
value: 85.83101152371589
- type: manhattan_spearman
value: 86.42228695679498
- task:
type: STS
dataset:
name: MTEB STS16 (default)
type: mteb/sts16-sts
config: default
split: test
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
metrics:
- type: cosine_pearson
value: 79.20106689077852
- type: cosine_spearman
value: 81.39570893867825
- type: euclidean_pearson
value: 80.39578888768929
- type: euclidean_spearman
value: 81.19950443340412
- type: main_score
value: 81.39570893867825
- type: manhattan_pearson
value: 80.2226679341839
- type: manhattan_spearman
value: 80.99142422593823
- task:
type: STS
dataset:
name: MTEB STS17 (ar-ar)
type: mteb/sts17-crosslingual-sts
config: ar-ar
split: test
revision: faeb762787bd10488a50c8b5be4a3b82e411949c
metrics:
- type: cosine_pearson
value: 81.05294851623468
- type: cosine_spearman
value: 81.10570655134113
- type: euclidean_pearson
value: 79.22292773537778
- type: euclidean_spearman
value: 78.84204232638425
- type: main_score
value: 81.10570655134113
- type: manhattan_pearson
value: 79.43750460320484
- type: manhattan_spearman
value: 79.33713593557482
- task:
type: STS
dataset:
name: MTEB STS22 (ar)
type: mteb/sts22-crosslingual-sts
config: ar
split: test
revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
metrics:
- type: cosine_pearson
value: 45.96875498680092
- type: cosine_spearman
value: 52.405509117149904
- type: euclidean_pearson
value: 42.097450896728226
- type: euclidean_spearman
value: 50.89022884113707
- type: main_score
value: 52.405509117149904
- type: manhattan_pearson
value: 42.22827727075534
- type: manhattan_spearman
value: 50.912841055442634
- task:
type: STS
dataset:
name: MTEB STSBenchmark (default)
type: mteb/stsbenchmark-sts
config: default
split: test
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
metrics:
- type: cosine_pearson
value: 83.13261516884116
- type: cosine_spearman
value: 84.3492527221498
- type: euclidean_pearson
value: 82.691603178401
- type: euclidean_spearman
value: 83.0499566200785
- type: main_score
value: 84.3492527221498
- type: manhattan_pearson
value: 82.68307441014618
- type: manhattan_spearman
value: 83.01315787964519
- task:
type: Summarization
dataset:
name: MTEB SummEval (default)
type: mteb/summeval
config: default
split: test
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
metrics:
- type: cosine_pearson
value: 31.149232235402845
- type: cosine_spearman
value: 30.685504130606255
- type: dot_pearson
value: 27.466307571160375
- type: dot_spearman
value: 28.93064261485915
- type: main_score
value: 30.685504130606255
- type: pearson
value: 31.149232235402845
- type: spearman
value: 30.685504130606255
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 256
type: sts-test-256
metrics:
- type: pearson_cosine
value: 0.8264447022356382
name: Pearson Cosine
- type: spearman_cosine
value: 0.8386403752382455
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8219134931449013
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.825509659109493
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8223094468630248
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8260503151751462
name: Spearman Euclidean
- type: pearson_dot
value: 0.6375226884845725
name: Pearson Dot
- type: spearman_dot
value: 0.6287228614640888
name: Spearman Dot
- type: pearson_max
value: 0.8264447022356382
name: Pearson Max
- type: spearman_max
value: 0.8386403752382455
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 128
type: sts-test-128
metrics:
- type: pearson_cosine
value: 0.8209661910768973
name: Pearson Cosine
- type: spearman_cosine
value: 0.8347149482673766
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8082811559854036
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8148314269262763
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8093138512113149
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8156468458613929
name: Spearman Euclidean
- type: pearson_dot
value: 0.5795109620454884
name: Pearson Dot
- type: spearman_dot
value: 0.5760223026552876
name: Spearman Dot
- type: pearson_max
value: 0.8209661910768973
name: Pearson Max
- type: spearman_max
value: 0.8347149482673766
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 64
type: sts-test-64
metrics:
- type: pearson_cosine
value: 0.808708530451336
name: Pearson Cosine
- type: spearman_cosine
value: 0.8217532539767914
name: Spearman Cosine
- type: pearson_manhattan
value: 0.7876121380998453
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.7969092304137347
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.7902997966909958
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.7987635968785215
name: Spearman Euclidean
- type: pearson_dot
value: 0.495047136234386
name: Pearson Dot
- type: spearman_dot
value: 0.49287000679901516
name: Spearman Dot
- type: pearson_max
value: 0.808708530451336
name: Pearson Max
- type: spearman_max
value: 0.8217532539767914
name: Spearman Max
SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
This model is part of the Arabic Matryoshka Embedding Models collection. It was presented in the paper "GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training".
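The metadata for this card lists MatryoshkaLoss and reports STS results at 256, 128, and 64 dimensions, which suggests that shortened embeddings remain usable. The sketch below illustrates the usual way such embeddings are shortened, under the assumption that truncation keeps the leading components and rescales the result to unit length. The helper `truncate_embedding` is hypothetical, and the 8-dimensional vector is a toy stand-in for a real 384-dimensional embedding.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # avoid dividing by zero for an all-zero prefix
    return [x / norm for x in head]


# Toy 8-dimensional vector standing in for a 384-dimensional embedding.
toy_embedding = [3.0, 4.0, 0.0, 0.0, 1.0, 2.0, 2.0, 0.0]

short = truncate_embedding(toy_embedding, 2)
print(short)  # [0.6, 0.8]
```

Recent sentence-transformers releases accept a `truncate_dim` argument on `SentenceTransformer(...)` for this purpose; check the documentation of the installed version before relying on it.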
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
- Maximum Sequence Length: 128 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet
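The similarity function listed above, cosine similarity, compares the direction of two embedding vectors and ignores their length: it is the dot product divided by the product of the two L2 norms. A minimal pure-Python sketch follows. The function name and the three toy vectors are illustrative assumptions, not part of the model or the library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings from this model have 384 dimensions.
u = [1.0, 0.0, 1.0]
v = [1.0, 0.0, 0.0]
w = [0.0, 1.0, 0.0]

print(round(cosine_similarity(u, v), 4))  # 0.7071
print(cosine_similarity(v, w))            # 0.0
```

Values near 1 indicate vectors pointing in nearly the same direction, and 0 indicates orthogonal vectors.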
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- **Hugging