SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the MRPC subset of the nyu-mll/glue dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

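Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Training Dataset: nyu-mll/glue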

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
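
The Pooling (mean tokens) and Normalize modules listed above can be illustrated with a minimal pure-Python sketch. It assumes token embeddings arrive as a list of vectors with a 0/1 attention mask. The function names `mean_pool` and `l2_normalize` are hypothetical and not part of the Sentence Transformers API, and toy 3-dimensional vectors stand in for the model's 384 dimensions.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

# Toy input: two real tokens and one padding token (mask 0).
tokens = [[3.0, 0.0, 0.0], [0.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # padding row does not contribute
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

The padding row is excluded from the average, and the final vector has length 1, which is what makes cosine and dot-product scores interchangeable downstream.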

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ejun26/minilm-mrpc-clean-retrieval")
# Run inference
sentences = [
    'In a statement later , he said it appeared his side may have fallen a bit short .',
    'Zilkha conceded in a statement issued today that his group may have fallen " a bit short . "',
    "U.S. law enforcement officials are sneering at Dar Heatherington 's version of of the events -- including a police conspiracy to discredit her -- which thrust her into the public spotlight .",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
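
As a rough illustration of what a pairwise similarity matrix contains, the sketch below computes cosine similarities in plain Python. It assumes cosine similarity is the scoring function. `cosine_similarity_matrix` is a hypothetical helper, not the library's implementation, and the 2-dimensional vectors are toy stand-ins for real 384-dimensional embeddings. Because the architecture ends in Normalize(), real embeddings from this model have unit length, so the dot product and cosine similarity coincide.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity_matrix(vectors):
    """Return the full matrix of pairwise cosine similarities."""
    norms = [math.sqrt(dot(v, v)) for v in vectors]
    return [
        [dot(a, b) / (norm_a * norm_b) for b, norm_b in zip(vectors, norms)]
        for a, norm_a in zip(vectors, norms)
    ]

# Toy unit-length "embeddings" (assumed values, for illustration only).
unit_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

cosine_scores = cosine_similarity_matrix(unit_vectors)
dot_scores = [[dot(a, b) for b in unit_vectors] for a in unit_vectors]

for row in cosine_scores:
    print([round(score, 4) for score in row])
```

For unit-length inputs the two score matrices agree up to floating-point error, so ranking by either gives the same order.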

Evaluation

Metrics

Clean Information Retrieval

  • Dataset: mrpc-validation-clean-v2
  • Evaluated with models.evaluator.CleanInformationRetrievalEvaluator
Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 1.0
cosine_accuracy@5 1.0
cosine_accuracy@10 1.0
cosine_precision@1 0.0
cosine_precision@3 0.3357
cosine_precision@5 0.2014
cosine_precision@10 0.1007
cosine_recall@1 0.0
cosine_recall@3 1.0
cosine_recall@5 1.0
cosine_recall@10 1.0
cosine_ndcg@10 0.6309
cosine_mrr@10 0.4994
cosine_map@100 0.5
dot_accuracy@1 0.0
dot_accuracy@3 1.0
dot_accuracy@5 1.0
dot_accuracy@10 1.0
dot_precision@1 0.0
dot_precision@3 0.3357
dot_precision@5 0.2014
dot_precision@10 0.1007
dot_recall@1 0.0
dot_recall@3 1.0
dot_recall@5 1.0
dot_recall@10 1.0
dot_ndcg@10 0.6309
dot_mrr@10 0.4994
dot_map@100 0.5
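
One way to read the validation numbers above: they are close to what standard ranking metrics give when each query has exactly one relevant document and it is retrieved at rank 2. The sketch below is a worked check under that assumption, using textbook metric definitions. It does not reproduce CleanInformationRetrievalEvaluator, whose internals are not described in this card, and `metrics_for_single_relevant` is a hypothetical helper.

```python
import math

def metrics_for_single_relevant(rank, ks=(1, 3, 5, 10)):
    """Textbook IR metrics for one query whose only relevant document sits at `rank` (1-based)."""
    results = {}
    for k in ks:
        hit = 1.0 if rank <= k else 0.0
        results[f"accuracy@{k}"] = hit
        results[f"precision@{k}"] = hit / k
        results[f"recall@{k}"] = hit
    results["mrr@10"] = 1.0 / rank if rank <= 10 else 0.0
    # With one relevant document the ideal DCG is 1, so nDCG equals the DCG.
    results["ndcg@10"] = 1.0 / math.log2(rank + 1) if rank <= 10 else 0.0
    # Average precision with one relevant document is the precision at its rank.
    results["map@100"] = 1.0 / rank if rank <= 100 else 0.0
    return results

rank_two = metrics_for_single_relevant(rank=2)
for name, value in rank_two.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption accuracy@1 is 0, MRR@10 and MAP@100 are 0.5, and nDCG@10 is 1/log2(3), about 0.6309. The small gaps from the reported values (for example 0.3357 versus 0.3333 for precision@3) suggest the assumption does not hold exactly for every query.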

Clean Information Retrieval

  • Dataset: mrpc-test-clean-v2
  • Evaluated with models.evaluator.CleanInformationRetrievalEvaluator
Metric Value
cosine_accuracy@1 0.0
cosine_accuracy@3 0.9799
cosine_accuracy@5 0.9895
cosine_accuracy@10 0.9965
cosine_precision@1 0.0
cosine_precision@3 0.3275
cosine_precision@5 0.1984
cosine_precision@10 0.0999
cosine_recall@1 0.0
cosine_recall@3 0.9799
cosine_recall@5 0.9891
cosine_recall@10 0.9961
cosine_ndcg@10 0.623
cosine_mrr@10 0.4912
cosine_map@100 0.4915
dot_accuracy@1 0.0
dot_accuracy@3 0.9799
dot_accuracy@5 0.9895
dot_accuracy@10 0.9965
dot_precision@1 0.0
dot_precision@3 0.3275
dot_precision@5 0.1984
dot_precision@10 0.0999
dot_recall@1 0.0
dot_recall@3 0.9799
dot_recall@5 0.9891
dot_recall@10 0.9961
dot_ndcg@10 0.623
dot_mrr@10 0.4912
dot_map@100 0.4915

Training Details

Training Dataset

nyu-mll/glue

  • Dataset: nyu-mll/glue at bcdcba7
  • Size: 3,668 training samples
  • Columns: text1, text2, and label
  • Approximate statistics based on the first 1000 samples:
    • text1 (string): min 9 tokens, mean 27.16 tokens, max 47 tokens
    • text2 (string): min 11 tokens, mean 26.88 tokens, max 49 tokens
    • label (int): 0: ~33.70%, 1: ~66.30%
  • Samples:
    • text1: Amrozi accused his brother , whom he called " the witness " , of deliberately distorting his evidence .
      text2: Referring to him as only " the witness " , Amrozi accused his brother of deliberately distorting his evidence .
      label: 1
    • text1: Yucaipa owned Dominick 's before selling the chain to Safeway in 1998 for $ 2.5 billion .
      text2: Yucaipa bought Dominick 's in 1995 for $ 693 million and sold it to Safeway for $ 1.8 billion in 1998 .
      label: 0
    • text1: They had published an advertisement on the Internet on June 10 , offering the cargo for sale , he added .
      text2: On June 10 , the ship 's owners had published an advertisement on the Internet , offering the explosives for sale .
      label: 1
  • Loss: OnlineContrastiveLoss
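
The idea behind an online contrastive loss, which keeps only the hard pairs in each batch before applying a contrastive penalty, can be sketched in plain Python. This is a simplified illustration, not the library code: the `margin` of 0.5 and the use of cosine distance are assumptions (the card does not state them), the handling of batches with only one class is simplified, and `online_contrastive_loss` is a hypothetical name.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Contrastive loss computed over hard pairs only.

    distances: one distance per sentence pair (assumed: 1 - cosine similarity)
    labels:    1 for paraphrase pairs, 0 for non-paraphrase pairs
    margin:    assumed value; negatives farther apart than this cost nothing
    """
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]

    # Simplification: if one class is missing, every pair of the other class is kept.
    closest_negative = min(negatives) if negatives else float("-inf")
    farthest_positive = max(positives) if positives else float("inf")

    # Hard positives: paraphrases farther apart than the closest non-paraphrase.
    hard_positives = [d for d in positives if d > closest_negative]
    # Hard negatives: non-paraphrases closer together than the farthest paraphrase.
    hard_negatives = [d for d in negatives if d < farthest_positive]

    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(0.0, margin - d) ** 2 for d in hard_negatives)
    return positive_loss + negative_loss

# Toy batch of four pairs: two paraphrases, two non-paraphrases.
batch_distances = [0.1, 0.6, 0.3, 0.9]
batch_labels = [1, 1, 0, 0]
loss = online_contrastive_loss(batch_distances, batch_labels)
print(round(loss, 4))
```

In the toy batch only the positive at distance 0.6 and the negative at distance 0.3 are hard, so the easy pairs (0.1 and 0.9) contribute nothing to the loss.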

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
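
The hyperparameters above imply a step count and learning-rate schedule that can be worked out by hand. The sketch below assumes a single device, no gradient accumulation, a linear scheduler (lr_scheduler_type: linear in the full list), and a warmup step count equal to the ceiling of warmup_ratio times the total steps. `linear_warmup_lr` is a hypothetical helper rather than a library function. The 3,668 training samples come from the Training Dataset section.

```python
import math

train_samples = 3668
batch_size = 32        # per_device_train_batch_size
epochs = 5             # num_train_epochs
peak_lr = 2e-05        # learning_rate
warmup_ratio = 0.1

# The last partial batch is kept (dataloader_drop_last: False), hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_warmup_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 29, warmup_steps, 300, total_steps):
    print(step, linear_warmup_lr(step))
```

The computed total of 575 steps matches the final step in the training logs, and 10 / 115 ≈ 0.0870 matches the epoch value logged at step 10.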

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss mrpc-test-clean-v2_cosine_map@100 mrpc-validation-clean-v2_cosine_map@100
0 0 - - 0.4961
0.0870 10 1.6166 - -
0.1739 20 1.668 - -
0.2609 30 1.5081 - -
0.3478 40 1.3996 - -
0.4348 50 1.2969 - 0.4985
0.5217 60 1.1771 - -
0.6087 70 0.9977 - -
0.6957 80 1.1213 - -
0.7826 90 1.139 - -
0.8696 100 1.0821 - 0.5
0.9565 110 1.1488 - -
1.0435 120 0.932 - -
1.1304 130 0.794 - -
1.2174 140 0.9996 - -
1.3043 150 0.9328 - 0.5
1.3913 160 1.1032 - -
1.4783 170 0.9692 - -
1.5652 180 0.9501 - -
1.6522 190 0.7863 - -
1.7391 200 0.8454 - 0.5
1.8261 210 0.9311 - -
1.9130 220 0.8134 - -
2.0 230 1.0013 - -
2.0870 240 0.7564 - -
2.1739 250 0.9165 - 0.5
2.2609 260 0.7668 - -
2.3478 270 0.6587 - -
2.4348 280 0.5904 - -
2.5217 290 0.7431 - -
2.6087 300 0.6133 - 0.5
2.6957 310 0.5994 - -
2.7826 320 0.6256 - -
2.8696 330 0.7294 - -
2.9565 340 0.7527 - -
3.0435 350 0.6908 - 0.5
3.1304 360 0.6455 - -
3.2174 370 0.3765 - -
3.3043 380 0.5955 - -
3.3913 390 0.6239 - -
3.4783 400 0.6666 - 0.5
3.5652 410 0.6498 - -
3.6522 420 0.6363 - -
3.7391 430 0.7046 - -
3.8261 440 0.4384 - -
3.9130 450 0.6721 - 0.5
4.0 460 0.5341 - -
4.0870 470 0.4459 - -
4.1739 480 0.4153 - -
4.2609 490 0.5116 - -
4.3478 500 0.4221 - 0.5
4.4348 510 0.4696 - -
4.5217 520 0.4552 - -
4.6087 530 0.5403 - -
4.6957 540 0.367 - -
4.7826 550 0.3275 - 0.5
4.8696 560 0.4016 - -
4.9565 570 0.4889 - -
5.0 575 - 0.4915 -
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.0
  • PyTorch: 2.2.2+cu121
  • Accelerate: 1.12.0
  • Datasets: 3.3.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

Model size: 22.7M params (Safetensors, tensor type F32)