How to use KingTechnician/osmosis-joint-setfit with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("KingTechnician/osmosis-joint-setfit")
sentences = [
"Objective:\nHas anyone got a job through a staffing service? If so did you like it?\nResponse: Has anyone got a job through a staffing service? I have hun and they take all your money. Careers USA especially. I worked 40 hours last week and really only got paid for 30",
"Objective:\nwhat is the name of spray used for immidiate releif of pain from sports injury? Is above mentioned medicine freely available in the market\nResponse: what is the name of spray used for immidiate releif of pain from sports injury? It's probably ethyl chloride. See link below for more info.",
"Objective:\nWhat is your favorite Evil Dead movie quote? Of the Sam Raimi 'Holy Trinity' in which Bruce Campbell stared (so Spiderman is not included, of course) which of these three do you have a fav. quote from? Also, can you describe why it's your favorite?\\n\\nEvil Dead\\nEvil Dead II\\nor Army of Darkness\\n\\nI can't decide what quote I love the most.\nResponse: What is your favorite Evil Dead movie quote? \"Shut the door... where you raised on a barn? <quiet> you probably were, with all of the other primates</quiet>\" - Army of Darkness -> It's really funny",
"Objective:\nPolice brutality statistics for the last 5 years? I need any info io can get on police brutality preferably statistics or some good sites that would help me find this information\nResponse: Police brutality statistics for the last 5 years? I'm glad you asked! I am a librarian and I recently referred a patron to this report that will answer all of your questions:\\n\\nBureau of Justice Statistics: Police Use of Force\\nhttp://www.ojp.usdoj.gov/bjs/pub/pdf/ndcopuof.pdf\\n\\nIf you need more information, try looking on the Bureau of Justice Web site:\\n\\nhttp://www.ojp.usdoj.gov/bjs/\\n\\nAfter you enter the site, try searching by keyword \"brutality\"."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from sentence-transformers/paraphrase-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
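To make the similarity scores concrete, here is a minimal pure-Python sketch of a pairwise cosine similarity matrix, which is what `model.similarity` is understood to compute by default in recent Sentence Transformers releases. The toy 3-dimensional vectors and the helper names `cosine_similarity` and `similarity_matrix` are illustrative assumptions, not part of this model or the library.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Pairwise cosine similarities: an n x n nested list for n embeddings."""
    return [[cosine_similarity(u, v) for v in embeddings] for u in embeddings]


# Toy 3-dimensional vectors standing in for real 384-dimensional embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
matrix = similarity_matrix(toy_embeddings)
```

Each diagonal entry compares an embedding with itself, and the matrix is symmetric, which is why four input sentences produce a 4 x 4 result.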
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
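The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of its token embeddings. The following is a rough sketch of that idea in plain Python, assuming padding tokens are excluded through an attention mask; `mean_pool` and the 2-dimensional toy vectors are hypothetical stand-ins for the library's tensor implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [s / count for s in summed]


# Two real tokens followed by one padding token (mask entry 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)  # averages only the first two rows
```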
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("KingTechnician/osmosis-joint-setfit")
# Run inference
sentences = [
'Objective:\nWhere can i get help for Basic of creating Macros in Visual Basics? Where can i get help to create macros for my daily activities which are to be done everyday and time consuming.\\nSo wanted to know the basic of creating macros in Visual Basic for my activities in Excel\nResponse: Where can i get help for Basic of creating Macros in Visual Basics? I am an Excel developer for a major financial company in NYC and I use MSDN daily to research the Excel Object Model.',
"Objective:\nI started working out about 1 month ago, specifically working on my abs.How can I get rid of the belly fat? I'm working out 5-6 days a week, and have started eating healthier. I'm feeling some results but I'm not really seeing them yet. When can I expect to see some changes in my stomache?\nResponse: I started working out about 1 month ago, specifically working on my abs.How can I get rid of the belly fat? You have to do cardio to burn belly fat. Just lifting weights and doing sit ups wont cut it. Cardio, cardio, cardio! Everytime you do cardio be sure to throw some stomach crunches in there too. you should see results in about a month.",
'Objective:\nAssess student perceptions of question-asking encouragement and answer quality.\nResponse: that reminds me of group projects... we had to work together a lot and it was kinda hard to get everyone on the same page, i mean we had to use this one tool to communicate and it was really annoying to use, the interface was all weird and stuff',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
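The example inputs above all follow an `Objective:` / `Response:` layout. A small helper along these lines could build such strings; `format_pair` is a hypothetical name, and the exact template is inferred from the samples rather than documented by the model author.

```python
def format_pair(objective: str, response: str) -> str:
    """Join an objective and a response in the layout used by the sample inputs."""
    return f"Objective:\n{objective}\nResponse: {response}"


text = format_pair(
    "Assess student perceptions of question-asking encouragement and answer quality.",
    "that reminds me of group projects...",
)
```

Some of the samples repeat the question title at the start of the response; whether to do so is left to the caller here.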
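To load later:
import joblib
from sentence_transformers import SentenceTransformer
encoder = SentenceTransformer('KingTechnician/osmosis-joint-setfit')
head = joblib.load('head.joblib')  # download from repo first
model = SetFitClassifier(encoder, head)  # wrapper class; its definition is not shown in this card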
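The `SetFitClassifier` class used in the loading snippet is not defined here. One possible shape for such a wrapper, assuming the encoder exposes `encode` and the saved head exposes a scikit-learn style `predict`, is sketched below. The class body and the toy stand-ins are hypothetical and may differ from the author's actual implementation.

```python
class SetFitClassifier:
    """Hypothetical wrapper: embed texts with `encoder`, then classify with `head`."""

    def __init__(self, encoder, head):
        self.encoder = encoder  # any object with .encode(list_of_str)
        self.head = head        # any object with .predict(embeddings)

    def predict(self, texts):
        embeddings = self.encoder.encode(list(texts))
        return self.head.predict(embeddings)


class _ToyEncoder:
    """Stand-in encoder: one feature per text, equal to its character count."""

    def encode(self, texts):
        return [[float(len(t))] for t in texts]


class _ToyHead:
    """Stand-in head: class 1 when the single feature exceeds 5, else class 0."""

    def predict(self, embeddings):
        return [1 if e[0] > 5 else 0 for e in embeddings]


toy_model = SetFitClassifier(_ToyEncoder(), _ToyHead())
predictions = toy_model.predict(["short", "a longer text"])
```

With the real objects, the encoder would be the `SentenceTransformer` instance and the head whatever estimator was stored in `head.joblib`.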
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | | | |

| sentence_0 | sentence_1 | label |
|---|---|---|
| Objective: | Objective: | 0.0 |
| Objective: | Objective: | 0.0 |
| Objective: | Objective: | 1.0 |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
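With `MSELoss` as `loss_fct`, the objective can be read as the mean squared error between the cosine similarity of each embedding pair and its float label. The sketch below works through that formula on toy 2-dimensional vectors; `cosine_mse_loss` is an illustrative name, and the real loss operates on batched tensors produced by the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_mse_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the target label over a batch."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


batch = [
    ([1.0, 0.0], [1.0, 0.0]),  # same direction, cosine = 1
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine = 0
]
loss_matching = cosine_mse_loss(batch, [1.0, 0.0])  # labels agree with the cosines
loss_swapped = cosine_mse_loss(batch, [0.0, 1.0])   # labels disagree with the cosines
```

Training pushes pairs labeled 1.0 toward the same direction in embedding space and pairs labeled 0.0 toward orthogonality.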
Non-default hyperparameters:
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

| Epoch | Step | Training Loss |
|---|---|---|
| 0.2 | 500 | 0.3074 |
| 0.4 | 1000 | 0.2529 |
| 0.6 | 1500 | 0.2518 |
| 0.8 | 2000 | 0.2524 |
| 1.0 | 2500 | 0.2514 |
| 1.2 | 3000 | 0.2472 |
| 1.4 | 3500 | 0.2451 |
| 1.6 | 4000 | 0.2439 |
| 1.8 | 4500 | 0.24 |
| 2.0 | 5000 | 0.2369 |
| 2.2 | 5500 | 0.2249 |
| 2.4 | 6000 | 0.2235 |
| 2.6 | 6500 | 0.2178 |
| 2.8 | 7000 | 0.217 |
| 3.0 | 7500 | 0.2111 |
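The training log and hyperparameters together allow some back-of-the-envelope arithmetic. The sketch below derives the steps per epoch from the first log row, estimates the number of training pairs per epoch, and evaluates a linear learning-rate decay with no warmup. It assumes a single training process, and `linear_lr` is a hypothetical helper that mirrors the usual linear schedule rather than the trainer's own code.

```python
def linear_lr(step, base_lr=5e-05, total_steps=7500, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Step 500 corresponds to epoch 0.2 in the log, so one epoch is 2500 steps.
steps_per_epoch = round(500 / 0.2)

# per_device_train_batch_size is 16 and gradient_accumulation_steps is 1;
# the result is approximate because the final batch of an epoch may be partial.
pairs_per_epoch = steps_per_epoch * 16

lr_at_start = linear_lr(0)        # full base learning rate
lr_at_midpoint = linear_lr(3750)  # half of the base learning rate
lr_at_end = linear_lr(7500)       # decayed to zero
```

Under these assumptions each epoch covers roughly 40,000 sentence pairs, and the learning rate has halved by the middle of the second epoch.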
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}