How to use huzaifanasirrr/pubmedbert-medical-embeddings with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("huzaifanasirrr/pubmedbert-medical-embeddings")
sentences = [
"AbstractMicrosatellite instability (MSI) is an important biomarker in cancer. While routine methods can detect MSI in certain tumor types, in other tumor types the results may be incorrect due to differences in the MSI loci pattern. Here, we report the case of a patient with pancreatic adenocarcinoma, with confirmed MSI by two independent next-generation sequencing tests, but not by routine methods, who had progression on pembrolizumab. Comparison of the patient’s MSI loci patterns with MSI+ colorectal adenocarcinoma samples showed a lower fraction of unstable loci, low resolution of a second peak in the repeat length spectrum of unstable short tandem repeats in the patient's sample, and a lower length of indels (3.7 vs 4.5 base pairs, p < 0.01).",
"Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality in India. This review explores the epidemiological trends and the landscape of systemic therapy for HCC in the Indian context, acknowledging the recent shift in etiology from viral hepatitis to lifestyle-associated factors.A comprehensive review of the literature was conducted, including data from the Global Cancer Observatory and the Indian Council of Medical Research, along with a critical analysis of various clinical trials. The article investigates systemic therapies in-depth, discussing their mechanisms, efficacy, and adaptation to Indian healthcare framework.Progression-free survival with a hazard ratio of ≤0.6 compared to sorafenib, overall survival of ∼16–19 months, and objective response rate of 20–30% are the defining thresholds for systemic therapy clinical trials. Systemic therapy for advanced HCC in India primarily involves the use of tyrosine kinase inhibitors such as sorafenib, lenvatinib, regorafenib, and cabozantinib, with sorafenib being the most commonly used drug for a long time. Monoclonal antibodies such as ramucirumab and bevacizumab and immune-checkpoint inhibitors, such as atezolizumab, nivolumab, and pembrolizumab, are expanding treatment horizons. Lenvatinib has emerged as a cost-effective alternative, and the combination of atezolizumab and bevacizumab has demonstrated superior outcomes in terms of overall survival and progression-free survival. Despite these advances, late-stage diagnosis and limited healthcare accessibility pose significant challenges, often relegating patients to palliative care.Addressing HCC in India demands an integrative approach that not only encompasses advancements in systemic therapy but also targets early detection and comprehensive care models. Future strategies should focus on enhancing awareness, screening for high-risk populations, and overcoming infrastructural disparities. Ensuring the judicious use of systemic therapies within the constraints of the Indian healthcare economy is crucial. Ultimately, a nuanced understanding of systemic therapeutic options and their optimal utilization will be pivotal in elevating the standard of HCC care in India.",
"We present an unusual case of uveitis secondary to avelumab and pembrolizumab in a 39-year-old Taiwanese male with stage IV clear cell renal cell carcinoma (ccRCC) and lung metastasis, who initially received pembrolizumab as his primary treatment. However, the patient experienced skin and liver immune-related adverse events (irAEs) after the seventh dose of pembrolizumab, which prompted a switch to avelumab. The patient began to experience gradual blurring of vision after completing the fifth cycle of avelumab immunotherapy. Ophthalmic examinations revealed findings consistent with bilateral anterior uveitis. Despite an initial lack of significant improvement with steroid treatment, the patient’s vision and inflammation improved upon discontinuation of avelumab. Due to the occurrence of uveitis, avelumab was switched back to pembrolizumab. However, three months after initiating pembrolizumab, the patient developed foggy vision and bilateral anterior uveitis with cystoid macular edema (CME). The administration of topical, oral, and subconjunctival steroids resulted in an improvement in vision and the resolution of CME, without the need to discontinue pembrolizumab. Over the subsequent eighteen months, there has been no recurrence of uveitis, and there is no evidence of relapse or further metastasis in his ccRCC.",
"Immune checkpoint inhibitors (ICIs) are monoclonal antibodies that block inhibitors of T cell activation and function. With the widespread use of ICIs in cancer therapy, immune-related adverse events (irAEs) have gradually emerged as urgent clinical issues. Tumors not only exhibit high heterogeneity, and their response to ICIs varies, with “hot” tumors showing better anti-tumor effects but also a higher susceptibility to irAEs. The manifestation of irAEs displays a tumor-heterogeneous pattern, correlating with the tumor type in terms of the affected organs, incidence, median onset time, and severity. Understanding the mechanisms underlying the pathogenic patterns of irAEs can provide novel insights into the prevention and management of irAEs, guide the development of biomarkers, and contribute to a deeper understanding of the toxicological characteristics of ICIs. In this review, we explore the impact of tumor type on the therapeutic efficacy of ICIs and further elucidate how these tumor types influence the occurrence of irAEs. Finally, we assess key candidate biomarkers and their relevance to proposed irAE mechanisms. This paper also outlines management strategies for patients with various types of tumors, based on their disease patterns."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
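The Pooling module above has only `pooling_mode_mean_tokens` enabled, so each sentence vector is the average of its token vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 4-dimensional vectors are illustrative assumptions (the real model uses 768 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [total / count for total in summed]


# Toy input (hypothetical): three token vectors, the last one is padding.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
mask = [1, 1, 0]

sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

Masking matters because padded positions would otherwise pull the average toward whatever values the padding tokens carry.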
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("huzaifanasirrr/pubmedbert-medical-embeddings")
# Run inference
sentences = [
'AbstractBackgroundCoronavirus disease 2019 (COVID-19) is a strong risk factor for venous thromboembolism (VTE). Few studies have evaluated the effectiveness of COVID-19 vaccination in preventing hospitalization for COVID-19 with VTE.MethodsAdults hospitalized at 21 sites between March 2021 and October 2022 with symptoms of acute respiratory illness were assessed for COVID-19, completion of the original monovalent messenger RNA (mRNA) COVID-19 vaccination series, and VTE. Prevalence of VTE was compared between unvaccinated and vaccinated patients with COVID-19. The vaccine effectiveness (VE) in preventing COVID-19 hospitalization with VTE was calculated using a test-negative design. The VE was also stratified by predominant circulating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant.ResultsAmong 18 811 patients (median age [interquartile range], 63 [50–73] years; 49% women; 59% non-Hispanic white, 20% non-Hispanic black, and 14% Hispanic; and median of 2 comorbid conditions [interquartile range, 1–3]), 9792 were admitted with COVID-19 (44% vaccinated), and 9019 were test-negative controls (73% vaccinated). Among patients with COVID-19, 601 had VTE diagnosed by hospital day 28, of whom 170 were vaccinated. VTE was more common among unvaccinated than vaccinated patients with COVID-19 (7.8% vs 4.0%; P = .001). The VE against COVID-19 hospitalization with VTE was 84% overall (95% confidence interval, 80%–87%), and VE stratified by predominant circulating variant was 88% (73%–95%) for Alpha, 93% (90%–95%) for Delta, and 68% (58%–76%) for Omicron variants.ConclusionsVaccination with the original monovalent mRNA series was associated with a decrease in COVID-19 hospitalization with VTE, though data detailing prior history of VTE and use of anticoagulation were not available. These findings will inform risk-benefit considerations for those considering vaccination.',
'PurposeProgrammed death-ligand 1 (PD-L1) expression may influence the prognosis of patients with localized esophageal cancer. The current study compared the prognostic value of PD-L1 expression between tumor cells and immune cells.MethodsArchival esophageal tumor tissue samples were collected from patients who received paclitaxel and cisplatin-based neoadjuvant chemoradiotherapy (CRT) for locally advanced esophageal squamous cell carcinoma (ESCC) in three prospective phase II trials. PD-L1 expression on tumor and immune cells was examined immunohistochemically by using the SP142 antibody and scored by two independent pathologists. The association of PD-L1 expression with patient’s outcomes was analyzed using a log-rank test and Cox regression multivariate analysis.ResultsA total of 100 patients were included. PD-L1 expression on tumor cells was positive (≥ 1%, TC-positive) in 55 patients; PD-L1 expression on immune cells was high (≥ 5%, IC-high) in 30 patients. TC-positive status was associated with poor overall survival (OS) (HR: 1.63, P = 0.035), whereas IC-high status was associated with improved OS (HR: 0.44, P = 0.0024). Multivariate analysis revealed that TC-positive, IC-high, and performance status were independent prognostic factors for progression-free survival and that IC-high and performance status were independent factors for OS. Furthermore, the combination of IC-high and TC-negative status was associated with the optimal OS, whereas that of TC-positive and IC-low status was associated with the worst OS.ConclusionPD-L1 expression on tumor and immune cells may have different prognostic value for patients with locally advanced ESCC receiving neoadjuvant CRT. A combination of these two indexes may further improve the prognostic prediction.Supplementary InformationThe online version contains supplementary material available at 10.1007/s00432-021-03772-7.',
'PurposeLymphocyte-monocyte ratio (LMR) has previously been used as a prognostic predictor in various solid tumors. This research aims in comparing the prognostic predictive Please check and conability of several inflammatory parameters and clinical parameters to validate further the excellent prognostic value of LMR in patients with gastric cancer treated with apatinib.MethodsMonitor inflammatory, nutritional parameters and tumor markers. Cutoff values of the parameters concerned were identified with the X-tile program. Subgroup analysis was made via Kaplan–Meier curves, and univariate and multivariate Cox regression analyses were used to find independent prognostic factors. The nomogram of logistic regression models was constructed according to the results.ResultsA total of 192 patients (115 divided into training group and 77 into validation group) who received the second- or later-line regimen of apatinib were retrospectively analyzed. The optimal cutoff value for LMR was 1.33. Patients with high LMR (LMR-H) were significantly longer than those with low LMR (LMR-L) in progression-free survival (median 121.0 days vs. median 44.5 days, P < 0.001). The predictive value of LMR was generally uniform across subgroups. Meanwhile, LMR and CA19-9 were the only hematological parameters with significant prognostic value in multivariate analysis. The area under the LMR curve (0.60) was greatest for all inflammatory indices. Adding LMR to the base model significantly enhanced the predictive power of the 6-month probability of disease progression (PD). The LMR-based nomogram showed good predictive power and discrimination in external validation.ConclusionLMR is a simple but effective predictor of prognosis for patients treated with apatinib.Supplementary InformationThe online version contains supplementary material available at 10.1007/s00432-023-04976-9.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.1375, 0.1558],
# [0.1375, 1.0000, 0.8143],
# [0.1558, 0.8143, 1.0000]])
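To make the similarity matrix easier to read, here is a small sketch of pairwise cosine similarity, which is assumed here to be the measure behind `model.similarity` (it is the library default unless configured otherwise). The helper names `cosine`, `similarity_matrix`, and `most_similar_pair` and the toy 2-dimensional vectors are illustrative; the `reported` matrix is copied from the output above.

```python
import math
from typing import List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities between all vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def most_similar_pair(matrix: List[List[float]]) -> Tuple[int, int, float]:
    """Return (i, j, score) for the highest off-diagonal entry."""
    best = (0, 1, matrix[0][1])
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if matrix[i][j] > best[2]:
                best = (i, j, matrix[i][j])
    return best


# Toy embeddings (hypothetical, 2-dimensional instead of 768).
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
matrix = similarity_matrix(toy)

# Similarity matrix reported in the example output above.
reported = [
    [1.0000, 0.1375, 0.1558],
    [0.1375, 1.0000, 0.8143],
    [0.1558, 0.8143, 1.0000],
]
print(most_similar_pair(reported))
```

On the reported matrix, the highest off-diagonal score belongs to the second and third abstracts, both of which are oncology prognosis studies, while the COVID-19 vaccination abstract scores low against each.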
Dataset: medical-embedding-eval
Evaluator: EmbeddingSimilarityEvaluator

| Metric | Value |
|---|---|
| pearson_cosine | 0.695 |
| spearman_cosine | 0.7048 |
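The two metrics compare the model's cosine similarities with the gold labels: `pearson_cosine` measures linear correlation and `spearman_cosine` measures rank correlation. The sketch below shows how such correlations can be computed in plain Python. The scores and labels are made-up toy values, not the evaluation data, and the helper names are illustrative.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): predicted cosine scores and binary gold labels.
predicted = [0.9, 0.2, 0.7, 0.1]
gold = [1.0, 0.0, 1.0, 0.0]

pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(round(pearson_cosine, 4), round(spearman_cosine, 4))
```

With binary labels such as the 0.0 and 1.0 values in the training samples, many gold ranks are tied, which is why the rank helper averages tied positions.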
Columns: sentence_0, sentence_1, and label

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |
| details | | | |

| sentence_0 | sentence_1 | label |
|---|---|---|
| ABSTRACTKirsten rat sarcoma viral oncogene homolog (KRAS) mutation is prognostic of poor survival for patients with non-small cell lung cancer (NSCLC). KRAS G12C mutations occur in 13% of NSCLC cases and despite the frequency of this mutation, advances in drug development against KRAS have historically been impeded due to the extremely high affinity of KRAS for guanosine triphosphate (GTP) and the lack of a binding pocket on the surface of KRAS that is suitable for drug binding. Sotorasib, a first-in-class, highly selective KRAS G12C inhibitor overcomes this issue by irreversibly binding in the switch-II pocket. Sotorasib was granted accelerated FDA approval for the treatment of KRASG12C-mutated locally advanced/metastatic NSCLC who have received at least one prior systemic therapy. This review summarizes the pharmacology, clinical efficacy, adverse effects, and clinical considerations of sotorasib. | Lung cancer is among the most common instances of cancer subtypes and is associated with high mortality rates. Due to the availability of fewer therapies and delayed clinical investigations, the number of cancer incidences is rising dramatically. This is possibly an effect of immune modulations and chemotherapeutic drugs that raises cancer resistance. Among the list, IL-6 and IL-17 are host-derived paradoxical effectors that attune immune responses in malignant lung cells. Their excessive release in the cytokine milieu stabilizes immunosuppressive phenotypes, resulting in cellular perturbations. During tumor development, the significance of these molecules is reflected in their potential to regulate oncogenesis by initiating a myriad of signaling events that influence tumor growth and the metastatic ability of benign cancer cells. Moreover, their transactivation contributes to antiapoptotic mechanisms and favors cancer cell survival via constitutive expression of immunoregulatory molec... | 1.0 |
| BackgroundTrophoblast cell-surface antigen 2 (TROP2) is expressed on the surface of trophoblast cells and many malignant tumor cells. However, data on TROP2 expression in advanced lung cancer are insufficient, and its changes have not been fully evaluated.MethodsWe assessed the prevalence and changes in TROP2 expression in patients with lung cancer who received anti-cancer treatments using immunohistochemical (IHC) analysis with an anti-TROP2 antibody (clone: SP295). IHC scores were graded from 0 to 3; grade ≥ 2 was considered positive for TROP2 expression. We defined a difference in IHC score, before and after anti-cancer treatments, as the change in TROP2 expression.ResultsBefore anti-cancer treatment, TROP2 expression was observed in 89% (143/160) of the patients and was significantly more common in adenocarcinoma and squamous cell carcinoma than in neuroendocrine carcinoma (P < 0.001). After anti-cancer treatment, TROP2 expression was observed in 87% (139/160) of the patients. The ... | PurposeTo investigate the prognostic value of the neutrophil to lymphocyte ratio (NLR) and platelet to lymphocyte ratio (PLR) in patients with hepatocellular carcinoma (HCC) treated with stereotactic body radiotherapy (SBRT).MethodsThe medical records of HCC patients treated with SBRT between 2008 and 2019 were reviewed retrospectively. The NLR and PLR were calculated from the serum complete blood count before and after SBRT, and the prognostic values of the NLR and PLR for the treatment outcomes were evaluated.ResultsThirty-nine patients with 49 HCC lesions were included. After a median follow-up of 26.8 months (range, 8.4-80.0 months), three-year local control, overall survival (OS), and progression-free survival (PFS) rate were 97.4%, 78.3%, and 35.2%, respectively. Both NLR and PLR increased significantly after SBRT and decreased slowly to the pre-SBRT value at 6 months. Univariable analysis showed that gross tumor volume (GTV) >14 cc, post-SBRT PLR >90, and PLR change >30 were ass... | 0.0 |
| PurposeDissociated response (DR, reduction at baseline or increase < 20% in target lesions compared with nadir in the presence of new lesions) was observed in 20–34% of patients treated with immune checkpoint inhibitors (ICIs). DRs were defined as progression disease (PD) per response evaluation criteria in solid tumors (RECIST v1.1), while evaluation criteria related to immunotherapy incorporated the new lesions into the total tumor burden or conducted further evaluation after 4–8 weeks rather than declaring PD immediately. The main objective of this study is to compare survival between people who continuing initial ICIs treatment and those who switched to other anticancer therapy at the time of DR.Patients and methods235 patients with advanced lung cancer (LC) treated with ICIs were evaluated. Propensity score matching (PSM) was used to minimize potential confounding factors. Post-DR OS, target lesion changes were evaluated.Results52 patients had been estimated as DRs. After PSM, the... | PurposeAging is closely related to the occurrence of many diseases, including cancer, and involves changes in the immune microenvironment. γδT cells are important components of resident lymphocytes in mucosal tissues. However, little is known about the effects that the aged lung has on γδT cells and their prognostic significance in non-small cell lung cancer.MethodsIn the current study, the expression of γδTCR and IL-17A was measured by immunohistochemistry in paraffin-embedded lung tissues from 168 patients with adenocarcinoma (LUAD) and 144 patients with squamous cell carcinoma (LUSC). Furthermore, gene transcription patterns in LUAD and LUSC tumors and normal controls were extracted from TCGA and GTEx databases and were analyzed.ResultsHigh frequency of γδT cells was observed in patients with LUAD and LUSC, whereas the levels of CD4 + T cells, CD8 + T cells and CD56 + cells were decreased. Elevated γδT cells in tumors were mainly IL-17A-releasing γδT17 cells, which were found to be ... | 1.0 |
CosineSimilarityLoss with these parameters:
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
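With `loss_fct` set to `MSELoss`, the training objective is the mean squared error between the cosine similarity of each sentence pair and its label. The following is a minimal pure-Python sketch of that objective under the assumption that no extra transformation is applied to the cosine score; the function names and the toy 2-dimensional embeddings are illustrative, not taken from the training run.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_mse(pairs: Sequence[Tuple[Vector, Vector]], labels: Sequence[float]) -> float:
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [(cosine(u, v) - label) ** 2 for (u, v), label in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Toy batch (hypothetical embeddings): one identical pair, one orthogonal pair.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # cosine = 1.0
    ([1.0, 0.0], [0.0, 1.0]),  # cosine = 0.0
]

loss_when_labels_match = cosine_similarity_mse(pairs, [1.0, 0.0])
loss_when_second_label_wrong = cosine_similarity_mse(pairs, [1.0, 1.0])
print(loss_when_labels_match, loss_when_second_label_wrong)
```

The loss is zero only when every pair's cosine similarity equals its label, so pairs labeled 1.0 are pulled together and pairs labeled 0.0 are pushed toward orthogonality.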
Non-default hyperparameters:

- eval_strategy: steps
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | medical-embedding-eval_spearman_cosine |
|---|---|---|
| 0.4 | 50 | 0.5811 |
| 0.8 | 100 | 0.6263 |
| 1.0 | 125 | 0.6923 |
| 1.2 | 150 | 0.6450 |
| 1.6 | 200 | 0.6659 |
| 2.0 | 250 | 0.7048 |
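The log can be cross-checked against the listed hyperparameters. The sketch below maps each logged step to its epoch and to the learning rate a linear decay schedule with no warmup would give, using values from this card (`learning_rate: 5e-05`, `warmup_steps: 0`, `per_device_train_batch_size: 8`, 250 total steps). It assumes a single training device and approximates the scheduler rather than calling the trainer; the sample-count bounds are an inference, not a figure stated in the card.

```python
# Values taken from the training log and hyperparameters in this card.
LOGGED_STEPS = [50, 100, 125, 150, 200, 250]
STEPS_PER_EPOCH = 125   # step 125 is logged at epoch 1.0
TOTAL_STEPS = 250       # 2 epochs
BASE_LR = 5e-05
BATCH_SIZE = 8          # per device; a single device is an assumption


def epoch_at(step: int) -> float:
    """Fraction of epochs completed at a given optimizer step."""
    return step / STEPS_PER_EPOCH


def linear_lr(step: int) -> float:
    """Linear decay from BASE_LR to 0 over TOTAL_STEPS, with no warmup."""
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / TOTAL_STEPS)


for step in LOGGED_STEPS:
    print(f"step {step:3d}  epoch {epoch_at(step):.1f}  lr {linear_lr(step):.2e}")

# With dataloader_drop_last False, the last batch of an epoch may be partial,
# so 125 steps per epoch implies a training set within these bounds.
min_samples = BATCH_SIZE * (STEPS_PER_EPOCH - 1) + 1
max_samples = BATCH_SIZE * STEPS_PER_EPOCH
print(min_samples, max_samples)
```

The Spearman score dips after step 125 before recovering to its best value at step 250, which is the final value reported in the metrics table.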
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}