Dataset: AdamLucek/legal-rag-positives-synthetic
How to use aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
sentences = [
"What section of the U.S. Code is cited in relation to Exemption 2?",
"forma inmediata de la utilización de cualquiera y todo material en el \nque se utilizara la imagen de la parte apelada. En adición, le \ncondenó solidariamente al pago de $20,000.00 por la utilización no \n \n \n \nKLAN202300916 \n \n6\nautorizada de la imagen del señor Friger Salgueiro y $4,000.00 por \nhonorarios de abogado. \nEn desacuerdo, el 20 de septiembre de 2023, la parte apelante",
"How does the invocation of the attorney-client privilege by the CIA affect summary judgment?",
"Decl. Ex. K pt. 2, at 1, 8–14, 16–18, 22, 27, No. 11-445, ECF No. 29-3. Exemption 2 applies to \nmatters that “related solely to the internal personnel rules and practices of an agency.” 5 U.S.C. \n§ 552(b)(2). The CIA states in its declaration that all thirteen documents withheld under"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the legal-rag-positives-synthetic dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
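The architecture printout above lists mean-token pooling followed by a Normalize module. As a rough illustration of what those two steps do, here is a minimal NumPy sketch. The function name `mean_pool_and_normalize`, the toy shapes, and the random inputs are assumptions made for this example; the real model works on 768-dimensional token vectors produced by the transformer.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    # Expand the mask to (batch, seq, 1) so it broadcasts over the hidden dimension.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)       # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # guard against empty rows
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 3 token positions, 4 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 3, 4))
mask = np.array([[1, 1, 1], [1, 1, 0]])  # last token of the second sequence is padding

sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
print(np.linalg.norm(sentence_vectors, axis=1))
```

Because the output vectors have unit length, a dot product between two of them equals their cosine similarity.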
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
# Run inference
sentences = [
'confidentiality agreement/order, that remain following those discussions. This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings. \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
'According to which court rule must the notice of exceptions be filed?',
'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business. SHS MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4922, 0.0280],
# [0.4922, 1.0000, 0.0389],
# [0.0280, 0.0389, 1.0000]])
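One way to read the similarity matrix printed above is as a retrieval ranking. The sketch below copies those printed values into a NumPy array (so it runs without downloading the model) and ranks the other two sentences against the question at index 1. The variable names are chosen for this example only.

```python
import numpy as np

# Similarity values copied from the printed tensor above.
similarities = np.array([
    [1.0000, 0.4922, 0.0280],
    [0.4922, 1.0000, 0.0389],
    [0.0280, 0.0389, 1.0000],
])

query_index = 1  # the question about the notice of exceptions

# Rank every other sentence by its similarity to the query, highest first.
candidates = [i for i in range(similarities.shape[0]) if i != query_index]
ranked = sorted(candidates, key=lambda i: similarities[query_index, i], reverse=True)

print(ranked)  # [0, 2]
print(similarities[query_index, ranked[0]])
```

The passage at index 0, which mentions Court of Chancery Rule 144(d)(2), ranks well above the unrelated passage at index 2.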
Datasets: ir and ir_eval, evaluated with InformationRetrievalEvaluator
| Metric | ir | ir_eval |
|---|---|---|
| cosine_accuracy@1 | 0.3617 | 0.6337 |
| cosine_accuracy@3 | 0.4096 | 0.6893 |
| cosine_accuracy@5 | 0.473 | 0.7759 |
| cosine_accuracy@10 | 0.5255 | 0.8377 |
| cosine_precision@1 | 0.3617 | 0.6337 |
| cosine_precision@3 | 0.3524 | 0.6028 |
| cosine_precision@5 | 0.277 | 0.4566 |
| cosine_precision@10 | 0.1631 | 0.2583 |
| cosine_recall@1 | 0.1269 | 0.2246 |
| cosine_recall@3 | 0.3452 | 0.5938 |
| cosine_recall@5 | 0.4405 | 0.7313 |
| cosine_recall@10 | 0.5161 | 0.8237 |
| cosine_ndcg@10 | 0.4459 | 0.736 |
| cosine_mrr@10 | 0.4019 | 0.6825 |
| cosine_map@100 | 0.442 | 0.7181 |
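For readers unfamiliar with the metric names in the table, the following sketch computes accuracy@k, precision@k, recall@k, and reciprocal rank@k for a single query using their usual textbook definitions. It is assumed, not verified here, that the evaluator uses the same definitions and averages them over all queries. The helper `ir_metrics_at_k` and the document ids are hypothetical.

```python
def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics over the top-k ranked document ids."""
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document within the top k, else 0.
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,   # at least one relevant hit
        "precision": num_hits / k,                  # fraction of top k that is relevant
        "recall": num_hits / len(relevant_ids),     # fraction of relevant docs retrieved
        "mrr": reciprocal_rank,
    }

# Hypothetical ranking for one query with three relevant passages.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
result = ir_metrics_at_k(ranked, relevant, k=3)
print(result)
```

Under these definitions, a recall@1 that is lower than accuracy@1, as in the table, would indicate that some queries have more than one relevant passage.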

Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | | |

Samples:

| anchor | positive |
|---|---|
| What kinds of issues are mentioned in connection with wrongdoing? | mismanagement, waste and wrongdoing – and that it has demonstrated more than a |
| Project, 504 F.2d at 248 n.15). | What page reference is given for the Lombardo v. Handler case in the aforementioned citation? |
| Where can more detailed information regarding redactions be found? | parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444. This is likely |

Loss: MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false,
"directions": [
"query_to_doc"
],
"partition_mode": "joint",
"hardness_mode": null,
"hardness_strength": 0.0
}
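To make the loss parameters above more concrete, here is a minimal NumPy sketch of an in-batch ranking loss with cosine similarity and `scale = 20.0`, scored in the query-to-document direction only. It is a simplified illustration, not the library implementation: the function name is made up for this example, and options such as `partition_mode` and `hardness_mode` are not modeled.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's target is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Every other positive in the batch acts as a negative for a given anchor.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch of four orthogonal "embeddings".
anchors = np.eye(4)
aligned_loss = multiple_negatives_ranking_loss(anchors, np.eye(4))
misaligned_loss = multiple_negatives_ranking_loss(anchors, np.roll(np.eye(4), 1, axis=0))
print(aligned_loss, misaligned_loss)
```

When each anchor is most similar to its own positive, the loss is close to zero; when the pairs are shuffled so that another row matches better, the loss grows to roughly the value of `scale`.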
Non-default hyperparameters:
per_device_train_batch_size: 32
num_train_epochs: 4
learning_rate: 2e-05
lr_scheduler_type: cosine
warmup_steps: 0.1
optim: adamw_torch_fused
gradient_accumulation_steps: 16
bf16: True
tf32: True
eval_strategy: epoch
per_device_eval_batch_size: 16
load_best_model_at_end: True

All hyperparameters:
per_device_train_batch_size: 32
num_train_epochs: 4
max_steps: -1
learning_rate: 2e-05
lr_scheduler_type: cosine
lr_scheduler_kwargs: None
warmup_steps: 0.1
optim: adamw_torch_fused
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 16
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: True
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: trackio
eval_strategy: epoch
per_device_eval_batch_size: 16
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 42
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

| Epoch | Step | Training Loss | ir_cosine_ndcg@10 | ir_eval_cosine_ndcg@10 |
|---|---|---|---|---|
| -1 | -1 | - | 0.4459 | 0.4459 |
| 0.4396 | 10 | 1.4221 | - | - |
| 0.8791 | 20 | 0.6964 | - | - |
| 1.0 | 23 | - | - | 0.6760 |
| 1.3077 | 30 | 0.4787 | - | - |
| 1.7473 | 40 | 0.4033 | - | - |
| 2.0 | 46 | - | - | 0.7196 |
| 2.1758 | 50 | 0.3770 | - | - |
| 2.6154 | 60 | 0.3159 | - | - |
| 3.0 | 69 | - | - | 0.7361 |
| 3.0440 | 70 | 0.3345 | - | - |
| 3.4835 | 80 | 0.2698 | - | - |
| 3.9231 | 90 | 0.3188 | - | - |
| 4.0 | 92 | - | - | 0.7360 |
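The step and epoch columns in the training log can be cross-checked against the batch settings with a little arithmetic. The sketch below assumes training ran on a single device (an assumption, not something the card states) and that each optimizer step consumes `gradient_accumulation_steps` micro-batches.

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 4

# Samples contributing to one optimizer update, assuming a single device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# The log shows optimizer step 10 at epoch 0.4396, i.e. 160 micro-batches
# cover about 43.96% of an epoch.
micro_batches_per_epoch = round(10 * gradient_accumulation_steps / 0.4396)

# A trailing partial accumulation window still yields an optimizer step.
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps)
total_optimizer_steps = optimizer_steps_per_epoch * num_train_epochs

print(effective_batch_size)       # 512
print(micro_batches_per_epoch)    # 364
print(optimizer_steps_per_epoch)  # 23
print(total_optimizer_steps)      # 92
```

The derived 23 steps per epoch and 92 total steps agree with the rows logged at epochs 1.0 and 4.0.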
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{oord2019representationlearningcontrastivepredictive,
title={Representation Learning with Contrastive Predictive Coding},
author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
year={2019},
eprint={1807.03748},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1807.03748},
}
Base model: answerdotai/ModernBERT-base