---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:11644
- loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
- source_sentence: What section of the U.S. Code is cited in relation to Exemption 2?
  sentences:
  - "forma inmediata de la utilización de cualquiera y todo material en el \nque se\
    \ utilizara la imagen de la parte apelada. En adición, le \ncondenó solidariamente\
    \ al pago de $20,000.00 por la utilización no \n \n \n \nKLAN202300916 \n \n6\n\
    autorizada de la imagen del señor Friger Salgueiro y $4,000.00 por \nhonorarios\
    \ de abogado. \nEn desacuerdo, el 20 de septiembre de 2023, la parte apelante"
  - How does the invocation of the attorney-client privilege by the CIA affect summary judgment?
  - "Decl. Ex. K pt. 2, at 1, 8–14, 16–18, 22, 27, No. 11-445, ECF No. 29-3. Exemption\
    \ 2 applies to \nmatters that “related solely to the internal personnel rules\
    \ and practices of an agency.” 5 U.S.C. \n§ 552(b)(2). The CIA states in its\
    \ declaration that all thirteen documents withheld under"
- source_sentence: "advisory committee.” Defs.’ Mem. at 23. So the Government’s\
    \ invocation of “judicial estoppel” \njust boils down to its argument that the\
    \ Commission is not an advisory committee. \n35 \nand constitutes a failure to\
    \ perform duties owed to EPIC within the meaning of 28 U.S.C. \n§ 1361.” Id.\
    \ ¶ 115. Count IV likewise asserts that the Commission’s “failure to make [its]"
  sentences:
  - What must this Court determine regarding GSA's interpretation of 41 U.S.C. § 3306(c)(3)?
  - "Mentor-Protégé JV to Submit an Individually Performed Relevant \nExperience Project,\
    \ or (2) Allowing Prime Contractors to Rely on \nProjects Performed by First-Tier\
    \ Subcontractors. \nPlaintiffs claim the Polaris Solicitations’ requirements\
    \ for evaluating Relevant Experience \nProjects creates a disparity between offerors\
    \ that hinders protégé firms and violates 13 C.F.R."
  - What is the Government's argument about the Commission?
- source_sentence: Who provides checklists and worksheets to FOIA and Privacy Act analysts?
  sentences:
  - "checklists, worksheets, and similar documents provided to [CIA] FOIA and Privacy\
    \ Act analysts \n(both agency employees and contractors).” See Third Lutz Decl.\
    \ Ex. G at 1, No. 11-445, ECF \nNo. 52-1. The plaintiff’s requests to the State\
    \ Department and the NSA were identical, except \nthat they sought training materials\
    \ provided to State Department and NSA FOIA and Privacy Act"
  - According to Black's Law Dictionary, what is a 'function' associated with?
  - What is the section number for the exception created by Congress?
- source_sentence: "submit one Relevant Experience Project for consideration, and\
    \ (2) allow prime contractors to rely \n \n25 In their MJAR briefs, Plaintiffs\
    \ originally argued that the Polaris Solicitations violated Section \n125.2(g)\
    \ by preventing mentor-protégé JVs from using subcontractor projects to fulfill\
    \ all Relevant \nExperience Project requirements. See SHS MJAR at 32–34; VCH\
    \ MJAR at 32–34; Pl. Reply at"
  sentences:
  - "authority to make operational decisions for the JV under SBA regulation. See\
    \ SHS MJAR at 22–\n22 \n \n23; VCH MJAR at 22–23. Thus, according to Plaintiff,\
    \ the decision to preclude mentor-protégé \nJVs that share a mentor from bidding\
    \ on the same Solicitation harms protégés, unduly restricts \ncompetition, and\
    \ violates federal procurement law. See SHS MJAR at 20–23; VCH MJAR at 20–\n\
    23."
  - "demandante expresó todas las funciones que \nrealizaba para Mech-Tech, incluyendo\
    \ ser la \n“Imagen del Colegio”. El demandante envió esta \ncomunicación como\
    \ parte de su solicitud para que \nle aumentaran su compensación. \n \n18. En\
    \ el verano del 2014 incrementó la compensación \ndel demandante para que continuara\
    \ realizando \nsus funciones, incluyendo ser anfitrión (“host”) en"
  - What did the Plaintiffs originally argue about the Polaris Solicitations regarding Section 125.2(g)?
- source_sentence: "confidentiality agreement/order, that remain following those discussions.\
    \ This is a \nfinal report and notice of exceptions shall be filed within three\
    \ days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2),\
    \ given the expedited and \nsummary nature of Section 220 proceedings. \n \n\
    \ \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin"
  sentences:
  - What was the plaintiff's motion requesting from the CIA?
  - According to which court rule must the notice of exceptions be filed?
  - "decides whether to submit proposals on future procurements, and excluding mentor-protégé\
    \ JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily\
    \ prevents protégés from \naccessing opportunities to grow as a business. SHS\
    \ MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights\
    \ Plaintiffs’ disagreement with the SBA’s"
datasets:
- AdamLucek/legal-rag-positives-synthetic
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: ModernBERT Embed Base Legal Fine-tuned
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: ir
      type: ir
    metrics:
    - type: cosine_accuracy@1
      value: 0.3616692426584235
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.4095826893353941
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.47295208655332305
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.5255023183925811
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.3616692426584235
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.35239567233384855
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.2769706336939722
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.16306027820710975
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.1268675940236991
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.3451828954147346
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.44049459041731065
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.5160999484801648
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.44587301287538633
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.40186943401781094
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.4419780310635075
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: ir eval
      type: ir_eval
    metrics:
    - type: cosine_accuracy@1
      value: 0.633693972179289
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.6893353941267388
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.7758887171561051
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8377125193199382
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.633693972179289
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.6027820710973725
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.4565687789799072
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.2582689335394127
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.22462648119526019
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.5937660999484802
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.7313240597630087
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8236733642452345
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7359943671334247
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.6824826427222096
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.718112335022202
      name: Cosine Map@100
---

# ModernBERT Embed Base Legal Fine-tuned

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
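
As a rough illustration of what semantic search over such vectors involves, the sketch below ranks candidate passages by cosine similarity to a query. The 3-dimensional vectors are hand-made stand-ins for the model's 768-dimensional embeddings, and the helper names (`cosine_similarity`, `rank`) are illustrative assumptions, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for embeddings (hypothetical values, 3 dimensions instead of 768).
query = [1.0, 0.0, 1.0]
passages = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.1, 0.9],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # partially aligned
]

print(rank(query, passages))  # [1, 2, 0]
```

With real embeddings the same ranking step applies; only the vectors change.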

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base)
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
# Run inference
sentences = [
    'confidentiality agreement/order, that remain following those discussions. This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings. \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
    'According to which court rule must the notice of exceptions be filed?',
    'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business. SHS MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4922, 0.0280],
#         [0.4922, 1.0000, 0.0389],
#         [0.0280, 0.0389, 1.0000]])
```

## Evaluation

### Metrics

#### Information Retrieval

* Datasets: `ir` and `ir_eval`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | ir         | ir_eval   |
|:--------------------|:-----------|:----------|
| cosine_accuracy@1   | 0.3617     | 0.6337    |
| cosine_accuracy@3   | 0.4096     | 0.6893    |
| cosine_accuracy@5   | 0.473      | 0.7759    |
| cosine_accuracy@10  | 0.5255     | 0.8377    |
| cosine_precision@1  | 0.3617     | 0.6337    |
| cosine_precision@3  | 0.3524     | 0.6028    |
| cosine_precision@5  | 0.277      | 0.4566    |
| cosine_precision@10 | 0.1631     | 0.2583    |
| cosine_recall@1     | 0.1269     | 0.2246    |
| cosine_recall@3     | 0.3452     | 0.5938    |
| cosine_recall@5     | 0.4405     | 0.7313    |
| cosine_recall@10    | 0.5161     | 0.8237    |
| **cosine_ndcg@10**  | **0.4459** | **0.736** |
| cosine_mrr@10       | 0.4019     | 0.6825    |
| cosine_map@100      | 0.442      | 0.7181    |

## Training Details

### Training Dataset

#### legal-rag-positives-synthetic

* Dataset: [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) at [f11534a](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic/tree/f11534aeed060a3245f55f8f9d944cf8132c780d)
* Size: 11,644 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | What kinds of issues are mentioned in connection with wrongdoing? | mismanagement, waste and wrongdoing – and that it has demonstrated more than a credible basis from which the Court can infer possible mismanagement. It claims DR’s management failed to follow corporate governance mechanics and made critical business decisions without consulting with the Board or stockholders; failed to act with due diligence related to undertaking an ICO and discontinuing |
  | Project, 504 F.2d at 248 n.15). More, the requirement of “substantial” authority suggests that the entity should be at the “center of gravity in the exercise of administrative power.” Id. at 882 (quoting Lombardo v. Handler, 397 F. Supp. 792, 796 (D.D.C. 1975), aff’d, 546 F.2d 1043 (D.C. Cir. 1976)). On this | What page reference is given for the Lombardo v. Handler case in the aforementioned citation? |
  | Where can more detailed information regarding redactions be found? | parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444. This is likely because the CIA has previously instituted a categorical policy of indicating the basis for redactions at a document level, rather than a redaction level, as discussed above. See supra Part III.C.2. In light of the Court’s holding that |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 4
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `gradient_accumulation_steps`: 16
- `bf16`: True
- `tf32`: True
- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `load_best_model_at_end`: True

#### All Hyperparameters

- `per_device_train_batch_size`: 32
- `num_train_epochs`: 4
- `max_steps`: -1
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: None
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `optim_args`: None
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `optim_target_modules`: None
- `gradient_accumulation_steps`: 16
- `average_tokens_across_devices`: True
- `max_grad_norm`: 1.0
- `label_smoothing_factor`: 0.0
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `use_cache`: False
- `neftune_noise_alpha`: None
- `torch_empty_cache_steps`: None
- `auto_find_batch_size`: False
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `include_num_input_tokens_seen`: no
- `log_level`: passive
- `log_level_replica`: warning
- `disable_tqdm`: False
- `project`: huggingface
- `trackio_space_id`: trackio
- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `prediction_loss_only`: True
- `eval_on_start`: False
- `eval_do_concat_batches`: True
- `eval_use_gather_object`: False
- `eval_accumulation_steps`: None
- `include_for_metrics`: []
- `batch_eval_metrics`: False
- `save_only_model`: False
- `save_on_each_node`: False
- `enable_jit_checkpoint`: False
- `push_to_hub`: False
- `hub_private_repo`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_always_push`: False
- `hub_revision`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `restore_callback_states_from_checkpoint`: False
- `full_determinism`: False
- `seed`: 42
- `data_seed`: None
- `use_cpu`: False
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `dataloader_prefetch_factor`: None
- `remove_unused_columns`: True
- `label_names`: None
- `train_sampling_strategy`: random
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `ddp_backend`: None
- `ddp_timeout`: 1800
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `deepspeed`: None
- `debug`: []
- `skip_memory_metrics`: True
- `do_predict`: False
- `resume_from_checkpoint`: None
- `warmup_ratio`: None
- `local_rank`: -1
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
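
The batch-size settings above determine how many optimizer steps make up an epoch. The arithmetic below is a small sketch of that relationship; it assumes a single training device (the card does not state the device count), and the variable names are illustrative. Under that assumption the results agree with the step counts in the training logs (step 23 at epoch 1.0, step 92 at epoch 4.0).

```python
import math

# Values copied from the dataset description and hyperparameter list.
num_samples = 11_644
per_device_batch = 32
grad_accum = 16
epochs = 4
num_devices = 1  # assumption: not stated in the card

# dataloader_drop_last is False, so the final partial batch is kept.
micro_batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))

# One optimizer step per `grad_accum` micro-batches; a trailing partial group still counts.
steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

# Number of samples contributing to each optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

print(effective_batch)          # 512
print(micro_batches_per_epoch)  # 364
print(steps_per_epoch)          # 23
print(total_steps)              # 92

# Fractional epoch reached after 10 optimizer steps.
print(round(10 * grad_accum / micro_batches_per_epoch, 4))  # 0.4396
```

Because MultipleNegativesRankingLoss draws its negatives from within each batch, the in-batch negatives come from the 32-sample micro-batch; gradient accumulation enlarges the gradient estimate, not the pool of negatives.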

### Training Logs

| Epoch   | Step   | Training Loss | ir_cosine_ndcg@10 | ir_eval_cosine_ndcg@10 |
|:-------:|:------:|:-------------:|:-----------------:|:----------------------:|
| -1      | -1     | -             | 0.4459            | 0.4459                 |
| 0.4396  | 10     | 1.4221        | -                 | -                      |
| 0.8791  | 20     | 0.6964        | -                 | -                      |
| 1.0     | 23     | -             | -                 | 0.6760                 |
| 1.3077  | 30     | 0.4787        | -                 | -                      |
| 1.7473  | 40     | 0.4033        | -                 | -                      |
| 2.0     | 46     | -             | -                 | 0.7196                 |
| 2.1758  | 50     | 0.3770        | -                 | -                      |
| 2.6154  | 60     | 0.3159        | -                 | -                      |
| **3.0** | **69** | **-**         | **-**             | **0.7361**             |
| 3.0440  | 70     | 0.3345        | -                 | -                      |
| 3.4835  | 80     | 0.2698        | -                 | -                      |
| 3.9231  | 90     | 0.3188        | -                 | -                      |
| 4.0     | 92     | -             | -                 | 0.7360                 |

* The bold row denotes the saved checkpoint.

### Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.3.0
- Transformers: 5.3.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.13.0
- Datasets: 4.8.2
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{oord2019representationlearningcontrastivepredictive,
    title={Representation Learning with Contrastive Predictive Coding},
    author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
    year={2019},
    eprint={1807.03748},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/1807.03748},
}
```
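
The loss cited above is an in-batch contrastive objective. The sketch below is a simplified pure-Python illustration of the idea, using the `scale` of 20.0, cosine similarity, and the `query_to_doc` direction listed under Training Details. It is not the library's implementation; the function names and toy vectors are assumptions made for illustration.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional "embeddings" (hypothetical values for illustration only).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each positive is close to the wrong anchor

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

The loss is near zero when each anchor is most similar to its own positive and grows when another passage in the batch scores higher, which is what pushes matching question and passage pairs together during fine-tuning.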