Add new SentenceTransformer model

Files changed (9)

1_Pooling/config.json +10 -0
README.md +573 -0
config.json +77 -0
config_sentence_transformers.json +14 -0
model.safetensors +3 -0
modules.json +20 -0
sentence_bert_config.json +4 -0
tokenizer.json +0 -0
tokenizer_config.json +16 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,10 @@

+{
+    "word_embedding_dimension": 768,
+    "pooling_mode_cls_token": false,
+    "pooling_mode_mean_tokens": true,
+    "pooling_mode_max_tokens": false,
+    "pooling_mode_mean_sqrt_len_tokens": false,
+    "pooling_mode_weightedmean_tokens": false,
+    "pooling_mode_lasttoken": false,
+    "include_prompt": true
+}
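
Review note: this config enables only `pooling_mode_mean_tokens`, so the sentence vector is the average of the token vectors. The sketch below illustrates that idea in plain Python, assuming padded positions are excluded via an attention mask. The function name `mean_pool` and the toy two-dimensional vectors are made up for illustration (the real model uses 768 dimensions), and this is not the library's implementation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sequence, skipping padded positions."""
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        # Keep only the vectors whose mask entry is 1 (real tokens).
        kept = [vec for vec, keep in zip(vectors, mask) if keep]
        count = max(len(kept), 1)  # guard against an all-padding row
        dim = len(vectors[0])
        pooled.append([sum(vec[d] for vec in kept) / count for d in range(dim)])
    return pooled


# Toy batch: 2 sequences, 3 token slots each, 2-dimensional vectors.
tokens = [
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],  # last slot is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
]
mask = [[1, 1, 0], [1, 1, 1]]
pooled = mean_pool(tokens, mask)
```

The padded slot in the first sequence does not influence its pooled vector, which is the reason for masking before averaging.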

README.md ADDED

	@@ -0,0 +1,573 @@

+---
+language:
+- en
+license: apache-2.0
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:11644
+- loss:MultipleNegativesRankingLoss
+base_model: nomic-ai/modernbert-embed-base
+widget:
+- source_sentence: What section of the U.S. Code is cited in relation to Exemption
+    2?
+  sentences:
+  - "forma inmediata de la utilización de cualquiera y todo material en el \nque se\
+    \ utilizara la imagen de la parte apelada.  En adición, le \ncondenó solidariamente\
+    \ al pago de $20,000.00 por la utilización no \n \n \n \nKLAN202300916 \n \n6\n\
+    autorizada de la imagen del señor Friger Salgueiro y $4,000.00 por \nhonorarios\
+    \ de abogado. \nEn desacuerdo, el 20 de septiembre de 2023, la parte apelante"
+  - How does the invocation of the attorney-client privilege by the CIA affect summary
+    judgment?
+  - "Decl. Ex. K pt. 2, at 1, 8–14, 16–18, 22, 27, No. 11-445, ECF No. 29-3.  Exemption\
+    \ 2 applies to \nmatters that “related solely to the internal personnel rules\
+    \ and practices of an agency.”  5 U.S.C. \n§ 552(b)(2).  The CIA states in its\
+    \ declaration that all thirteen documents withheld under"
+- source_sentence: "advisory committee.”  Defs.’ Mem. at 23.  So the Government’s\
+    \ invocation of “judicial estoppel” \njust boils down to its argument that the\
+    \ Commission is not an advisory committee. \n35 \nand constitutes a failure to\
+    \ perform duties owed to EPIC within the meaning of 28 U.S.C. \n§ 1361.”  Id.\
+    \ ¶ 115.  Count IV likewise asserts that the Commission’s “failure to make [its]"
+  sentences:
+  - What must this Court determine regarding GSA's interpretation of 41 U.S.C. § 3306(c)(3)?
+  - "Mentor-Protégé JV to Submit an Individually Performed Relevant \nExperience Project,\
+    \ or (2) Allowing Prime Contractors to Rely on \nProjects Performed by First-Tier\
+    \ Subcontractors.  \nPlaintiffs claim the Polaris Solicitations’ requirements\
+    \ for evaluating Relevant Experience \nProjects creates a disparity between offerors\
+    \ that hinders protégé firms and violates 13 C.F.R."
+  - What is the Government's argument about the Commission?
+- source_sentence: Who provides checklists and worksheets to FOIA and Privacy Act
+    analysts?
+  sentences:
+  - "checklists, worksheets, and similar documents provided to [CIA] FOIA and Privacy\
+    \ Act analysts \n(both agency employees and contractors).”  See Third Lutz Decl.\
+    \ Ex. G at 1, No. 11-445, ECF \nNo. 52-1.  The plaintiff’s requests to the State\
+    \ Department and the NSA were identical, except \nthat they sought training materials\
+    \ provided to State Department and NSA FOIA and Privacy Act"
+  - According to Black's Law Dictionary, what is a 'function' associated with?
+  - What is the section number for the exception created by Congress?
+- source_sentence: "submit one Relevant Experience Project for consideration, and\
+    \ (2) allow prime contractors to rely \n \n25 In their MJAR briefs, Plaintiffs\
+    \ originally argued that the Polaris Solicitations violated Section \n125.2(g)\
+    \ by preventing mentor-protégé JVs from using subcontractor projects to fulfill\
+    \ all Relevant \nExperience Project requirements.  See SHS MJAR at 32–34; VCH\
+    \ MJAR at 32–34; Pl. Reply at"
+  sentences:
+  - "authority to make operational decisions for the JV under SBA regulation.  See\
+    \ SHS MJAR at 22–\n22 \n \n23; VCH MJAR at 22–23.  Thus, according to Plaintiff,\
+    \ the decision to preclude mentor-protégé \nJVs that share a mentor from bidding\
+    \ on the same Solicitation harms protégés, unduly restricts \ncompetition, and\
+    \ violates federal procurement law.  See SHS MJAR at 20–23; VCH MJAR at 20–\n\
+    23."
+  - "demandante expresó todas las funciones que \nrealizaba para Mech-Tech, incluyendo\
+    \ ser la \n“Imagen del Colegio”.  El demandante envió esta \ncomunicación como\
+    \ parte de su solicitud para que \nle aumentaran su compensación. \n \n18. En\
+    \ el verano del 2014 incrementó la compensación \ndel demandante para que continuara\
+    \ realizando \nsus funciones, incluyendo ser anfitrión (“host”) en"
+  - What did the Plaintiffs originally argue about the Polaris Solicitations regarding
+    Section 125.2(g)?
+- source_sentence: "confidentiality agreement/order, that remain following those discussions.\
+    \  This is a \nfinal report and notice of exceptions shall be filed within three\
+    \ days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2),\
+    \ given the expedited and \nsummary nature of Section 220 proceedings.  \n \n\
+    \ \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin"
+  sentences:
+  - What was the plaintiff's motion requesting from the CIA?
+  - According to which court rule must the notice of exceptions be filed?
+  - "decides whether to submit proposals on future procurements, and excluding mentor-protégé\
+    \ JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily\
+    \ prevents protégés from \naccessing opportunities to grow as a business.  SHS\
+    \ MJAR at 22–23; VCH MJAR at 22–23.   \nSuch a critique, however, merely highlights\
+    \ Plaintiffs’ disagreement with the SBA’s"
+datasets:
+- AdamLucek/legal-rag-positives-synthetic
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: ModernBERT Embed Base Legal Fine-tuned
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: ir
+      type: ir
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.3616692426584235
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.4095826893353941
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.47295208655332305
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.5255023183925811
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.3616692426584235
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.35239567233384855
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.2769706336939722
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.16306027820710975
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.1268675940236991
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.3451828954147346
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.44049459041731065
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.5160999484801648
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.44587301287538633
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.40186943401781094
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.4419780310635075
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: ir eval
+      type: ir_eval
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.633693972179289
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.6893353941267388
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.7758887171561051
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8377125193199382
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.633693972179289
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.6027820710973725
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.4565687789799072
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.2582689335394127
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.22462648119526019
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.5937660999484802
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.7313240597630087
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8236733642452345
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7359943671334247
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.6824826427222096
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.718112335022202
+      name: Cosine Map@100
+---
+# ModernBERT Embed Base Legal Fine-tuned
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+## Model Details
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+- **Maximum Sequence Length:** 8192 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic)
+- **Language:** en
+- **License:** apache-2.0
+### Model Sources
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+### Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+## Usage
+### Direct Usage (Sentence Transformers)
+First install the Sentence Transformers library:
+```bash
+pip install -U sentence-transformers
+```
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+# Download from the 🤗 Hub
+model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
+# Run inference
+sentences = [
+    'confidentiality agreement/order, that remain following those discussions.  This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings.  \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
+    'According to which court rule must the notice of exceptions be filed?',
+    'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business.  SHS MJAR at 22–23; VCH MJAR at 22–23.   \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# (3, 768)
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[1.0000, 0.4922, 0.0280],
+#         [0.4922, 1.0000, 0.0389],
+#         [0.0280, 0.0389, 1.0000]])
+```
+<!--
+### Direct Usage (Transformers)
+<details><summary>Click to see the direct usage in Transformers</summary>
+</details>
+-->
+<!--
+### Downstream Usage (Sentence Transformers)
+You can finetune this model on your own dataset.
+<details><summary>Click to expand</summary>
+</details>
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+## Evaluation
+### Metrics
+#### Information Retrieval
+* Datasets: `ir` and `ir_eval`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+| Metric              | ir         | ir_eval   |
+|:--------------------|:-----------|:----------|
+| cosine_accuracy@1   | 0.3617     | 0.6337    |
+| cosine_accuracy@3   | 0.4096     | 0.6893    |
+| cosine_accuracy@5   | 0.473      | 0.7759    |
+| cosine_accuracy@10  | 0.5255     | 0.8377    |
+| cosine_precision@1  | 0.3617     | 0.6337    |
+| cosine_precision@3  | 0.3524     | 0.6028    |
+| cosine_precision@5  | 0.277      | 0.4566    |
+| cosine_precision@10 | 0.1631     | 0.2583    |
+| cosine_recall@1     | 0.1269     | 0.2246    |
+| cosine_recall@3     | 0.3452     | 0.5938    |
+| cosine_recall@5     | 0.4405     | 0.7313    |
+| cosine_recall@10    | 0.5161     | 0.8237    |
+| **cosine_ndcg@10**  | **0.4459** | **0.736** |
+| cosine_mrr@10       | 0.4019     | 0.6825    |
+| cosine_map@100      | 0.442      | 0.7181    |
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Dataset
+#### legal-rag-positives-synthetic
+* Dataset: [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) at [f11534a](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic/tree/f11534aeed060a3245f55f8f9d944cf8132c780d)
+* Size: 11,644 training samples
+* Columns: <code>anchor</code> and <code>positive</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | anchor                                                                             | positive                                                                           |
+  |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+  | type    | string                                                                             | string                                                                             |
+  | details | <ul><li>min: 7 tokens</li><li>mean: 57.45 tokens</li><li>max: 160 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 57.77 tokens</li><li>max: 157 tokens</li></ul> |
+* Samples:
+  | anchor                                                                                                                                                                                                                                                                                                                                           | positive                                                                                                                                                                                                                                                                                                                                                                                                                              |
+  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>What kinds of issues are mentioned in connection with wrongdoing?</code>                                                                                                                                                                                                                                                                   | <code>mismanagement, waste and wrongdoing – and that it has demonstrated more than a <br>credible basis from which the Court can infer possible mismanagement.  It claims <br>DR’s management failed to follow corporate governance mechanics and made <br>critical business decisions without consulting with the Board or stockholders; <br>failed to act with due diligence related to undertaking an ICO and discontinuing</code> |
+  | <code>Project, 504 F.2d at 248 n.15). <br>More, the requirement of “substantial” authority suggests that the entity should be at the <br>“center of gravity in the exercise of administrative power.”  Id. at 882 (quoting Lombardo v. <br>Handler, 397 F. Supp. 792, 796 (D.D.C. 1975), aff’d, 546 F.2d 1043 (D.C. Cir. 1976)).  On this</code> | <code>What page reference is given for the Lombardo v. Handler case in the aforementioned citation?</code>                                                                                                                                                                                                                                                                                                                            |
+  | <code>Where can more detailed information regarding redactions be found?</code>                                                                                                                                                                                                                                                                  | <code>parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444.  This is likely <br>because the CIA has previously instituted a categorical policy of indicating the basis for redactions at a document <br>level, rather than a redaction level, as discussed above.  See supra Part III.C.2.  In light of the Court’s holding that</code>                                                     |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim",
+      "gather_across_devices": false,
+      "directions": [
+          "query_to_doc"
+      ],
+      "partition_mode": "joint",
+      "hardness_mode": null,
+      "hardness_strength": 0.0
+  }
+  ```
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+- `per_device_train_batch_size`: 32
+- `num_train_epochs`: 4
+- `learning_rate`: 2e-05
+- `lr_scheduler_type`: cosine
+- `warmup_steps`: 0.1
+- `optim`: adamw_torch_fused
+- `gradient_accumulation_steps`: 16
+- `bf16`: True
+- `tf32`: True
+- `eval_strategy`: epoch
+- `per_device_eval_batch_size`: 16
+- `load_best_model_at_end`: True
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+- `per_device_train_batch_size`: 32
+- `num_train_epochs`: 4
+- `max_steps`: -1
+- `learning_rate`: 2e-05
+- `lr_scheduler_type`: cosine
+- `lr_scheduler_kwargs`: None
+- `warmup_steps`: 0.1
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `optim_target_modules`: None
+- `gradient_accumulation_steps`: 16
+- `average_tokens_across_devices`: True
+- `max_grad_norm`: 1.0
+- `label_smoothing_factor`: 0.0
+- `bf16`: True
+- `fp16`: False
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: True
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `use_cache`: False
+- `neftune_noise_alpha`: None
+- `torch_empty_cache_steps`: None
+- `auto_find_batch_size`: False
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `include_num_input_tokens_seen`: no
+- `log_level`: passive
+- `log_level_replica`: warning
+- `disable_tqdm`: False
+- `project`: huggingface
+- `trackio_space_id`: trackio
+- `eval_strategy`: epoch
+- `per_device_eval_batch_size`: 16
+- `prediction_loss_only`: True
+- `eval_on_start`: False
+- `eval_do_concat_batches`: True
+- `eval_use_gather_object`: False
+- `eval_accumulation_steps`: None
+- `include_for_metrics`: []
+- `batch_eval_metrics`: False
+- `save_only_model`: False
+- `save_on_each_node`: False
+- `enable_jit_checkpoint`: False
+- `push_to_hub`: False
+- `hub_private_repo`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_always_push`: False
+- `hub_revision`: None
+- `load_best_model_at_end`: True
+- `ignore_data_skip`: False
+- `restore_callback_states_from_checkpoint`: False
+- `full_determinism`: False
+- `seed`: 42
+- `data_seed`: None
+- `use_cpu`: False
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `parallelism_config`: None
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `dataloader_prefetch_factor`: None
+- `remove_unused_columns`: True
+- `label_names`: None
+- `train_sampling_strategy`: random
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `ddp_backend`: None
+- `ddp_timeout`: 1800
+- `fsdp`: []
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `deepspeed`: None
+- `debug`: []
+- `skip_memory_metrics`: True
+- `do_predict`: False
+- `resume_from_checkpoint`: None
+- `warmup_ratio`: None
+- `local_rank`: -1
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: proportional
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+</details>
+### Training Logs
+| Epoch   | Step   | Training Loss | ir_cosine_ndcg@10 | ir_eval_cosine_ndcg@10 |
+|:-------:|:------:|:-------------:|:-----------------:|:----------------------:|
+| -1      | -1     | -             | 0.4459            | 0.4459                 |
+| 0.4396  | 10     | 1.4221        | -                 | -                      |
+| 0.8791  | 20     | 0.6964        | -                 | -                      |
+| 1.0     | 23     | -             | -                 | 0.6760                 |
+| 1.3077  | 30     | 0.4787        | -                 | -                      |
+| 1.7473  | 40     | 0.4033        | -                 | -                      |
+| 2.0     | 46     | -             | -                 | 0.7196                 |
+| 2.1758  | 50     | 0.3770        | -                 | -                      |
+| 2.6154  | 60     | 0.3159        | -                 | -                      |
+| **3.0** | **69** | **-**         | **-**             | **0.7361**             |
+| 3.0440  | 70     | 0.3345        | -                 | -                      |
+| 3.4835  | 80     | 0.2698        | -                 | -                      |
+| 3.9231  | 90     | 0.3188        | -                 | -                      |
+| 4.0     | 92     | -             | -                 | 0.7360                 |
+* The bold row denotes the saved checkpoint.
+### Framework Versions
+- Python: 3.12.11
+- Sentence Transformers: 5.3.0
+- Transformers: 5.3.0
+- PyTorch: 2.5.1+cu121
+- Accelerate: 1.13.0
+- Datasets: 4.8.2
+- Tokenizers: 0.22.2
+## Citation
+### BibTeX
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{oord2019representationlearningcontrastivepredictive,
+      title={Representation Learning with Contrastive Predictive Coding},
+      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
+      year={2019},
+      eprint={1807.03748},
+      archivePrefix={arXiv},
+      primaryClass={cs.LG},
+      url={https://arxiv.org/abs/1807.03748},
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
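
Review note: the README reports `MultipleNegativesRankingLoss` with `scale: 20.0` and `similarity_fct: cos_sim`. A minimal sketch of that idea follows: each anchor is scored against every positive in the batch, and the loss is cross-entropy with the matching positive as the target. It assumes the `query_to_doc` direction means anchors are ranked against positives only. The helper names `cos_sim` and `mnr_loss` and the toy vectors are illustrative, not the library code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two plain-Python vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: anchor i should rank positive i first."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])  # cross-entropy with target index i
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # correct pairs match
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # correct pairs mismatch
```

With perfectly aligned pairs the loss is close to zero, while swapping the positives pushes it to roughly the scale value, which is the behaviour a falling training loss is meant to reflect.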

config.json ADDED

	@@ -0,0 +1,77 @@

+{
+  "architectures": [
+    "ModernBertModel"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 50281,
+  "classifier_activation": "gelu",
+  "classifier_bias": false,
+  "classifier_dropout": 0.0,
+  "classifier_pooling": "mean",
+  "cls_token_id": 50281,
+  "decoder_bias": true,
+  "deterministic_flash_attn": false,
+  "dtype": "float32",
+  "embedding_dropout": 0.0,
+  "eos_token_id": 50282,
+  "global_attn_every_n_layers": 3,
+  "gradient_checkpointing": false,
+  "hidden_activation": "gelu",
+  "hidden_size": 768,
+  "initializer_cutoff_factor": 2.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 1152,
+  "layer_norm_eps": 1e-05,
+  "layer_types": [
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention",
+    "sliding_attention",
+    "sliding_attention",
+    "full_attention"
+  ],
+  "local_attention": 128,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "mlp_dropout": 0.0,
+  "model_type": "modernbert",
+  "norm_bias": false,
+  "norm_eps": 1e-05,
+  "num_attention_heads": 12,
+  "num_hidden_layers": 22,
+  "pad_token_id": 50283,
+  "position_embedding_type": "absolute",
+  "rope_parameters": {
+    "full_attention": {
+      "rope_theta": 160000.0,
+      "rope_type": "default"
+    },
+    "sliding_attention": {
+      "rope_theta": 10000.0,
+      "rope_type": "default"
+    }
+  },
+  "sep_token_id": 50282,
+  "sparse_pred_ignore_index": -100,
+  "sparse_prediction": false,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.3.0",
+  "vocab_size": 50368
+}
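
Review note: the `layer_types` list in this config appears to follow directly from `global_attn_every_n_layers: 3` and `num_hidden_layers: 22`. The sketch below regenerates the pattern under the assumption that layer `i` uses full attention when `i % 3 == 0`. The helper `build_layer_types` is hypothetical and only serves to check the listing.

```python
def build_layer_types(num_hidden_layers, global_attn_every_n_layers):
    """Return one attention type per layer; every n-th layer (from 0) is global."""
    return [
        "full_attention" if i % global_attn_every_n_layers == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


layer_types = build_layer_types(num_hidden_layers=22, global_attn_every_n_layers=3)
full_count = layer_types.count("full_attention")
sliding_count = layer_types.count("sliding_attention")
```

Under that assumption the result has 8 full-attention layers and 14 sliding-attention layers, beginning and ending with full attention, which matches the list in the diff.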

config_sentence_transformers.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "__version__": {
+    "sentence_transformers": "5.3.0",
+    "transformers": "5.3.0",
+    "pytorch": "2.5.1+cu121"
+  },
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine",
+  "model_type": "SentenceTransformer"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab181a45d9a9893f19a53d36d28c46c6bd1bfb4b0a2ba81335c6946b0d426b6a
+size 596070136
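
Review note: the three lines above are a Git LFS pointer, not the weights themselves. As a rough sanity check, the reported size can be converted to a parameter count, assuming all tensors are stored as `float32` (as `"dtype": "float32"` in `config.json` suggests) and ignoring the small safetensors header. This is a back-of-the-envelope estimate only.

```python
size_bytes = 596_070_136   # "size" field from the LFS pointer
bytes_per_param = 4        # float32, assumed for every tensor

# Upper-bound estimate: the file header also occupies a few bytes.
approx_params = size_bytes // bytes_per_param
approx_millions = approx_params / 1e6
```

The estimate comes out at about 149 million parameters.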

modules.json ADDED

	@@ -0,0 +1,20 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
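
Review note: the pipeline ends with a `Normalize` module, and the similarity function is configured as cosine. One consequence, sketched below, is that for unit-length vectors a plain dot product gives the same score as cosine similarity. The helper names and toy vectors are illustrative assumptions, not part of the library.

```python
import math


def dot(a, b):
    """Dot product of two plain-Python vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

cosine_score = cosine(a, b)       # similarity on the raw vectors
dot_score = dot(a_unit, b_unit)   # dot product after normalization
```

Both scores agree, so downstream vector stores that only offer dot-product search should rank results the same way for embeddings that have passed through the normalization step.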

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+    "max_seq_length": 8192,
+    "do_lower_case": false
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "backend": "tokenizers",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "is_local": false,
+  "mask_token": "[MASK]",
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "[UNK]"
+}