aaa961 committed dfaaf6c · verified · 1 parent: 9b8479a

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
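
The pooling config above enables only `pooling_mode_mean_tokens`. Each sentence embedding is therefore the average of its token embeddings over non-padding positions. The Normalize module listed in modules.json then scales that average to unit length. Below is a minimal NumPy sketch of the computation. It assumes token embeddings of shape (batch, seq_len, dim) and a 0/1 attention mask. The helpers `mean_pool` and `l2_normalize` are hypothetical names for illustration, not the sentence-transformers API.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, as a Normalize step would."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)


# Toy batch: 2 sequences, 3 token positions, 4-dim vectors (the real model uses 768).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [100.0, 100.0, 100.0, 100.0]],
    [[0.0, 2.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 6.0, 0.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])  # last position of sequence 0 is padding

pooled = mean_pool(tokens, mask)       # padding vector in sequence 0 is ignored
embeddings = l2_normalize(pooled)
print(pooled)
print(embeddings)
```

The large padding vector in the first sequence does not leak into its average. This is why the attention mask has to take part in mean pooling.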
README.md ADDED
@@ -0,0 +1,573 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:11644
+ - loss:MultipleNegativesRankingLoss
+ base_model: nomic-ai/modernbert-embed-base
+ widget:
+ - source_sentence: What section of the U.S. Code is cited in relation to Exemption
+     2?
+   sentences:
+   - "forma inmediata de la utilización de cualquiera y todo material en el \nque se\
+     \ utilizara la imagen de la parte apelada. En adición, le \ncondenó solidariamente\
+     \ al pago de $20,000.00 por la utilización no \n \n \n \nKLAN202300916 \n \n6\n\
+     autorizada de la imagen del señor Friger Salgueiro y $4,000.00 por \nhonorarios\
+     \ de abogado. \nEn desacuerdo, el 20 de septiembre de 2023, la parte apelante"
+   - How does the invocation of the attorney-client privilege by the CIA affect summary
+     judgment?
+   - "Decl. Ex. K pt. 2, at 1, 8–14, 16–18, 22, 27, No. 11-445, ECF No. 29-3. Exemption\
+     \ 2 applies to \nmatters that “related solely to the internal personnel rules\
+     \ and practices of an agency.” 5 U.S.C. \n§ 552(b)(2). The CIA states in its\
+     \ declaration that all thirteen documents withheld under"
+ - source_sentence: "advisory committee.” Defs.’ Mem. at 23. So the Government’s\
+     \ invocation of “judicial estoppel” \njust boils down to its argument that the\
+     \ Commission is not an advisory committee. \n35 \nand constitutes a failure to\
+     \ perform duties owed to EPIC within the meaning of 28 U.S.C. \n§ 1361.” Id.\
+     \ ¶ 115. Count IV likewise asserts that the Commission’s “failure to make [its]"
+   sentences:
+   - What must this Court determine regarding GSA's interpretation of 41 U.S.C. § 3306(c)(3)?
+   - "Mentor-Protégé JV to Submit an Individually Performed Relevant \nExperience Project,\
+     \ or (2) Allowing Prime Contractors to Rely on \nProjects Performed by First-Tier\
+     \ Subcontractors. \nPlaintiffs claim the Polaris Solicitations’ requirements\
+     \ for evaluating Relevant Experience \nProjects creates a disparity between offerors\
+     \ that hinders protégé firms and violates 13 C.F.R."
+   - What is the Government's argument about the Commission?
+ - source_sentence: Who provides checklists and worksheets to FOIA and Privacy Act
+     analysts?
+   sentences:
+   - "checklists, worksheets, and similar documents provided to [CIA] FOIA and Privacy\
+     \ Act analysts \n(both agency employees and contractors).” See Third Lutz Decl.\
+     \ Ex. G at 1, No. 11-445, ECF \nNo. 52-1. The plaintiff’s requests to the State\
+     \ Department and the NSA were identical, except \nthat they sought training materials\
+     \ provided to State Department and NSA FOIA and Privacy Act"
+   - According to Black's Law Dictionary, what is a 'function' associated with?
+   - What is the section number for the exception created by Congress?
+ - source_sentence: "submit one Relevant Experience Project for consideration, and\
+     \ (2) allow prime contractors to rely \n \n25 In their MJAR briefs, Plaintiffs\
+     \ originally argued that the Polaris Solicitations violated Section \n125.2(g)\
+     \ by preventing mentor-protégé JVs from using subcontractor projects to fulfill\
+     \ all Relevant \nExperience Project requirements. See SHS MJAR at 32–34; VCH\
+     \ MJAR at 32–34; Pl. Reply at"
+   sentences:
+   - "authority to make operational decisions for the JV under SBA regulation. See\
+     \ SHS MJAR at 22–\n22 \n \n23; VCH MJAR at 22–23. Thus, according to Plaintiff,\
+     \ the decision to preclude mentor-protégé \nJVs that share a mentor from bidding\
+     \ on the same Solicitation harms protégés, unduly restricts \ncompetition, and\
+     \ violates federal procurement law. See SHS MJAR at 20–23; VCH MJAR at 20–\n\
+     23."
+   - "demandante expresó todas las funciones que \nrealizaba para Mech-Tech, incluyendo\
+     \ ser la \n“Imagen del Colegio”. El demandante envió esta \ncomunicación como\
+     \ parte de su solicitud para que \nle aumentaran su compensación. \n \n18. En\
+     \ el verano del 2014 incrementó la compensación \ndel demandante para que continuara\
+     \ realizando \nsus funciones, incluyendo ser anfitrión (“host”) en"
+   - What did the Plaintiffs originally argue about the Polaris Solicitations regarding
+     Section 125.2(g)?
+ - source_sentence: "confidentiality agreement/order, that remain following those discussions.\
+     \ This is a \nfinal report and notice of exceptions shall be filed within three\
+     \ days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2),\
+     \ given the expedited and \nsummary nature of Section 220 proceedings. \n \n\
+     \ \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin"
+   sentences:
+   - What was the plaintiff's motion requesting from the CIA?
+   - According to which court rule must the notice of exceptions be filed?
+   - "decides whether to submit proposals on future procurements, and excluding mentor-protégé\
+     \ JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily\
+     \ prevents protégés from \naccessing opportunities to grow as a business. SHS\
+     \ MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights\
+     \ Plaintiffs’ disagreement with the SBA’s"
+ datasets:
+ - AdamLucek/legal-rag-positives-synthetic
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: ModernBERT Embed Base Legal Fine-tuned
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: ir
+       type: ir
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.3616692426584235
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.4095826893353941
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.47295208655332305
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.5255023183925811
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.3616692426584235
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.35239567233384855
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.2769706336939722
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.16306027820710975
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.1268675940236991
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.3451828954147346
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.44049459041731065
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.5160999484801648
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.44587301287538633
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.40186943401781094
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.4419780310635075
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: ir eval
+       type: ir_eval
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.633693972179289
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.6893353941267388
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.7758887171561051
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8377125193199382
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.633693972179289
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.6027820710973725
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.4565687789799072
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.2582689335394127
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.22462648119526019
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.5937660999484802
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.7313240597630087
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8236733642452345
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7359943671334247
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.6824826427222096
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.718112335022202
+       name: Cosine Map@100
+ ---
+
+ # ModernBERT Embed Base Legal Fine-tuned
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) on the [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic)
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
+ # Run inference
+ sentences = [
+     'confidentiality agreement/order, that remain following those discussions. This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings. \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
+     'According to which court rule must the notice of exceptions be filed?',
+     'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business. SHS MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.4922, 0.0280],
+ #         [0.4922, 1.0000, 0.0389],
+ #         [0.0280, 0.0389, 1.0000]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Datasets: `ir` and `ir_eval`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | ir         | ir_eval   |
+ |:--------------------|:-----------|:----------|
+ | cosine_accuracy@1   | 0.3617     | 0.6337    |
+ | cosine_accuracy@3   | 0.4096     | 0.6893    |
+ | cosine_accuracy@5   | 0.473      | 0.7759    |
+ | cosine_accuracy@10  | 0.5255     | 0.8377    |
+ | cosine_precision@1  | 0.3617     | 0.6337    |
+ | cosine_precision@3  | 0.3524     | 0.6028    |
+ | cosine_precision@5  | 0.277      | 0.4566    |
+ | cosine_precision@10 | 0.1631     | 0.2583    |
+ | cosine_recall@1     | 0.1269     | 0.2246    |
+ | cosine_recall@3     | 0.3452     | 0.5938    |
+ | cosine_recall@5     | 0.4405     | 0.7313    |
+ | cosine_recall@10    | 0.5161     | 0.8237    |
+ | **cosine_ndcg@10**  | **0.4459** | **0.736** |
+ | cosine_mrr@10       | 0.4019     | 0.6825    |
+ | cosine_map@100      | 0.442      | 0.7181    |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
348
+ #### legal-rag-positives-synthetic
349
+
350
+ * Dataset: [legal-rag-positives-synthetic](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic) at [f11534a](https://huggingface.co/datasets/AdamLucek/legal-rag-positives-synthetic/tree/f11534aeed060a3245f55f8f9d944cf8132c780d)
351
+ * Size: 11,644 training samples
352
+ * Columns: <code>anchor</code> and <code>positive</code>
353
+ * Approximate statistics based on the first 1000 samples:
354
+ | | anchor | positive |
355
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
356
+ | type | string | string |
357
+ | details | <ul><li>min: 7 tokens</li><li>mean: 57.45 tokens</li><li>max: 160 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 57.77 tokens</li><li>max: 157 tokens</li></ul> |
358
+ * Samples:
359
+ | anchor | positive |
360
+ |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
361
+ | <code>What kinds of issues are mentioned in connection with wrongdoing?</code> | <code>mismanagement, waste and wrongdoing – and that it has demonstrated more than a <br>credible basis from which the Court can infer possible mismanagement. It claims <br>DR’s management failed to follow corporate governance mechanics and made <br>critical business decisions without consulting with the Board or stockholders; <br>failed to act with due diligence related to undertaking an ICO and discontinuing</code> |
362
+ | <code>Project, 504 F.2d at 248 n.15). <br>More, the requirement of “substantial” authority suggests that the entity should be at the <br>“center of gravity in the exercise of administrative power.” Id. at 882 (quoting Lombardo v. <br>Handler, 397 F. Supp. 792, 796 (D.D.C. 1975), aff’d, 546 F.2d 1043 (D.C. Cir. 1976)). On this</code> | <code>What page reference is given for the Lombardo v. Handler case in the aforementioned citation?</code> |
363
+ | <code>Where can more detailed information regarding redactions be found?</code> | <code>parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444. This is likely <br>because the CIA has previously instituted a categorical policy of indicating the basis for redactions at a document <br>level, rather than a redaction level, as discussed above. See supra Part III.C.2. In light of the Court’s holding that</code> |
364
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
365
+ ```json
366
+ {
367
+ "scale": 20.0,
368
+ "similarity_fct": "cos_sim",
369
+ "gather_across_devices": false,
370
+ "directions": [
371
+ "query_to_doc"
372
+ ],
373
+ "partition_mode": "joint",
374
+ "hardness_mode": null,
375
+ "hardness_strength": 0.0
376
+ }
377
+ ```
378
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 32
+ - `num_train_epochs`: 4
+ - `learning_rate`: 2e-05
+ - `lr_scheduler_type`: cosine
+ - `warmup_steps`: 0.1
+ - `optim`: adamw_torch_fused
+ - `gradient_accumulation_steps`: 16
+ - `bf16`: True
+ - `tf32`: True
+ - `eval_strategy`: epoch
+ - `per_device_eval_batch_size`: 16
+ - `load_best_model_at_end`: True
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `per_device_train_batch_size`: 32
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `learning_rate`: 2e-05
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: None
+ - `warmup_steps`: 0.1
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `optim_target_modules`: None
+ - `gradient_accumulation_steps`: 16
+ - `average_tokens_across_devices`: True
+ - `max_grad_norm`: 1.0
+ - `label_smoothing_factor`: 0.0
+ - `bf16`: True
+ - `fp16`: False
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `use_cache`: False
+ - `neftune_noise_alpha`: None
+ - `torch_empty_cache_steps`: None
+ - `auto_find_batch_size`: False
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `include_num_input_tokens_seen`: no
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `disable_tqdm`: False
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `eval_strategy`: epoch
+ - `per_device_eval_batch_size`: 16
+ - `prediction_loss_only`: True
+ - `eval_on_start`: False
+ - `eval_do_concat_batches`: True
+ - `eval_use_gather_object`: False
+ - `eval_accumulation_steps`: None
+ - `include_for_metrics`: []
+ - `batch_eval_metrics`: False
+ - `save_only_model`: False
+ - `save_on_each_node`: False
+ - `enable_jit_checkpoint`: False
+ - `push_to_hub`: False
+ - `hub_private_repo`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `full_determinism`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `use_cpu`: False
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `dataloader_prefetch_factor`: None
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `train_sampling_strategy`: random
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `ddp_backend`: None
+ - `ddp_timeout`: 1800
+ - `fsdp`: []
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `deepspeed`: None
+ - `debug`: []
+ - `skip_memory_metrics`: True
+ - `do_predict`: False
+ - `resume_from_checkpoint`: None
+ - `warmup_ratio`: None
+ - `local_rank`: -1
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch   | Step   | Training Loss | ir_cosine_ndcg@10 | ir_eval_cosine_ndcg@10 |
+ |:-------:|:------:|:-------------:|:-----------------:|:----------------------:|
+ | -1      | -1     | -             | 0.4459            | 0.4459                 |
+ | 0.4396  | 10     | 1.4221        | -                 | -                      |
+ | 0.8791  | 20     | 0.6964        | -                 | -                      |
+ | 1.0     | 23     | -             | -                 | 0.6760                 |
+ | 1.3077  | 30     | 0.4787        | -                 | -                      |
+ | 1.7473  | 40     | 0.4033        | -                 | -                      |
+ | 2.0     | 46     | -             | -                 | 0.7196                 |
+ | 2.1758  | 50     | 0.3770        | -                 | -                      |
+ | 2.6154  | 60     | 0.3159        | -                 | -                      |
+ | **3.0** | **69** | **-**         | **-**             | **0.7361**             |
+ | 3.0440  | 70     | 0.3345        | -                 | -                      |
+ | 3.4835  | 80     | 0.2698        | -                 | -                      |
+ | 3.9231  | 90     | 0.3188        | -                 | -                      |
+ | 4.0     | 92     | -             | -                 | 0.7360                 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.12.11
+ - Sentence Transformers: 5.3.0
+ - Transformers: 5.3.0
+ - PyTorch: 2.5.1+cu121
+ - Accelerate: 1.13.0
+ - Datasets: 4.8.2
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{oord2019representationlearningcontrastivepredictive,
+       title={Representation Learning with Contrastive Predictive Coding},
+       author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
+       year={2019},
+       eprint={1807.03748},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG},
+       url={https://arxiv.org/abs/1807.03748},
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
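
The README trains with `MultipleNegativesRankingLoss` using `scale: 20.0`, `similarity_fct: cos_sim` and only the `query_to_doc` direction. The dataset has only `anchor` and `positive` columns, so every other positive in the batch serves as a negative for a given anchor. The loss is cross-entropy over the scaled cosine-similarity matrix, with the correct pairs on the diagonal.

Below is a minimal NumPy sketch of that idea. It is an illustration, not the library's implementation, and `mnrl_loss` and `cosine_matrix` are hypothetical names. One consequence worth noting, assuming a single device: with ordinary gradient accumulation (16 steps here), each micro-batch of 32 computes its own loss. The in-batch negatives therefore come from 32 pairs, not from the effective batch of 512.

```python
import numpy as np


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: the target for row i is column i (query_to_doc only)."""
    logits = scale * cosine_matrix(anchors, positives)       # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))                # cross-entropy on the diagonal


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
aligned_docs = queries + 0.01 * rng.normal(size=(4, 8))  # doc i closely matches query i
shuffled_docs = aligned_docs[[1, 2, 3, 0]]               # pairs deliberately mismatched

print(mnrl_loss(queries, aligned_docs))   # small: each correct pair dominates its row
print(mnrl_loss(queries, shuffled_docs))  # larger: the correct doc is never on the diagonal
```

The `scale` factor sharpens the softmax. Without it, cosine similarities confined to [-1, 1] would produce nearly uniform probabilities across the batch.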
config.json ADDED
@@ -0,0 +1,77 @@
+ {
+   "architectures": [
+     "ModernBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 768,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_norm_eps": 1e-05,
+   "layer_types": [
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "local_attention": 128,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 12,
+   "num_hidden_layers": 22,
+   "pad_token_id": 50283,
+   "position_embedding_type": "absolute",
+   "rope_parameters": {
+     "full_attention": {
+       "rope_theta": 160000.0,
+       "rope_type": "default"
+     },
+     "sliding_attention": {
+       "rope_theta": 10000.0,
+       "rope_type": "default"
+     }
+   },
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "tie_word_embeddings": true,
+   "transformers_version": "5.3.0",
+   "vocab_size": 50368
+ }
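
In config.json, the `layer_types` list is consistent with `global_attn_every_n_layers: 3` and `num_hidden_layers: 22`. Every third layer, starting at layer 0, uses full (global) attention. The remaining layers use sliding-window attention (`local_attention: 128`), and each layer type has its own `rope_theta`.

The short sketch below regenerates the list. It assumes the rule is `layer_index % 3 == 0`, which is inferred from the list itself and not taken from the Transformers source. `build_layer_types` is a hypothetical helper.

```python
def build_layer_types(num_hidden_layers: int, global_every_n: int) -> list[str]:
    """Mark every n-th layer (starting at 0) as full attention, the rest as sliding."""
    return [
        "full_attention" if i % global_every_n == 0 else "sliding_attention"
        for i in range(num_hidden_layers)
    ]


# rope_theta values copied from the "rope_parameters" block of config.json.
ROPE_THETA = {"full_attention": 160000.0, "sliding_attention": 10000.0}

layer_types = build_layer_types(num_hidden_layers=22, global_every_n=3)
full_layers = [i for i, t in enumerate(layer_types) if t == "full_attention"]
print(full_layers)                 # [0, 3, 6, 9, 12, 15, 18, 21]
print(len(layer_types))            # 22
print(ROPE_THETA[layer_types[1]])  # 10000.0
```

This yields 8 global layers and 14 local ones. Under that reading, the last layer (index 21) is global, matching the final `"full_attention"` entry in the config.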
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.3.0",
+     "transformers": "5.3.0",
+     "pytorch": "2.5.1+cu121"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine",
+   "model_type": "SentenceTransformer"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab181a45d9a9893f19a53d36d28c46c6bd1bfb4b0a2ba81335c6946b0d426b6a
+ size 596070136
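
model.safetensors is committed as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object id and the byte size of the real file.

The sketch below parses the pointer and derives a rough parameter count. It assumes every weight is stored as float32 (4 bytes each, consistent with `"dtype": "float32"` in config.json). The safetensors JSON header also occupies some bytes, so the figure is a slight overestimate. `parse_lfs_pointer` is a hypothetical helper.

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ab181a45d9a9893f19a53d36d28c46c6bd1bfb4b0a2ba81335c6946b0d426b6a
size 596070136
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of a Git LFS pointer file."""
    parsed = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(" ", 1)
            parsed[key] = value
    return parsed


fields = parse_lfs_pointer(POINTER)
size_bytes = int(fields["size"])
approx_params = size_bytes // 4  # float32 = 4 bytes; ignores the safetensors header
print(fields["oid"].split(":", 1)[0])  # sha256
print(f"{approx_params:,}")            # 149,017,534
```

Roughly 149M parameters is in line with the size usually quoted for ModernBERT-base.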
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
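
modules.json chains three modules: the ModernBERT Transformer, mean Pooling, and Normalize. The last step scales each embedding to unit length. As a result, the cosine similarity declared in config_sentence_transformers.json (`similarity_fn_name: cosine`) equals a plain dot product of the stored vectors, so normalized embeddings can be used with an inner-product index as they are.

The small NumPy check below demonstrates that identity. It uses random vectors as stand-ins for model outputs, not real embeddings.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(42)
u, v = rng.normal(size=768), rng.normal(size=768)  # stand-ins for pooled outputs

u_hat = u / np.linalg.norm(u)  # what a Normalize step produces
v_hat = v / np.linalg.norm(v)

print(np.isclose(cosine(u, v), float(u_hat @ v_hat)))  # True
```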
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "backend": "tokenizers",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 8192,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "TokenizersBackend",
+   "unk_token": "[UNK]"
+ }