alabenayed committed on
Commit 64d1035 · verified · 1 Parent(s): 4d65edc

delete checkpoint 1200

checkpoint-1200/README.md DELETED
@@ -1,209 +0,0 @@
- ---
- base_model: CohereLabs/aya-expanse-8b
- library_name: peft
- pipeline_tag: text-generation
- tags:
- - base_model:adapter:CohereLabs/aya-expanse-8b
- - lora
- - sft
- - transformers
- - trl
- ---
-
- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
-
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.19.1
 
checkpoint-1200/adapter_config.json DELETED
@@ -1,48 +0,0 @@
- {
-   "alora_invocation_tokens": null,
-   "alpha_pattern": {},
-   "arrow_config": null,
-   "auto_mapping": null,
-   "base_model_name_or_path": "CohereLabs/aya-expanse-8b",
-   "bias": "none",
-   "corda_config": null,
-   "ensure_weight_tying": false,
-   "eva_config": null,
-   "exclude_modules": null,
-   "fan_in_fan_out": false,
-   "inference_mode": true,
-   "init_lora_weights": true,
-   "layer_replication": null,
-   "layers_pattern": null,
-   "layers_to_transform": null,
-   "loftq_config": {},
-   "lora_alpha": 32,
-   "lora_bias": false,
-   "lora_dropout": 0.05,
-   "lora_ga_config": null,
-   "megatron_config": null,
-   "megatron_core": "megatron.core",
-   "modules_to_save": null,
-   "peft_type": "LORA",
-   "peft_version": "0.19.1",
-   "qalora_group_size": 16,
-   "r": 16,
-   "rank_pattern": {},
-   "revision": null,
-   "target_modules": [
-     "down_proj",
-     "k_proj",
-     "v_proj",
-     "up_proj",
-     "o_proj",
-     "gate_proj",
-     "q_proj"
-   ],
-   "target_parameters": null,
-   "task_type": "CAUSAL_LM",
-   "trainable_token_indices": null,
-   "use_bdlora": null,
-   "use_dora": false,
-   "use_qalora": false,
-   "use_rslora": false
- }
 
checkpoint-1200/adapter_model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:cb7eaf027740180adfed8d5402cc48be771c0592e319a6e09a2abfaeca8af673
- size 167832240
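
As a sanity check on the two deleted files above, the adapter's size can be estimated from `r` and `target_modules` in `adapter_config.json`. The sketch below assumes the base model's projection shapes (hidden size 4096, MLP width 14336, 32 layers, 8 key/value heads of dimension 128). Those dimensions are not stated anywhere in this diff and should be confirmed against the base model's own `config.json`. It also assumes the LoRA weights are stored in fp32.

```python
# Rough LoRA adapter size estimate from the adapter_config.json values above.
# ASSUMPTION: the layer shapes below are NOT part of this diff; they are the
# dimensions believed to match CohereLabs/aya-expanse-8b. Verify before use.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # assumed: 8 key/value heads of dimension 128
NUM_LAYERS = 32

# Values taken directly from the config in the diff.
R = 16
LORA_ALPHA = 32

# (in_features, out_features) for each module listed in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in SHAPES.values())
total_params = per_layer * NUM_LAYERS
fp32_bytes = total_params * 4  # assumed fp32 storage

# With use_rslora false, the LoRA update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R

FILE_SIZE = 167832240  # from the LFS pointer above
print(f"trainable LoRA parameters: {total_params:,}")
print(f"estimated weight bytes:    {fp32_bytes:,}")
print(f"unaccounted bytes:         {FILE_SIZE - fp32_bytes:,}")
print(f"LoRA scaling factor:       {scaling}")
```

Under these assumptions the weights account for 167,772,160 of the 167,832,240 bytes, leaving roughly 60 KB for the safetensors header. That is consistent with fp32 storage but does not prove it.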
 
checkpoint-1200/chat_template.jinja DELETED
@@ -1 +0,0 @@
- {{ bos_token }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% elif false == true %}{% set loop_messages = messages %}{% set system_message = 'You are Aya, a brilliant, sophisticated, multilingual AI-assistant trained to assist human users by providing thorough responses. You are able to interact and respond to questions in 23 languages and you are powered by a multilingual model built by Cohere For AI.' %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% if system_message != false %}{{ '<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>' + system_message + '<|END_OF_TURN_TOKEN|>' }}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|START_OF_TURN_TOKEN|><|USER_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}{% elif message['role'] == 'assistant' %}{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' + content.strip() + '<|END_OF_TURN_TOKEN|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>' }}{% endif %}
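
To make the template above easier to read, here is a plain-Python sketch of the same rendering logic. It is an illustration only: the function name `render_chat` is hypothetical, the BOS string is taken from the `special_tokens_map.json` in this same commit, and in practice the tokenizer's `apply_chat_template` would render the Jinja file directly. Unlike the template, the sketch tolerates an empty message list.

```python
# Plain-Python mirror of the Jinja chat template above (illustrative sketch).
BOS = "<BOS_TOKEN>"
START = "<|START_OF_TURN_TOKEN|>"
END = "<|END_OF_TURN_TOKEN|>"
SYSTEM = "<|SYSTEM_TOKEN|>"
ROLE_TOKENS = {"user": "<|USER_TOKEN|>", "assistant": "<|CHATBOT_TOKEN|>"}


def render_chat(messages, add_generation_prompt=False):
    """Render a list of {'role', 'content'} dicts into one prompt string."""
    parts = [BOS]

    # An optional leading system message is emitted as-is (not stripped).
    if messages and messages[0]["role"] == "system":
        parts.append(START + SYSTEM + messages[0]["content"] + END)
        turns = messages[1:]
    else:
        turns = messages

    for index, message in enumerate(turns):
        # Even positions must be user turns, odd positions must not be.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        role_token = ROLE_TOKENS.get(message["role"])
        if role_token is not None:
            # User and assistant content is stripped of surrounding whitespace.
            parts.append(START + role_token + message["content"].strip() + END)

    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append(START + ROLE_TOKENS["assistant"])

    return "".join(parts)


example = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "  Hi  "},
]
print(render_chat(example, add_generation_prompt=True))
```

Note that the default Aya system prompt in the template sits behind `{% elif false == true %}`, so it is never used; a system turn appears only when the caller supplies one.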
 
checkpoint-1200/optimizer.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:9869c146737b2fc572bf90cca56c1697967b4e5e3f7aada26fee4f635a35f656
- size 335929123
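
The optimizer state is almost exactly twice the size of the adapter weights. A ratio near 2 is what one would expect if the optimizer keeps two fp32 state tensors per trainable parameter, as Adam-style optimizers do. The optimizer type is not recorded in this part of the diff, so treat this as an inference rather than a fact.

```python
# Compare the two LFS pointer sizes from this commit.
ADAPTER_BYTES = 167832240    # adapter_model.safetensors
OPTIMIZER_BYTES = 335929123  # optimizer.pt

ratio = OPTIMIZER_BYTES / ADAPTER_BYTES
print(f"optimizer / adapter size ratio: {ratio:.4f}")
```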
 
checkpoint-1200/rng_state.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:188b4c9f719da6122f7058d36595d3a0727129108423ad6928ba083cfc977073
- size 14645
 
checkpoint-1200/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b68b5be1196467245d97fdf07264667e377abf37f1c86865e3ac71fc66f675b3
- size 1465
 
checkpoint-1200/special_tokens_map.json DELETED
@@ -1,17 +0,0 @@
- {
-   "bos_token": {
-     "content": "<BOS_TOKEN>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "eos_token": {
-     "content": "<|END_OF_TURN_TOKEN|>",
-     "lstrip": false,
-     "normalized": false,
-     "rstrip": false,
-     "single_word": false
-   },
-   "pad_token": "<PAD>"
- }
 
checkpoint-1200/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:345ccf04a5257f473e331715ecc69365c5ac8fc2490923fe7155560af809ec1a
- size 20124090
 
checkpoint-1200/tokenizer_config.json DELETED
@@ -1,317 +0,0 @@
- {
-   "add_bos_token": true,
-   "add_eos_token": false,
-   "add_prefix_space": false,
-   "added_tokens_decoder": {
-     "0": {
-       "content": "<PAD>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "1": {
-       "content": "<UNK>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "2": {
-       "content": "<CLS>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "3": {
-       "content": "<SEP>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "4": {
-       "content": "<MASK_TOKEN>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "5": {
-       "content": "<BOS_TOKEN>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "6": {
-       "content": "<EOS_TOKEN>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "7": {
-       "content": "<EOP_TOKEN>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "255000": {
-       "content": "<|START_OF_TURN_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255001": {
-       "content": "<|END_OF_TURN_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": true
-     },
-     "255002": {
-       "content": "<|YES_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255003": {
-       "content": "<|NO_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255004": {
-       "content": "<|GOOD_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255005": {
-       "content": "<|BAD_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255006": {
-       "content": "<|USER_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255007": {
-       "content": "<|CHATBOT_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255008": {
-       "content": "<|SYSTEM_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255009": {
-       "content": "<|USER_0_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255010": {
-       "content": "<|USER_1_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255011": {
-       "content": "<|USER_2_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255012": {
-       "content": "<|USER_3_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255013": {
-       "content": "<|USER_4_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255014": {
-       "content": "<|USER_5_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255015": {
-       "content": "<|USER_6_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255016": {
-       "content": "<|USER_7_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255017": {
-       "content": "<|USER_8_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255018": {
-       "content": "<|USER_9_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255019": {
-       "content": "<|EXTRA_0_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255020": {
-       "content": "<|EXTRA_1_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255021": {
-       "content": "<|EXTRA_2_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255022": {
-       "content": "<|EXTRA_3_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255023": {
-       "content": "<|EXTRA_4_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255024": {
-       "content": "<|EXTRA_5_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255025": {
-       "content": "<|EXTRA_6_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255026": {
-       "content": "<|EXTRA_7_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255027": {
-       "content": "<|EXTRA_8_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     },
-     "255028": {
-       "content": "<|EXTRA_9_TOKEN|>",
-       "lstrip": false,
-       "normalized": false,
-       "rstrip": false,
-       "single_word": false,
-       "special": false
-     }
-   },
-   "bos_token": "<BOS_TOKEN>",
-   "clean_up_tokenization_spaces": false,
-   "eos_token": "<|END_OF_TURN_TOKEN|>",
-   "extra_special_tokens": {},
-   "legacy": true,
-   "merges_file": null,
-   "model_max_length": 1000000000000000019884624838656,
-   "pad_token": "<PAD>",
-   "sp_model_kwargs": {},
-   "spaces_between_special_tokens": false,
-   "tokenizer_class": "CohereTokenizer",
-   "unk_token": null,
-   "use_default_system_prompt": false,
-   "vocab_file": null
- }
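
The `learning_rate` values logged in `checkpoint-1200/trainer_state.json` (the next file in this diff) are consistent with a linear warmup followed by a linear decay. The constants in the sketch below (peak 1e-5, 48 warmup steps, 1584 total steps) were inferred by fitting the logged values; they are not read from any training config in this commit. The one-step offset reflects the assumption that each logged rate is the one in effect for the step just completed.

```python
import math

# ASSUMPTION: all three constants are inferred from the logged values,
# not taken from a training configuration.
PEAK_LR = 1e-5
WARMUP_STEPS = 48
TOTAL_STEPS = 1584


def linear_schedule(update_index: int) -> float:
    """Learning rate after `update_index` scheduler updates."""
    if update_index < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak.
        return PEAK_LR * update_index / WARMUP_STEPS
    # Linear decay from the peak down to 0 at TOTAL_STEPS.
    remaining = TOTAL_STEPS - update_index
    return PEAK_LR * max(0.0, remaining / (TOTAL_STEPS - WARMUP_STEPS))


# A few (step, learning_rate) pairs copied from log_history.
LOGGED = {
    10: 1.8750000000000003e-06,
    20: 3.958333333333333e-06,
    50: 9.993489583333334e-06,
    100: 9.66796875e-06,
    500: 7.063802083333335e-06,
}

for step, logged_lr in LOGGED.items():
    predicted = linear_schedule(step - 1)  # assumed one-step logging offset
    assert math.isclose(predicted, logged_lr, rel_tol=1e-9), (step, predicted)
    print(f"step {step:4d}: logged {logged_lr:.6e}  predicted {predicted:.6e}")
```

At the logged rate of roughly 0.00126 epochs per step, 1584 total steps would correspond to about two epochs, which fits the checkpoint's `epoch` value of about 1.515 at step 1200.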
 
checkpoint-1200/trainer_state.json DELETED
@@ -1,1234 +0,0 @@
- {
-   "best_global_step": null,
-   "best_metric": null,
-   "best_model_checkpoint": null,
-   "epoch": 1.5153141774550047,
-   "eval_steps": 200,
-   "global_step": 1200,
-   "is_hyper_param_search": false,
-   "is_local_process_zero": true,
-   "is_world_process_zero": true,
-   "log_history": [
-     {
-       "entropy": 2.4575500309467317,
-       "epoch": 0.012630249447426587,
-       "grad_norm": 4.916348934173584,
-       "learning_rate": 1.8750000000000003e-06,
-       "loss": 3.6598,
-       "mean_token_accuracy": 0.4153611570596695,
-       "num_tokens": 59642.0,
-       "step": 10
-     },
-     {
-       "entropy": 2.4072387635707857,
-       "epoch": 0.025260498894853173,
-       "grad_norm": 3.8026137351989746,
-       "learning_rate": 3.958333333333333e-06,
-       "loss": 3.3603,
-       "mean_token_accuracy": 0.4350100517272949,
-       "num_tokens": 119219.0,
-       "step": 20
-     },
-     {
-       "entropy": 2.3899864494800567,
-       "epoch": 0.03789074834227976,
-       "grad_norm": 3.7880399227142334,
-       "learning_rate": 6.041666666666667e-06,
-       "loss": 2.9434,
-       "mean_token_accuracy": 0.4788561977446079,
-       "num_tokens": 179590.0,
-       "step": 30
-     },
-     {
-       "entropy": 2.1122478008270265,
-       "epoch": 0.05052099778970635,
-       "grad_norm": 3.0592074394226074,
-       "learning_rate": 8.125000000000001e-06,
-       "loss": 2.3919,
-       "mean_token_accuracy": 0.574567300081253,
-       "num_tokens": 238845.0,
-       "step": 40
-     },
-     {
-       "entropy": 1.7037649989128112,
-       "epoch": 0.06315124723713293,
-       "grad_norm": 1.5836262702941895,
-       "learning_rate": 9.993489583333334e-06,
-       "loss": 1.912,
-       "mean_token_accuracy": 0.6467478528618813,
-       "num_tokens": 298317.0,
-       "step": 50
-     },
-     {
-       "entropy": 1.5623225390911102,
-       "epoch": 0.07578149668455952,
-       "grad_norm": 1.217679738998413,
-       "learning_rate": 9.928385416666668e-06,
-       "loss": 1.6762,
-       "mean_token_accuracy": 0.679128734767437,
-       "num_tokens": 357858.0,
-       "step": 60
-     },
-     {
-       "entropy": 1.5071247130632401,
-       "epoch": 0.0884117461319861,
-       "grad_norm": 0.973615288734436,
-       "learning_rate": 9.863281250000001e-06,
-       "loss": 1.5372,
-       "mean_token_accuracy": 0.6943170607089997,
-       "num_tokens": 418834.0,
-       "step": 70
-     },
-     {
-       "entropy": 1.4568549275398255,
-       "epoch": 0.1010419955794127,
-       "grad_norm": 0.9853116869926453,
-       "learning_rate": 9.798177083333335e-06,
-       "loss": 1.4751,
-       "mean_token_accuracy": 0.7024633795022964,
-       "num_tokens": 478960.0,
-       "step": 80
-     },
-     {
-       "entropy": 1.4889154583215714,
-       "epoch": 0.11367224502683929,
-       "grad_norm": 0.9147132039070129,
-       "learning_rate": 9.733072916666667e-06,
-       "loss": 1.474,
-       "mean_token_accuracy": 0.6996816232800483,
-       "num_tokens": 541795.0,
-       "step": 90
-     },
-     {
-       "entropy": 1.4158774405717849,
-       "epoch": 0.12630249447426586,
-       "grad_norm": 0.9684887528419495,
-       "learning_rate": 9.66796875e-06,
-       "loss": 1.3805,
-       "mean_token_accuracy": 0.7165829420089722,
-       "num_tokens": 601174.0,
-       "step": 100
-     },
-     {
-       "entropy": 1.4276181221008302,
-       "epoch": 0.13893274392169244,
-       "grad_norm": 0.9440239667892456,
-       "learning_rate": 9.602864583333335e-06,
-       "loss": 1.3718,
-       "mean_token_accuracy": 0.7143253713846207,
-       "num_tokens": 661048.0,
-       "step": 110
-     },
-     {
-       "entropy": 1.4359370201826096,
-       "epoch": 0.15156299336911905,
-       "grad_norm": 0.8779081702232361,
-       "learning_rate": 9.537760416666667e-06,
-       "loss": 1.3661,
-       "mean_token_accuracy": 0.7162409156560898,
-       "num_tokens": 722298.0,
-       "step": 120
-     },
-     {
-       "entropy": 1.3943599790334702,
-       "epoch": 0.16419324281654563,
-       "grad_norm": 0.8999291062355042,
-       "learning_rate": 9.47265625e-06,
-       "loss": 1.3193,
-       "mean_token_accuracy": 0.7252198755741119,
-       "num_tokens": 782683.0,
-       "step": 130
-     },
-     {
-       "entropy": 1.3758090347051621,
-       "epoch": 0.1768234922639722,
-       "grad_norm": 0.8218080997467041,
-       "learning_rate": 9.407552083333334e-06,
-       "loss": 1.3054,
-       "mean_token_accuracy": 0.7277572214603424,
-       "num_tokens": 842988.0,
-       "step": 140
-     },
-     {
-       "entropy": 1.381770172715187,
-       "epoch": 0.1894537417113988,
-       "grad_norm": 0.8062577843666077,
-       "learning_rate": 9.342447916666668e-06,
-       "loss": 1.3291,
-       "mean_token_accuracy": 0.7222751513123512,
-       "num_tokens": 903912.0,
-       "step": 150
-     },
-     {
-       "entropy": 1.352141672372818,
-       "epoch": 0.2020839911588254,
-       "grad_norm": 0.8221862316131592,
-       "learning_rate": 9.277343750000001e-06,
-       "loss": 1.2974,
-       "mean_token_accuracy": 0.7260218441486359,
-       "num_tokens": 964887.0,
-       "step": 160
-     },
-     {
-       "entropy": 1.346352329850197,
-       "epoch": 0.21471424060625197,
-       "grad_norm": 0.7375346422195435,
-       "learning_rate": 9.212239583333335e-06,
-       "loss": 1.2969,
-       "mean_token_accuracy": 0.7252495244145394,
-       "num_tokens": 1026887.0,
-       "step": 170
-     },
-     {
-       "entropy": 1.3165962457656861,
-       "epoch": 0.22734449005367857,
-       "grad_norm": 0.7950690388679504,
-       "learning_rate": 9.147135416666667e-06,
-       "loss": 1.2824,
-       "mean_token_accuracy": 0.7250601649284363,
-       "num_tokens": 1086995.0,
-       "step": 180
-     },
-     {
-       "entropy": 1.3047442227602004,
-       "epoch": 0.23997473950110515,
-       "grad_norm": 0.7147737145423889,
-       "learning_rate": 9.082031250000001e-06,
-       "loss": 1.2628,
-       "mean_token_accuracy": 0.7318986386060715,
-       "num_tokens": 1147209.0,
-       "step": 190
-     },
-     {
-       "entropy": 1.2989415228366852,
-       "epoch": 0.25260498894853173,
-       "grad_norm": 0.756094753742218,
-       "learning_rate": 9.016927083333335e-06,
-       "loss": 1.2484,
-       "mean_token_accuracy": 0.7319697335362434,
-       "num_tokens": 1207602.0,
-       "step": 200
-     },
-     {
-       "entropy": 1.2904020875692368,
-       "epoch": 0.2652352383959583,
-       "grad_norm": 0.7715655565261841,
-       "learning_rate": 8.951822916666667e-06,
-       "loss": 1.2447,
-       "mean_token_accuracy": 0.7349080622196198,
-       "num_tokens": 1267500.0,
-       "step": 210
-     },
-     {
-       "entropy": 1.2543610483407974,
-       "epoch": 0.2778654878433849,
-       "grad_norm": 0.6824166774749756,
-       "learning_rate": 8.88671875e-06,
-       "loss": 1.2111,
-       "mean_token_accuracy": 0.7386362582445145,
-       "num_tokens": 1327666.0,
-       "step": 220
-     },
-     {
-       "entropy": 1.2946221768856048,
-       "epoch": 0.2904957372908115,
-       "grad_norm": 0.6559598445892334,
-       "learning_rate": 8.821614583333334e-06,
-       "loss": 1.2574,
-       "mean_token_accuracy": 0.7287471711635589,
-       "num_tokens": 1389712.0,
-       "step": 230
-     },
-     {
-       "entropy": 1.2489944666624069,
-       "epoch": 0.3031259867382381,
-       "grad_norm": 0.7000382542610168,
-       "learning_rate": 8.756510416666666e-06,
-       "loss": 1.2092,
-       "mean_token_accuracy": 0.7372458636760711,
-       "num_tokens": 1448670.0,
-       "step": 240
-     },
-     {
-       "entropy": 1.2534994542598725,
-       "epoch": 0.3157562361856647,
-       "grad_norm": 0.6579836010932922,
-       "learning_rate": 8.69140625e-06,
-       "loss": 1.2132,
-       "mean_token_accuracy": 0.7380462676286698,
-       "num_tokens": 1508428.0,
-       "step": 250
-     },
-     {
-       "entropy": 1.2474523901939392,
-       "epoch": 0.32838648563309125,
-       "grad_norm": 0.6546089053153992,
-       "learning_rate": 8.626302083333334e-06,
-       "loss": 1.2103,
-       "mean_token_accuracy": 0.7395781621336937,
-       "num_tokens": 1568018.0,
-       "step": 260
-     },
-     {
-       "entropy": 1.2445458561182021,
-       "epoch": 0.34101673508051783,
-       "grad_norm": 0.6377413868904114,
-       "learning_rate": 8.561197916666667e-06,
-       "loss": 1.2007,
-       "mean_token_accuracy": 0.7419240340590477,
-       "num_tokens": 1627904.0,
-       "step": 270
-     },
-     {
-       "entropy": 1.279063493013382,
-       "epoch": 0.3536469845279444,
-       "grad_norm": 0.6460844278335571,
-       "learning_rate": 8.496093750000001e-06,
-       "loss": 1.2497,
-       "mean_token_accuracy": 0.729638360440731,
-       "num_tokens": 1689637.0,
-       "step": 280
-     },
-     {
-       "entropy": 1.2362476408481597,
-       "epoch": 0.366277233975371,
-       "grad_norm": 0.6648440361022949,
-       "learning_rate": 8.430989583333335e-06,
-       "loss": 1.2091,
-       "mean_token_accuracy": 0.7385585099458695,
-       "num_tokens": 1749861.0,
-       "step": 290
-     },
-     {
-       "entropy": 1.2533661901950837,
-       "epoch": 0.3789074834227976,
-       "grad_norm": 0.6637682318687439,
-       "learning_rate": 8.365885416666667e-06,
-       "loss": 1.2163,
-       "mean_token_accuracy": 0.7371826618909836,
-       "num_tokens": 1810407.0,
-       "step": 300
-     },
-     {
-       "entropy": 1.2383619010448457,
-       "epoch": 0.3915377328702242,
-       "grad_norm": 0.660043478012085,
-       "learning_rate": 8.30078125e-06,
-       "loss": 1.2026,
-       "mean_token_accuracy": 0.7364327058196067,
-       "num_tokens": 1871544.0,
-       "step": 310
-     },
-     {
-       "entropy": 1.2316229462623596,
-       "epoch": 0.4041679823176508,
-       "grad_norm": 0.6285788416862488,
-       "learning_rate": 8.235677083333334e-06,
-       "loss": 1.2064,
-       "mean_token_accuracy": 0.7371214032173157,
-       "num_tokens": 1932125.0,
-       "step": 320
-     },
-     {
-       "entropy": 1.2459111303091048,
-       "epoch": 0.41679823176507735,
-       "grad_norm": 0.6204569339752197,
-       "learning_rate": 8.170572916666666e-06,
-       "loss": 1.1997,
-       "mean_token_accuracy": 0.7365512102842331,
-       "num_tokens": 1993924.0,
-       "step": 330
-     },
-     {
-       "entropy": 1.2156363114714623,
-       "epoch": 0.42942848121250393,
-       "grad_norm": 0.6501284241676331,
-       "learning_rate": 8.10546875e-06,
-       "loss": 1.1863,
-       "mean_token_accuracy": 0.741255110502243,
-       "num_tokens": 2054496.0,
-       "step": 340
-     },
-     {
-       "entropy": 1.2222040683031081,
-       "epoch": 0.4420587306599305,
-       "grad_norm": 0.602418065071106,
-       "learning_rate": 8.040364583333334e-06,
-       "loss": 1.1913,
-       "mean_token_accuracy": 0.739654652774334,
-       "num_tokens": 2114825.0,
-       "step": 350
-     },
-     {
-       "entropy": 1.2437947690486908,
-       "epoch": 0.45468898010735714,
-       "grad_norm": 0.6289706230163574,
-       "learning_rate": 7.975260416666668e-06,
-       "loss": 1.2142,
-       "mean_token_accuracy": 0.7374308854341507,
-       "num_tokens": 2176058.0,
-       "step": 360
-     },
-     {
-       "entropy": 1.2139764934778214,
-       "epoch": 0.4673192295547837,
-       "grad_norm": 0.6439516544342041,
-       "learning_rate": 7.910156250000001e-06,
-       "loss": 1.1769,
-       "mean_token_accuracy": 0.7426491379737854,
-       "num_tokens": 2236783.0,
-       "step": 370
-     },
-     {
-       "entropy": 1.19720456302166,
-       "epoch": 0.4799494790022103,
-       "grad_norm": 0.6499606966972351,
-       "learning_rate": 7.845052083333335e-06,
-       "loss": 1.1829,
-       "mean_token_accuracy": 0.7399616882205009,
-       "num_tokens": 2298432.0,
-       "step": 380
-     },
-     {
-       "entropy": 1.205560651421547,
-       "epoch": 0.4925797284496369,
-       "grad_norm": 0.6545577645301819,
-       "learning_rate": 7.779947916666667e-06,
-       "loss": 1.1577,
-       "mean_token_accuracy": 0.7463845536112785,
-       "num_tokens": 2357808.0,
-       "step": 390
-     },
-     {
-       "entropy": 1.19621299803257,
-       "epoch": 0.5052099778970635,
-       "grad_norm": 0.6930111050605774,
-       "learning_rate": 7.71484375e-06,
-       "loss": 1.1583,
-       "mean_token_accuracy": 0.7453805327415466,
-       "num_tokens": 2417574.0,
-       "step": 400
-     },
-     {
-       "entropy": 1.1963690370321274,
-       "epoch": 0.5178402273444901,
-       "grad_norm": 0.648593544960022,
-       "learning_rate": 7.649739583333334e-06,
-       "loss": 1.1723,
-       "mean_token_accuracy": 0.7415376961231231,
-       "num_tokens": 2478088.0,
-       "step": 410
-     },
-     {
-       "entropy": 1.216522666811943,
-       "epoch": 0.5304704767919166,
-       "grad_norm": 0.6348926424980164,
-       "learning_rate": 7.5846354166666665e-06,
-       "loss": 1.1701,
-       "mean_token_accuracy": 0.7432737082242966,
-       "num_tokens": 2538612.0,
-       "step": 420
-     },
-     {
-       "entropy": 1.1990931153297424,
-       "epoch": 0.5431007262393432,
-       "grad_norm": 0.627249002456665,
-       "learning_rate": 7.51953125e-06,
-       "loss": 1.1688,
-       "mean_token_accuracy": 0.7435364574193954,
-       "num_tokens": 2599023.0,
-       "step": 430
-     },
-     {
-       "entropy": 1.1872963696718215,
-       "epoch": 0.5557309756867698,
-       "grad_norm": 0.6614134311676025,
-       "learning_rate": 7.454427083333334e-06,
-       "loss": 1.1622,
-       "mean_token_accuracy": 0.7470521196722985,
-       "num_tokens": 2658338.0,
-       "step": 440
-     },
-     {
-       "entropy": 1.215770760178566,
-       "epoch": 0.5683612251341964,
-       "grad_norm": 0.6228342652320862,
-       "learning_rate": 7.389322916666667e-06,
-       "loss": 1.1898,
-       "mean_token_accuracy": 0.7409805700182914,
-       "num_tokens": 2719316.0,
-       "step": 450
-     },
-     {
-       "entropy": 1.1998004853725432,
-       "epoch": 0.580991474581623,
-       "grad_norm": 0.6525698304176331,
-       "learning_rate": 7.3242187500000006e-06,
-       "loss": 1.167,
-       "mean_token_accuracy": 0.7438512742519379,
-       "num_tokens": 2780272.0,
-       "step": 460
-     },
-     {
-       "entropy": 1.1898580551147462,
-       "epoch": 0.5936217240290496,
-       "grad_norm": 0.6669884324073792,
-       "learning_rate": 7.259114583333334e-06,
-       "loss": 1.1669,
-       "mean_token_accuracy": 0.7437147945165634,
-       "num_tokens": 2840261.0,
-       "step": 470
-     },
-     {
-       "entropy": 1.21882204413414,
-       "epoch": 0.6062519734764762,
-       "grad_norm": 0.6129422783851624,
-       "learning_rate": 7.194010416666667e-06,
-       "loss": 1.177,
-       "mean_token_accuracy": 0.7423913896083831,
-       "num_tokens": 2901347.0,
-       "step": 480
-     },
-     {
-       "entropy": 1.163309469819069,
-       "epoch": 0.6188822229239027,
-       "grad_norm": 0.6334741115570068,
-       "learning_rate": 7.128906250000001e-06,
-       "loss": 1.1393,
-       "mean_token_accuracy": 0.7511255607008934,
-       "num_tokens": 2960518.0,
-       "step": 490
-     },
-     {
-       "entropy": 1.1994746267795562,
-       "epoch": 0.6315124723713293,
-       "grad_norm": 0.6261829733848572,
-       "learning_rate": 7.063802083333335e-06,
-       "loss": 1.1605,
-       "mean_token_accuracy": 0.7433080047369003,
509
- "num_tokens": 3021957.0,
510
- "step": 500
511
- },
512
- {
513
- "entropy": 1.167793545126915,
514
- "epoch": 0.6441427218187559,
515
- "grad_norm": 0.5909908413887024,
516
- "learning_rate": 6.998697916666667e-06,
517
- "loss": 1.1468,
518
- "mean_token_accuracy": 0.7475745663046837,
519
- "num_tokens": 3083301.0,
520
- "step": 510
521
- },
522
- {
523
- "entropy": 1.1670663714408875,
524
- "epoch": 0.6567729712661825,
525
- "grad_norm": 0.6018249988555908,
526
- "learning_rate": 6.93359375e-06,
527
- "loss": 1.1425,
528
- "mean_token_accuracy": 0.7485125616192818,
529
- "num_tokens": 3143187.0,
530
- "step": 520
531
- },
532
- {
533
- "entropy": 1.1626142784953117,
534
- "epoch": 0.6694032207136091,
535
- "grad_norm": 0.6088816523551941,
536
- "learning_rate": 6.868489583333334e-06,
537
- "loss": 1.1297,
538
- "mean_token_accuracy": 0.7490727782249451,
539
- "num_tokens": 3202489.0,
540
- "step": 530
541
- },
542
- {
543
- "entropy": 1.1758243769407273,
544
- "epoch": 0.6820334701610357,
545
- "grad_norm": 0.6021592020988464,
546
- "learning_rate": 6.803385416666667e-06,
547
- "loss": 1.1656,
548
- "mean_token_accuracy": 0.7443674057722092,
549
- "num_tokens": 3263476.0,
550
- "step": 540
551
- },
552
- {
553
- "entropy": 1.179671287536621,
554
- "epoch": 0.6946637196084623,
555
- "grad_norm": 0.5955655574798584,
556
- "learning_rate": 6.738281250000001e-06,
557
- "loss": 1.1385,
558
- "mean_token_accuracy": 0.7481714516878128,
559
- "num_tokens": 3324008.0,
560
- "step": 550
561
- },
562
- {
563
- "entropy": 1.1886188358068466,
564
- "epoch": 0.7072939690558888,
565
- "grad_norm": 0.6246835589408875,
566
- "learning_rate": 6.6731770833333345e-06,
567
- "loss": 1.1607,
568
- "mean_token_accuracy": 0.7447509884834289,
569
- "num_tokens": 3383861.0,
570
- "step": 560
571
- },
572
- {
573
- "entropy": 1.1690475821495057,
574
- "epoch": 0.7199242185033154,
575
- "grad_norm": 0.606743335723877,
576
- "learning_rate": 6.6080729166666665e-06,
577
- "loss": 1.1298,
578
- "mean_token_accuracy": 0.7493681326508522,
579
- "num_tokens": 3443946.0,
580
- "step": 570
581
- },
582
- {
583
- "entropy": 1.1725697651505471,
584
- "epoch": 0.732554467950742,
585
- "grad_norm": 0.6846170425415039,
586
- "learning_rate": 6.54296875e-06,
587
- "loss": 1.1452,
588
- "mean_token_accuracy": 0.7482522815465927,
589
- "num_tokens": 3503787.0,
590
- "step": 580
591
- },
592
- {
593
- "entropy": 1.1713406786322593,
594
- "epoch": 0.7451847173981686,
595
- "grad_norm": 0.6522074341773987,
596
- "learning_rate": 6.477864583333334e-06,
597
- "loss": 1.1338,
598
- "mean_token_accuracy": 0.7498400524258614,
599
- "num_tokens": 3563403.0,
600
- "step": 590
601
- },
602
- {
603
- "entropy": 1.1848436295986176,
604
- "epoch": 0.7578149668455952,
605
- "grad_norm": 0.6417824625968933,
606
- "learning_rate": 6.412760416666667e-06,
607
- "loss": 1.1499,
608
- "mean_token_accuracy": 0.7452719643712044,
609
- "num_tokens": 3625007.0,
610
- "step": 600
611
- },
612
- {
613
- "entropy": 1.1822121858596801,
614
- "epoch": 0.7704452162930218,
615
- "grad_norm": 0.6329619884490967,
616
- "learning_rate": 6.3476562500000006e-06,
617
- "loss": 1.159,
618
- "mean_token_accuracy": 0.7452733591198921,
619
- "num_tokens": 3686099.0,
620
- "step": 610
621
- },
622
- {
623
- "entropy": 1.190292978286743,
624
- "epoch": 0.7830754657404484,
625
- "grad_norm": 0.6627410054206848,
626
- "learning_rate": 6.282552083333334e-06,
627
- "loss": 1.1558,
628
- "mean_token_accuracy": 0.7438480347394943,
629
- "num_tokens": 3747233.0,
630
- "step": 620
631
- },
632
- {
633
- "entropy": 1.1619529083371163,
634
- "epoch": 0.7957057151878749,
635
- "grad_norm": 0.5941329002380371,
636
- "learning_rate": 6.217447916666667e-06,
637
- "loss": 1.1377,
638
- "mean_token_accuracy": 0.7503219902515411,
639
- "num_tokens": 3807833.0,
640
- "step": 630
641
- },
642
- {
643
- "entropy": 1.1658748656511306,
644
- "epoch": 0.8083359646353016,
645
- "grad_norm": 0.6438832879066467,
646
- "learning_rate": 6.152343750000001e-06,
647
- "loss": 1.1397,
648
- "mean_token_accuracy": 0.7471553102135658,
649
- "num_tokens": 3868549.0,
650
- "step": 640
651
- },
652
- {
653
- "entropy": 1.1782082825899125,
654
- "epoch": 0.8209662140827282,
655
- "grad_norm": 0.6389635801315308,
656
- "learning_rate": 6.087239583333335e-06,
657
- "loss": 1.1434,
658
- "mean_token_accuracy": 0.7477709770202636,
659
- "num_tokens": 3929057.0,
660
- "step": 650
661
- },
662
- {
663
- "entropy": 1.1625961899757384,
664
- "epoch": 0.8335964635301547,
665
- "grad_norm": 0.6134201288223267,
666
- "learning_rate": 6.022135416666667e-06,
667
- "loss": 1.1352,
668
- "mean_token_accuracy": 0.748055274784565,
669
- "num_tokens": 3990676.0,
670
- "step": 660
671
- },
672
- {
673
- "entropy": 1.1510928481817246,
674
- "epoch": 0.8462267129775813,
675
- "grad_norm": 0.6336613893508911,
676
- "learning_rate": 5.95703125e-06,
677
- "loss": 1.1182,
678
- "mean_token_accuracy": 0.7524245917797089,
679
- "num_tokens": 4051046.0,
680
- "step": 670
681
- },
682
- {
683
- "entropy": 1.1498646020889283,
684
- "epoch": 0.8588569624250079,
685
- "grad_norm": 0.6758144497871399,
686
- "learning_rate": 5.891927083333334e-06,
687
- "loss": 1.1186,
688
- "mean_token_accuracy": 0.7507978692650795,
689
- "num_tokens": 4111084.0,
690
- "step": 680
691
- },
692
- {
693
- "entropy": 1.167962297797203,
694
- "epoch": 0.8714872118724345,
695
- "grad_norm": 0.6285990476608276,
696
- "learning_rate": 5.826822916666667e-06,
697
- "loss": 1.1395,
698
- "mean_token_accuracy": 0.7476246923208236,
699
- "num_tokens": 4172628.0,
700
- "step": 690
701
- },
702
- {
703
- "entropy": 1.1178194358944893,
704
- "epoch": 0.884117461319861,
705
- "grad_norm": 0.64762282371521,
706
- "learning_rate": 5.761718750000001e-06,
707
- "loss": 1.0919,
708
- "mean_token_accuracy": 0.7569874793291091,
709
- "num_tokens": 4231821.0,
710
- "step": 700
711
- },
712
- {
713
- "entropy": 1.1606462925672532,
714
- "epoch": 0.8967477107672877,
715
- "grad_norm": 0.6292758584022522,
716
- "learning_rate": 5.6966145833333344e-06,
717
- "loss": 1.1354,
718
- "mean_token_accuracy": 0.750880953669548,
719
- "num_tokens": 4292646.0,
720
- "step": 710
721
- },
722
- {
723
- "entropy": 1.1580617666244506,
724
- "epoch": 0.9093779602147143,
725
- "grad_norm": 0.6393706798553467,
726
- "learning_rate": 5.6315104166666665e-06,
727
- "loss": 1.1205,
728
- "mean_token_accuracy": 0.7499566927552224,
729
- "num_tokens": 4353199.0,
730
- "step": 720
731
- },
732
- {
733
- "entropy": 1.1515695974230766,
734
- "epoch": 0.9220082096621408,
735
- "grad_norm": 0.687380313873291,
736
- "learning_rate": 5.56640625e-06,
737
- "loss": 1.1138,
738
- "mean_token_accuracy": 0.7514134287834168,
739
- "num_tokens": 4414122.0,
740
- "step": 730
741
- },
742
- {
743
- "entropy": 1.1574165880680085,
744
- "epoch": 0.9346384591095674,
745
- "grad_norm": 0.6102684736251831,
746
- "learning_rate": 5.501302083333334e-06,
747
- "loss": 1.1302,
748
- "mean_token_accuracy": 0.7507740229368209,
749
- "num_tokens": 4474548.0,
750
- "step": 740
751
- },
752
- {
753
- "entropy": 1.1491190433502196,
754
- "epoch": 0.947268708556994,
755
- "grad_norm": 0.623504638671875,
756
- "learning_rate": 5.436197916666667e-06,
757
- "loss": 1.129,
758
- "mean_token_accuracy": 0.7512574091553688,
759
- "num_tokens": 4534678.0,
760
- "step": 750
761
- },
762
- {
763
- "entropy": 1.1538215219974517,
764
- "epoch": 0.9598989580044206,
765
- "grad_norm": 0.6368807554244995,
766
- "learning_rate": 5.3710937500000005e-06,
767
- "loss": 1.1181,
768
- "mean_token_accuracy": 0.7520082175731659,
769
- "num_tokens": 4594878.0,
770
- "step": 760
771
- },
772
- {
773
- "entropy": 1.1623035803437234,
774
- "epoch": 0.9725292074518471,
775
- "grad_norm": 0.6332852840423584,
776
- "learning_rate": 5.305989583333334e-06,
777
- "loss": 1.1308,
778
- "mean_token_accuracy": 0.7497873172163964,
779
- "num_tokens": 4656513.0,
780
- "step": 770
781
- },
782
- {
783
- "entropy": 1.1483627527952194,
784
- "epoch": 0.9851594568992738,
785
- "grad_norm": 0.6341389417648315,
786
- "learning_rate": 5.240885416666667e-06,
787
- "loss": 1.1142,
788
- "mean_token_accuracy": 0.7533516198396683,
789
- "num_tokens": 4717111.0,
790
- "step": 780
791
- },
792
- {
793
- "entropy": 1.1455359414219857,
794
- "epoch": 0.9977897063467004,
795
- "grad_norm": 0.6641396880149841,
796
- "learning_rate": 5.17578125e-06,
797
- "loss": 1.1117,
798
- "mean_token_accuracy": 0.7530950620770455,
799
- "num_tokens": 4777713.0,
800
- "step": 790
801
- },
802
- {
803
- "entropy": 1.148778918461922,
804
- "epoch": 1.0101041995579412,
805
- "grad_norm": 0.6454346776008606,
806
- "learning_rate": 5.110677083333334e-06,
807
- "loss": 1.1146,
808
- "mean_token_accuracy": 0.7511914097345792,
809
- "num_tokens": 4837103.0,
810
- "step": 800
811
- },
812
- {
813
- "entropy": 1.1441998034715652,
814
- "epoch": 1.0227344490053678,
815
- "grad_norm": 0.6368332505226135,
816
- "learning_rate": 5.045572916666667e-06,
817
- "loss": 1.1003,
818
- "mean_token_accuracy": 0.7535203993320465,
819
- "num_tokens": 4898715.0,
820
- "step": 810
821
- },
822
- {
823
- "entropy": 1.1195117503404617,
824
- "epoch": 1.0353646984527944,
825
- "grad_norm": 0.6546683311462402,
826
- "learning_rate": 4.98046875e-06,
827
- "loss": 1.0924,
828
- "mean_token_accuracy": 0.7574156150221825,
829
- "num_tokens": 4959681.0,
830
- "step": 820
831
- },
832
- {
833
- "entropy": 1.1403603315353394,
834
- "epoch": 1.047994947900221,
835
- "grad_norm": 0.6645976305007935,
836
- "learning_rate": 4.915364583333333e-06,
837
- "loss": 1.1031,
838
- "mean_token_accuracy": 0.7548869714140892,
839
- "num_tokens": 5020382.0,
840
- "step": 830
841
- },
842
- {
843
- "entropy": 1.1299657106399537,
844
- "epoch": 1.0606251973476477,
845
- "grad_norm": 0.6225126385688782,
846
- "learning_rate": 4.850260416666667e-06,
847
- "loss": 1.0915,
848
- "mean_token_accuracy": 0.7562400087714195,
849
- "num_tokens": 5080360.0,
850
- "step": 840
851
- },
852
- {
853
- "entropy": 1.12370226085186,
854
- "epoch": 1.0732554467950741,
855
- "grad_norm": 0.6478942036628723,
856
- "learning_rate": 4.785156250000001e-06,
857
- "loss": 1.1064,
858
- "mean_token_accuracy": 0.7542634457349777,
859
- "num_tokens": 5140349.0,
860
- "step": 850
861
- },
862
- {
863
- "entropy": 1.1469928681850434,
864
- "epoch": 1.0858856962425008,
865
- "grad_norm": 0.615678608417511,
866
- "learning_rate": 4.7200520833333336e-06,
867
- "loss": 1.1043,
868
- "mean_token_accuracy": 0.7529336720705032,
869
- "num_tokens": 5201690.0,
870
- "step": 860
871
- },
872
- {
873
- "entropy": 1.137891921401024,
874
- "epoch": 1.0985159456899274,
875
- "grad_norm": 0.6458525061607361,
876
- "learning_rate": 4.654947916666667e-06,
877
- "loss": 1.1081,
878
- "mean_token_accuracy": 0.7543051362037658,
879
- "num_tokens": 5261698.0,
880
- "step": 870
881
- },
882
- {
883
- "entropy": 1.1202880129218102,
884
- "epoch": 1.111146195137354,
885
- "grad_norm": 0.6362131237983704,
886
- "learning_rate": 4.58984375e-06,
887
- "loss": 1.0951,
888
- "mean_token_accuracy": 0.7552427321672439,
889
- "num_tokens": 5321775.0,
890
- "step": 880
891
- },
892
- {
893
- "entropy": 1.1365787714719773,
894
- "epoch": 1.1237764445847804,
895
- "grad_norm": 0.6511764526367188,
896
- "learning_rate": 4.524739583333334e-06,
897
- "loss": 1.0961,
898
- "mean_token_accuracy": 0.7562274217605591,
899
- "num_tokens": 5383140.0,
900
- "step": 890
901
- },
902
- {
903
- "entropy": 1.1074503496289254,
904
- "epoch": 1.136406694032207,
905
- "grad_norm": 0.6207822561264038,
906
- "learning_rate": 4.459635416666668e-06,
907
- "loss": 1.0848,
908
- "mean_token_accuracy": 0.7591574639081955,
909
- "num_tokens": 5443006.0,
910
- "step": 900
911
- },
912
- {
913
- "entropy": 1.1545074522495269,
914
- "epoch": 1.1490369434796337,
915
- "grad_norm": 0.6404831409454346,
916
- "learning_rate": 4.3945312500000005e-06,
917
- "loss": 1.1121,
918
- "mean_token_accuracy": 0.7507721096277237,
919
- "num_tokens": 5503942.0,
920
- "step": 910
921
- },
922
- {
923
- "entropy": 1.1401477769017219,
924
- "epoch": 1.1616671929270603,
925
- "grad_norm": 0.6468749046325684,
926
- "learning_rate": 4.329427083333333e-06,
927
- "loss": 1.1011,
928
- "mean_token_accuracy": 0.753543746471405,
929
- "num_tokens": 5564518.0,
930
- "step": 920
931
- },
932
- {
933
- "entropy": 1.0945423126220704,
934
- "epoch": 1.174297442374487,
935
- "grad_norm": 0.6418051719665527,
936
- "learning_rate": 4.264322916666667e-06,
937
- "loss": 1.0614,
938
- "mean_token_accuracy": 0.7643799662590027,
939
- "num_tokens": 5624109.0,
940
- "step": 930
941
- },
942
- {
943
- "entropy": 1.1136713281273842,
944
- "epoch": 1.1869276918219134,
945
- "grad_norm": 0.6422064304351807,
946
- "learning_rate": 4.19921875e-06,
947
- "loss": 1.0974,
948
- "mean_token_accuracy": 0.7561314895749092,
949
- "num_tokens": 5684801.0,
950
- "step": 940
951
- },
952
- {
953
- "entropy": 1.1215770334005355,
954
- "epoch": 1.19955794126934,
955
- "grad_norm": 0.6453995108604431,
956
- "learning_rate": 4.134114583333334e-06,
957
- "loss": 1.0801,
958
- "mean_token_accuracy": 0.7590720430016518,
959
- "num_tokens": 5745499.0,
960
- "step": 950
961
- },
962
- {
963
- "entropy": 1.1010483756661416,
964
- "epoch": 1.2121881907167666,
965
- "grad_norm": 0.61696857213974,
966
- "learning_rate": 4.0690104166666675e-06,
967
- "loss": 1.049,
968
- "mean_token_accuracy": 0.7627070844173431,
969
- "num_tokens": 5806117.0,
970
- "step": 960
971
- },
972
- {
973
- "entropy": 1.1082940384745599,
974
- "epoch": 1.2248184401641933,
975
- "grad_norm": 0.6523500680923462,
976
- "learning_rate": 4.00390625e-06,
977
- "loss": 1.0807,
978
- "mean_token_accuracy": 0.7579552844166756,
979
- "num_tokens": 5865537.0,
980
- "step": 970
981
- },
982
- {
983
- "entropy": 1.102595229446888,
984
- "epoch": 1.23744868961162,
985
- "grad_norm": 0.6376118063926697,
986
- "learning_rate": 3.938802083333333e-06,
987
- "loss": 1.0679,
988
- "mean_token_accuracy": 0.7592279806733131,
989
- "num_tokens": 5925254.0,
990
- "step": 980
991
- },
992
- {
993
- "entropy": 1.1277900233864784,
994
- "epoch": 1.2500789390590463,
995
- "grad_norm": 0.6571747660636902,
996
- "learning_rate": 3.873697916666667e-06,
997
- "loss": 1.0888,
998
- "mean_token_accuracy": 0.7549166217446327,
999
- "num_tokens": 5986084.0,
1000
- "step": 990
1001
- },
1002
- {
1003
- "entropy": 1.113915103673935,
1004
- "epoch": 1.262709188506473,
1005
- "grad_norm": 0.6531611084938049,
1006
- "learning_rate": 3.8085937500000002e-06,
1007
- "loss": 1.0718,
1008
- "mean_token_accuracy": 0.7577856734395028,
1009
- "num_tokens": 6046857.0,
1010
- "step": 1000
1011
- },
1012
- {
1013
- "entropy": 1.0966202467679977,
1014
- "epoch": 1.2753394379538996,
1015
- "grad_norm": 0.636698842048645,
1016
- "learning_rate": 3.7434895833333336e-06,
1017
- "loss": 1.0699,
1018
- "mean_token_accuracy": 0.7601938605308532,
1019
- "num_tokens": 6106886.0,
1020
- "step": 1010
1021
- },
1022
- {
1023
- "entropy": 1.1121985822916032,
1024
- "epoch": 1.2879696874013262,
1025
- "grad_norm": 0.6492161750793457,
1026
- "learning_rate": 3.6783854166666673e-06,
1027
- "loss": 1.0851,
1028
- "mean_token_accuracy": 0.7588792949914932,
1029
- "num_tokens": 6167935.0,
1030
- "step": 1020
1031
- },
1032
- {
1033
- "entropy": 1.1355163961648942,
1034
- "epoch": 1.3005999368487529,
1035
- "grad_norm": 0.6697131395339966,
1036
- "learning_rate": 3.61328125e-06,
1037
- "loss": 1.094,
1038
- "mean_token_accuracy": 0.754327917098999,
1039
- "num_tokens": 6228870.0,
1040
- "step": 1030
1041
- },
1042
- {
1043
- "entropy": 1.11816665828228,
1044
- "epoch": 1.3132301862961793,
1045
- "grad_norm": 0.6773020625114441,
1046
- "learning_rate": 3.5481770833333335e-06,
1047
- "loss": 1.0893,
1048
- "mean_token_accuracy": 0.7571294933557511,
1049
- "num_tokens": 6288847.0,
1050
- "step": 1040
1051
- },
1052
- {
1053
- "entropy": 1.1343947052955627,
1054
- "epoch": 1.325860435743606,
1055
- "grad_norm": 0.6566488146781921,
1056
- "learning_rate": 3.483072916666667e-06,
1057
- "loss": 1.0875,
1058
- "mean_token_accuracy": 0.755756102502346,
1059
- "num_tokens": 6350161.0,
1060
- "step": 1050
1061
- },
1062
- {
1063
- "entropy": 1.1109364911913873,
1064
- "epoch": 1.3384906851910325,
1065
- "grad_norm": 0.6575057506561279,
1066
- "learning_rate": 3.41796875e-06,
1067
- "loss": 1.0782,
1068
- "mean_token_accuracy": 0.7591001376509666,
1069
- "num_tokens": 6410972.0,
1070
- "step": 1060
1071
- },
1072
- {
1073
- "entropy": 1.1165167808532714,
1074
- "epoch": 1.3511209346384592,
1075
- "grad_norm": 0.6655089259147644,
1076
- "learning_rate": 3.3528645833333334e-06,
1077
- "loss": 1.0901,
1078
- "mean_token_accuracy": 0.7573199763894081,
1079
- "num_tokens": 6471984.0,
1080
- "step": 1070
1081
- },
1082
- {
1083
- "entropy": 1.1066906094551086,
1084
- "epoch": 1.3637511840858858,
1085
- "grad_norm": 0.6363748908042908,
1086
- "learning_rate": 3.287760416666667e-06,
1087
- "loss": 1.0716,
1088
- "mean_token_accuracy": 0.7598252177238465,
1089
- "num_tokens": 6532514.0,
1090
- "step": 1080
1091
- },
1092
- {
1093
- "entropy": 1.1047193810343743,
1094
- "epoch": 1.3763814335333122,
1095
- "grad_norm": 0.6684281826019287,
1096
- "learning_rate": 3.2226562500000004e-06,
1097
- "loss": 1.0823,
1098
- "mean_token_accuracy": 0.7593759268522262,
1099
- "num_tokens": 6592949.0,
1100
- "step": 1090
1101
- },
1102
- {
1103
- "entropy": 1.1348285049200058,
1104
- "epoch": 1.3890116829807388,
1105
- "grad_norm": 0.6439023017883301,
1106
- "learning_rate": 3.1575520833333333e-06,
1107
- "loss": 1.1031,
1108
- "mean_token_accuracy": 0.7526842474937439,
1109
- "num_tokens": 6654231.0,
1110
- "step": 1100
1111
- },
1112
- {
1113
- "entropy": 1.1191302105784415,
1114
- "epoch": 1.4016419324281655,
1115
- "grad_norm": 0.6556984186172485,
1116
- "learning_rate": 3.092447916666667e-06,
1117
- "loss": 1.0799,
1118
- "mean_token_accuracy": 0.7590983435511589,
1119
- "num_tokens": 6714430.0,
1120
- "step": 1110
1121
- },
1122
- {
1123
- "entropy": 1.093433029949665,
1124
- "epoch": 1.4142721818755921,
1125
- "grad_norm": 0.6618829965591431,
1126
- "learning_rate": 3.0273437500000003e-06,
1127
- "loss": 1.0614,
1128
- "mean_token_accuracy": 0.7611085593700408,
1129
- "num_tokens": 6774176.0,
1130
- "step": 1120
1131
- },
1132
- {
1133
- "entropy": 1.135184645652771,
1134
- "epoch": 1.4269024313230187,
1135
- "grad_norm": 0.6382298469543457,
1136
- "learning_rate": 2.962239583333333e-06,
1137
- "loss": 1.0939,
1138
- "mean_token_accuracy": 0.7532851651310921,
1139
- "num_tokens": 6836522.0,
1140
- "step": 1130
1141
- },
1142
- {
1143
- "entropy": 1.1093149304389953,
1144
- "epoch": 1.4395326807704452,
1145
- "grad_norm": 0.6382166147232056,
1146
- "learning_rate": 2.897135416666667e-06,
1147
- "loss": 1.0709,
1148
- "mean_token_accuracy": 0.7608326107263566,
1149
- "num_tokens": 6896353.0,
1150
- "step": 1140
1151
- },
1152
- {
1153
- "entropy": 1.1047044202685357,
1154
- "epoch": 1.4521629302178718,
1155
- "grad_norm": 0.6356373429298401,
1156
- "learning_rate": 2.8320312500000002e-06,
1157
- "loss": 1.0738,
1158
- "mean_token_accuracy": 0.7615469440817833,
1159
- "num_tokens": 6956828.0,
1160
- "step": 1150
1161
- },
1162
- {
1163
- "entropy": 1.1073317646980285,
1164
- "epoch": 1.4647931796652984,
1165
- "grad_norm": 0.6593008041381836,
1166
- "learning_rate": 2.7669270833333335e-06,
1167
- "loss": 1.0589,
1168
- "mean_token_accuracy": 0.7599197804927826,
1169
- "num_tokens": 7017026.0,
1170
- "step": 1160
1171
- },
1172
- {
1173
- "entropy": 1.0851576775312424,
1174
- "epoch": 1.4774234291127248,
1175
- "grad_norm": 0.6466282606124878,
1176
- "learning_rate": 2.7018229166666673e-06,
1177
- "loss": 1.0584,
1178
- "mean_token_accuracy": 0.7626572713255882,
1179
- "num_tokens": 7076806.0,
1180
- "step": 1170
1181
- },
1182
- {
1183
- "entropy": 1.1103300124406814,
1184
- "epoch": 1.4900536785601517,
1185
- "grad_norm": 0.6285493969917297,
1186
- "learning_rate": 2.63671875e-06,
1187
- "loss": 1.0753,
1188
- "mean_token_accuracy": 0.7593718692660332,
1189
- "num_tokens": 7137946.0,
1190
- "step": 1180
1191
- },
1192
- {
1193
- "entropy": 1.1066975593566895,
1194
- "epoch": 1.502683928007578,
1195
- "grad_norm": 0.6664257645606995,
1196
- "learning_rate": 2.5716145833333334e-06,
1197
- "loss": 1.0642,
1198
- "mean_token_accuracy": 0.7612839996814728,
1199
- "num_tokens": 7200103.0,
1200
- "step": 1190
1201
- },
1202
- {
1203
- "entropy": 1.0994308680295943,
1204
- "epoch": 1.5153141774550047,
1205
- "grad_norm": 0.683022141456604,
1206
- "learning_rate": 2.506510416666667e-06,
1207
- "loss": 1.0726,
1208
- "mean_token_accuracy": 0.7611020535230637,
1209
- "num_tokens": 7259051.0,
1210
- "step": 1200
1211
- }
1212
- ],
1213
- "logging_steps": 10,
1214
- "max_steps": 1584,
1215
- "num_input_tokens_seen": 0,
1216
- "num_train_epochs": 2,
1217
- "save_steps": 200,
1218
- "stateful_callbacks": {
1219
- "TrainerControl": {
1220
- "args": {
1221
- "should_epoch_stop": false,
1222
- "should_evaluate": false,
1223
- "should_log": false,
1224
- "should_save": true,
1225
- "should_training_stop": false
1226
- },
1227
- "attributes": {}
1228
- }
1229
- },
1230
- "total_flos": 4.111597909165179e+17,
1231
- "train_batch_size": 8,
1232
- "trial_name": null,
1233
- "trial_params": null
1234
- }
checkpoint-1200/training_args.bin DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:d2df6292f9521a8d8864a388f9a0d998b1dc00f8b533adedec1996ec1e3f6ea5
3
- size 6417