WhiteGiverPlus committed
Commit bde1506 · verified · 1 Parent(s): ac012c7

Add files using upload-large-folder tool

Files changed (45)
  1. .gitattributes +4 -0
  2. README.md +207 -0
  3. adapter_config.json +46 -0
  4. adapter_model.safetensors +3 -0
  5. chat_template.jinja +154 -0
  6. checkpoint-2750/README.md +207 -0
  7. checkpoint-2750/adapter_config.json +46 -0
  8. checkpoint-2750/adapter_model.safetensors +3 -0
  9. checkpoint-2750/chat_template.jinja +154 -0
  10. checkpoint-2750/optimizer.pt +3 -0
  11. checkpoint-2750/rng_state.pth +3 -0
  12. checkpoint-2750/scheduler.pt +3 -0
  13. checkpoint-2750/tokenizer.json +3 -0
  14. checkpoint-2750/tokenizer_config.json +31 -0
  15. checkpoint-2750/trainer_state.json +2047 -0
  16. checkpoint-2750/training_args.bin +3 -0
  17. checkpoint-2865/README.md +207 -0
  18. checkpoint-2865/adapter_config.json +46 -0
  19. checkpoint-2865/adapter_model.safetensors +3 -0
  20. checkpoint-2865/chat_template.jinja +154 -0
  21. checkpoint-2865/optimizer.pt +3 -0
  22. checkpoint-2865/rng_state.pth +3 -0
  23. checkpoint-2865/scheduler.pt +3 -0
  24. checkpoint-2865/tokenizer.json +3 -0
  25. checkpoint-2865/tokenizer_config.json +31 -0
  26. checkpoint-2865/trainer_state.json +2124 -0
  27. checkpoint-2865/training_args.bin +3 -0
  28. merged/chat_template.jinja +154 -0
  29. merged/config.json +75 -0
  30. merged/generation_config.json +10 -0
  31. merged/model.safetensors +3 -0
  32. merged/tokenizer.json +3 -0
  33. merged/tokenizer_config.json +31 -0
  34. run-config.txt +12 -0
  35. skipped-tokenization.jsonl +422 -0
  36. speed-estimate.md +11 -0
  37. tokenizer.json +3 -0
  38. tokenizer_config.json +31 -0
  39. train-6144-mb2x8-3ep-gpu1.log +0 -0
  40. train-6144-mb2x8-gpu1.log +11 -0
  41. train-8192-mb4x4-gpu1.log +42 -0
  42. train-8192.log +8 -0
  43. train-8192.pid +1 -0
  44. train-manifest.json +17 -0
  45. training_args.bin +3 -0
.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ checkpoint-2865/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ merged/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ checkpoint-2750/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen3.5-2B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3.5-2B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.0
adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "k_proj",
+     "gate_proj",
+     "o_proj",
+     "down_proj",
+     "up_proj",
+     "q_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
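The adapter configuration above sets `r` to 32 and `lora_alpha` to 64, with `use_rslora` disabled. As a minimal sketch, assuming the usual LoRA convention that the low-rank update is scaled by `lora_alpha / r` (or by `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled), the effective scaling for this adapter can be worked out as follows. The helper name `lora_scaling` is hypothetical, and only a subset of the config fields is reproduced here.

```python
import json
import math

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads("""
{
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "r": 32,
  "use_rslora": false,
  "target_modules": ["v_proj", "k_proj", "gate_proj", "o_proj",
                     "down_proj", "up_proj", "q_proj"]
}
""")


def lora_scaling(config: dict) -> float:
    """Hypothetical helper: factor applied to the low-rank update B @ A."""
    alpha, rank = config["lora_alpha"], config["r"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(rank)
    return alpha / rank


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"scaling = {scaling}")
print(f"adapted module types = {len(ADAPTER_CONFIG['target_modules'])}")
```

With these values the update is scaled by 2.0, and the adapter targets all seven attention and MLP projection names listed in `target_modules`.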
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
+ size 87319256
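The three added lines are a Git LFS pointer rather than the weights themselves: the repository stores only the spec version, the SHA-256 object id, and the byte size of the real file. A small sketch of reading such a pointer is given below; the function name `parse_lfs_pointer` and the returned dictionary keys are hypothetical choices for illustration, not part of any Git LFS tooling.

```python
# Pointer text copied from the adapter_model.safetensors diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
size 87319256
"""


def parse_lfs_pointer(text: str) -> dict:
    """Hypothetical helper: split a pointer file into its key/value fields."""
    fields = dict(
        line.split(" ", 1) for line in text.splitlines() if line.strip()
    )
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / (1024 ** 2)
print(f"{pointer['hash_algo']} object, {size_mib:.1f} MiB")
```

The size field shows that the adapter weights occupy roughly 83 MiB.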
chat_template.jinja ADDED
@@ -0,0 +1,154 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {%- if content %}
+ {{- '\n\n' + content }}
+ {%- endif %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if ns.multi_step_tool %}
+ {{- raise_exception('No user query found in messages.') }}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "system" %}
+ {%- if not loop.first %}
+ {{- raise_exception('System message must be at the beginning.') }}
+ {%- endif %}
+ {%- elif message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if loop.index0 > ns.last_query_index %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- raise_exception('Unexpected message role.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is true %}
+ {{- '<think>\n' }}
+ {%- else %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- endif %}
+ {%- endif %}
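Two pieces of the template above are easy to misread: the generation prompt emits an empty `<think>` block unless `enable_thinking` is explicitly true, and assistant messages without a `reasoning_content` field have their reasoning recovered by splitting on the `</think>` tag. The plain-Python sketch below mirrors just those two branches for illustration. The function names `generation_prompt` and `split_reasoning` are hypothetical, and this is not a replacement for rendering the Jinja template itself.

```python
from typing import Tuple


def generation_prompt(enable_thinking: bool = False) -> str:
    """Mirror of the final `add_generation_prompt` branch of the template."""
    prefix = "<|im_start|>assistant\n"
    if enable_thinking:
        # Thinking on: leave the <think> block open for the model to fill.
        return prefix + "<think>\n"
    # Thinking off (also the default when the flag is undefined):
    # emit an already-closed, empty <think> block.
    return prefix + "<think>\n\n</think>\n\n"


def split_reasoning(content: str) -> Tuple[str, str]:
    """Mirror of the fallback used when `reasoning_content` is not a string."""
    if "</think>" not in content:
        return "", content
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    answer = content.split("</think>")[-1].lstrip("\n")
    # The template trims the reasoning text afterwards.
    return reasoning.strip(), answer


reasoning, answer = split_reasoning("<think>\nadd 2 and 2\n</think>\n\n4")
print(repr(generation_prompt()))
print(repr(reasoning), repr(answer))
```

Note that the template only re-emits the recovered reasoning for assistant turns that come after the last genuine user query (`loop.index0 > ns.last_query_index`); earlier turns are rendered with the answer text alone.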
checkpoint-2750/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: Qwen/Qwen3.5-2B
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:Qwen/Qwen3.5-2B
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.0
checkpoint-2750/adapter_config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 64,
+   "lora_bias": false,
+   "lora_dropout": 0.05,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 32,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "k_proj",
+     "gate_proj",
+     "o_proj",
+     "down_proj",
+     "up_proj",
+     "q_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
checkpoint-2750/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6b49ebcad405adb694c065950e282615812e3981bdca4202f2a79e151d3c1ec2
+ size 87319256
checkpoint-2750/chat_template.jinja ADDED
@@ -0,0 +1,154 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {%- if content %}
+ {{- '\n\n' + content }}
+ {%- endif %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if ns.multi_step_tool %}
+ {{- raise_exception('No user query found in messages.') }}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "system" %}
+ {%- if not loop.first %}
+ {{- raise_exception('System message must be at the beginning.') }}
+ {%- endif %}
+ {%- elif message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if loop.index0 > ns.last_query_index %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- raise_exception('Unexpected message role.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is true %}
+ {{- '<think>\n' }}
+ {%- else %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- endif %}
+ {%- endif %}
checkpoint-2750/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b823bb8d08dc7f3e67974d7373180327c0c3b3f484279111011eeccb193952e
3
+ size 174750283
checkpoint-2750/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:35b19cca0c77fd3faf9cb577574ebff9d16240a5010f338c8fd848717050f145
3
+ size 14645
checkpoint-2750/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4246573c193d4338561dd7c638ea83197ef8cbd56a1d02b874104194a5175da
3
+ size 1465
checkpoint-2750/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
3
+ size 19989343
checkpoint-2750/tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "audio_bos_token": "<|audio_start|>",
4
+ "audio_eos_token": "<|audio_end|>",
5
+ "audio_token": "<|audio_pad|>",
6
+ "backend": "tokenizers",
7
+ "bos_token": null,
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|im_end|>",
10
+ "errors": "replace",
11
+ "image_token": "<|image_pad|>",
12
+ "is_local": true,
13
+ "model_max_length": 262144,
14
+ "model_specific_special_tokens": {
15
+ "audio_bos_token": "<|audio_start|>",
16
+ "audio_eos_token": "<|audio_end|>",
17
+ "audio_token": "<|audio_pad|>",
18
+ "image_token": "<|image_pad|>",
19
+ "video_token": "<|video_pad|>",
20
+ "vision_bos_token": "<|vision_start|>",
21
+ "vision_eos_token": "<|vision_end|>"
22
+ },
23
+ "pad_token": "<|endoftext|>",
24
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
25
+ "split_special_tokens": false,
26
+ "tokenizer_class": "TokenizersBackend",
27
+ "unk_token": null,
28
+ "video_token": "<|video_pad|>",
29
+ "vision_bos_token": "<|vision_start|>",
30
+ "vision_eos_token": "<|vision_end|>"
31
+ }
checkpoint-2750/trainer_state.json ADDED
@@ -0,0 +1,2047 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.8802724652868745,
6
+ "eval_steps": 250,
7
+ "global_step": 2750,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.010479434110558029,
14
+ "grad_norm": 0.19915591180324554,
15
+ "learning_rate": 1.0465116279069768e-05,
16
+ "loss": 1.1350045204162598,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.020958868221116058,
21
+ "grad_norm": 0.18158815801143646,
22
+ "learning_rate": 2.2093023255813955e-05,
23
+ "loss": 1.0580164909362793,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.03143830233167409,
28
+ "grad_norm": 0.16481591761112213,
29
+ "learning_rate": 3.372093023255814e-05,
30
+ "loss": 0.9252842903137207,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.041917736442232116,
35
+ "grad_norm": 0.15599584579467773,
36
+ "learning_rate": 4.5348837209302326e-05,
37
+ "loss": 0.8342072486877441,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.05239717055279015,
42
+ "grad_norm": 0.1804327368736267,
43
+ "learning_rate": 5.697674418604652e-05,
44
+ "loss": 0.7955524921417236,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.06287660466334818,
49
+ "grad_norm": 0.16934047639369965,
50
+ "learning_rate": 6.86046511627907e-05,
51
+ "loss": 0.7358035087585449,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.07335603877390622,
56
+ "grad_norm": 0.2234930843114853,
57
+ "learning_rate": 8.023255813953489e-05,
58
+ "loss": 0.6985861301422119,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.08383547288446423,
63
+ "grad_norm": 0.16290400922298431,
64
+ "learning_rate": 9.186046511627907e-05,
65
+ "loss": 0.599607515335083,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.09431490699502226,
70
+ "grad_norm": 0.1660464107990265,
71
+ "learning_rate": 9.999971245570617e-05,
72
+ "loss": 0.5886398315429687,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.1047943411055803,
77
+ "grad_norm": 0.16978025436401367,
78
+ "learning_rate": 9.999460064915317e-05,
79
+ "loss": 0.5450529098510742,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.11527377521613832,
84
+ "grad_norm": 0.21447990834712982,
85
+ "learning_rate": 9.998309972134645e-05,
86
+ "loss": 0.5072262287139893,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.12575320932669637,
91
+ "grad_norm": 0.17418669164180756,
92
+ "learning_rate": 9.996521114206116e-05,
93
+ "loss": 0.49445347785949706,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.13623264343725439,
98
+ "grad_norm": 0.22226351499557495,
99
+ "learning_rate": 9.994093719739023e-05,
100
+ "loss": 0.47142682075500486,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.14671207754781243,
105
+ "grad_norm": 0.1745530068874359,
106
+ "learning_rate": 9.991028098945215e-05,
107
+ "loss": 0.46663532257080076,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.15719151165837045,
112
+ "grad_norm": 0.17074695229530334,
113
+ "learning_rate": 9.987324643599459e-05,
114
+ "loss": 0.4508847236633301,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.16767094576892846,
119
+ "grad_norm": 0.13428406417369843,
120
+ "learning_rate": 9.982983826989367e-05,
121
+ "loss": 0.40740265846252444,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.1781503798794865,
126
+ "grad_norm": 0.17766578495502472,
127
+ "learning_rate": 9.978006203854918e-05,
128
+ "loss": 0.3998516321182251,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.18862981399004453,
133
+ "grad_norm": 0.1672629565000534,
134
+ "learning_rate": 9.972392410317562e-05,
135
+ "loss": 0.41658673286437986,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.19910924810060257,
140
+ "grad_norm": 0.1333673745393753,
141
+ "learning_rate": 9.96614316379892e-05,
142
+ "loss": 0.37024455070495604,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.2095886822111606,
147
+ "grad_norm": 0.18037110567092896,
148
+ "learning_rate": 9.959259262929113e-05,
149
+ "loss": 0.35086841583251954,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.22006811632171863,
154
+ "grad_norm": 0.14616410434246063,
155
+ "learning_rate": 9.951741587444683e-05,
156
+ "loss": 0.37918968200683595,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.23054755043227665,
161
+ "grad_norm": 0.14523574709892273,
162
+ "learning_rate": 9.943591098076184e-05,
163
+ "loss": 0.32804527282714846,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.2410269845428347,
168
+ "grad_norm": 0.14667049050331116,
169
+ "learning_rate": 9.934808836425393e-05,
170
+ "loss": 0.3480507850646973,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.25150641865339274,
175
+ "grad_norm": 0.18156558275222778,
176
+ "learning_rate": 9.925395924832198e-05,
177
+ "loss": 0.3300448179244995,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.26198585276395076,
182
+ "grad_norm": 0.13806430995464325,
183
+ "learning_rate": 9.91535356623117e-05,
184
+ "loss": 0.3127591609954834,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.26198585276395076,
189
+ "eval_loss": 0.3132782578468323,
190
+ "eval_runtime": 94.8848,
191
+ "eval_samples_per_second": 3.278,
192
+ "eval_steps_per_second": 3.278,
193
+ "step": 250
194
+ },
195
+ {
196
+ "epoch": 0.27246528687450877,
197
+ "grad_norm": 0.17205959558486938,
198
+ "learning_rate": 9.904683043997835e-05,
199
+ "loss": 0.3306673288345337,
200
+ "step": 260
201
+ },
202
+ {
203
+ "epoch": 0.2829447209850668,
204
+ "grad_norm": 0.12620031833648682,
205
+ "learning_rate": 9.893385721784656e-05,
206
+ "loss": 0.3011106729507446,
207
+ "step": 270
208
+ },
209
+ {
210
+ "epoch": 0.29342415509562486,
211
+ "grad_norm": 0.11466006934642792,
212
+ "learning_rate": 9.881463043346768e-05,
213
+ "loss": 0.2951968669891357,
214
+ "step": 280
215
+ },
216
+ {
217
+ "epoch": 0.3039035892061829,
218
+ "grad_norm": 0.1671207845211029,
219
+ "learning_rate": 9.868916532357475e-05,
220
+ "loss": 0.2910990953445435,
221
+ "step": 290
222
+ },
223
+ {
224
+ "epoch": 0.3143830233167409,
225
+ "grad_norm": 0.1683349907398224,
226
+ "learning_rate": 9.855747792213521e-05,
227
+ "loss": 0.31409192085266113,
228
+ "step": 300
229
+ },
230
+ {
231
+ "epoch": 0.3248624574272989,
232
+ "grad_norm": 0.12934699654579163,
233
+ "learning_rate": 9.84195850583019e-05,
234
+ "loss": 0.27755858898162844,
235
+ "step": 310
236
+ },
237
+ {
238
+ "epoch": 0.33534189153785693,
239
+ "grad_norm": 0.13784605264663696,
240
+ "learning_rate": 9.827550435426234e-05,
241
+ "loss": 0.2809821605682373,
242
+ "step": 320
243
+ },
244
+ {
245
+ "epoch": 0.345821325648415,
246
+ "grad_norm": 0.18590271472930908,
247
+ "learning_rate": 9.812525422298664e-05,
248
+ "loss": 0.28698866367340087,
249
+ "step": 330
250
+ },
251
+ {
252
+ "epoch": 0.356300759758973,
253
+ "grad_norm": 0.1704522967338562,
254
+ "learning_rate": 9.796885386587447e-05,
255
+ "loss": 0.250814414024353,
256
+ "step": 340
257
+ },
258
+ {
259
+ "epoch": 0.36678019386953103,
260
+ "grad_norm": 0.1316167265176773,
261
+ "learning_rate": 9.780632327030112e-05,
262
+ "loss": 0.25458922386169436,
263
+ "step": 350
264
+ },
265
+ {
266
+ "epoch": 0.37725962798008905,
267
+ "grad_norm": 0.16226200759410858,
268
+ "learning_rate": 9.763768320706319e-05,
269
+ "loss": 0.26563262939453125,
270
+ "step": 360
271
+ },
272
+ {
273
+ "epoch": 0.3877390620906471,
274
+ "grad_norm": 0.1297195851802826,
275
+ "learning_rate": 9.746295522772424e-05,
276
+ "loss": 0.2632328748703003,
277
+ "step": 370
278
+ },
279
+ {
280
+ "epoch": 0.39821849620120514,
281
+ "grad_norm": 0.1286139190196991,
282
+ "learning_rate": 9.728216166186049e-05,
283
+ "loss": 0.2624588251113892,
284
+ "step": 380
285
+ },
286
+ {
287
+ "epoch": 0.40869793031176316,
288
+ "grad_norm": 0.1587965339422226,
289
+ "learning_rate": 9.709532561420725e-05,
290
+ "loss": 0.24741590023040771,
291
+ "step": 390
292
+ },
293
+ {
294
+ "epoch": 0.4191773644223212,
295
+ "grad_norm": 0.11963177472352982,
296
+ "learning_rate": 9.690247096170615e-05,
297
+ "loss": 0.22777397632598878,
298
+ "step": 400
299
+ },
300
+ {
301
+ "epoch": 0.42965679853287925,
302
+ "grad_norm": 0.13638927042484283,
303
+ "learning_rate": 9.670362235045387e-05,
304
+ "loss": 0.23324952125549317,
305
+ "step": 410
306
+ },
307
+ {
308
+ "epoch": 0.44013623264343726,
309
+ "grad_norm": 0.1514088362455368,
310
+ "learning_rate": 9.649880519255232e-05,
311
+ "loss": 0.2505915880203247,
312
+ "step": 420
313
+ },
314
+ {
315
+ "epoch": 0.4506156667539953,
316
+ "grad_norm": 0.10994207113981247,
317
+ "learning_rate": 9.62880456628612e-05,
318
+ "loss": 0.2078850269317627,
319
+ "step": 430
320
+ },
321
+ {
322
+ "epoch": 0.4610951008645533,
323
+ "grad_norm": 0.11983369290828705,
324
+ "learning_rate": 9.607137069565288e-05,
325
+ "loss": 0.21452484130859376,
326
+ "step": 440
327
+ },
328
+ {
329
+ "epoch": 0.47157453497511137,
330
+ "grad_norm": 0.12684305012226105,
331
+ "learning_rate": 9.58488079811703e-05,
332
+ "loss": 0.22002685070037842,
333
+ "step": 450
334
+ },
335
+ {
336
+ "epoch": 0.4820539690856694,
337
+ "grad_norm": 0.16841623187065125,
338
+ "learning_rate": 9.562038596208828e-05,
339
+ "loss": 0.21405396461486817,
340
+ "step": 460
341
+ },
342
+ {
343
+ "epoch": 0.4925334031962274,
344
+ "grad_norm": 0.1498555839061737,
345
+ "learning_rate": 9.538613382987865e-05,
346
+ "loss": 0.20534911155700683,
347
+ "step": 470
348
+ },
349
+ {
350
+ "epoch": 0.5030128373067855,
351
+ "grad_norm": 0.13913628458976746,
352
+ "learning_rate": 9.514608152107974e-05,
353
+ "loss": 0.22248730659484864,
354
+ "step": 480
355
+ },
356
+ {
357
+ "epoch": 0.5134922714173434,
358
+ "grad_norm": 0.14408951997756958,
359
+ "learning_rate": 9.490025971347047e-05,
360
+ "loss": 0.214866042137146,
361
+ "step": 490
362
+ },
363
+ {
364
+ "epoch": 0.5239717055279015,
365
+ "grad_norm": 0.1649770438671112,
366
+ "learning_rate": 9.464869982215001e-05,
367
+ "loss": 0.19965900182724,
368
+ "step": 500
369
+ },
370
+ {
371
+ "epoch": 0.5239717055279015,
372
+ "eval_loss": 0.19267401099205017,
373
+ "eval_runtime": 95.3374,
374
+ "eval_samples_per_second": 3.262,
375
+ "eval_steps_per_second": 3.262,
376
+ "step": 500
377
+ },
378
+ {
379
+ "epoch": 0.5344511396384595,
380
+ "grad_norm": 0.1305568665266037,
381
+ "learning_rate": 9.439143399552291e-05,
382
+ "loss": 0.21112546920776368,
383
+ "step": 510
384
+ },
385
+ {
386
+ "epoch": 0.5449305737490175,
387
+ "grad_norm": 0.11998175084590912,
388
+ "learning_rate": 9.412849511119074e-05,
389
+ "loss": 0.21422922611236572,
390
+ "step": 520
391
+ },
392
+ {
393
+ "epoch": 0.5554100078595756,
394
+ "grad_norm": 0.15220341086387634,
395
+ "learning_rate": 9.385991677175046e-05,
396
+ "loss": 0.20999882221221924,
397
+ "step": 530
398
+ },
399
+ {
400
+ "epoch": 0.5658894419701336,
401
+ "grad_norm": 0.13170023262500763,
402
+ "learning_rate": 9.358573330050004e-05,
403
+ "loss": 0.20208392143249512,
404
+ "step": 540
405
+ },
406
+ {
407
+ "epoch": 0.5763688760806917,
408
+ "grad_norm": 0.10457764565944672,
409
+ "learning_rate": 9.330597973705219e-05,
410
+ "loss": 0.1908803701400757,
411
+ "step": 550
412
+ },
413
+ {
414
+ "epoch": 0.5868483101912497,
415
+ "grad_norm": 0.12568537890911102,
416
+ "learning_rate": 9.302069183285637e-05,
417
+ "loss": 0.19316340684890748,
418
+ "step": 560
419
+ },
420
+ {
421
+ "epoch": 0.5973277443018077,
422
+ "grad_norm": 0.14824528992176056,
423
+ "learning_rate": 9.272990604662988e-05,
424
+ "loss": 0.18987581729888917,
425
+ "step": 570
426
+ },
427
+ {
428
+ "epoch": 0.6078071784123658,
429
+ "grad_norm": 0.14521734416484833,
430
+ "learning_rate": 9.243365953969861e-05,
431
+ "loss": 0.19232832193374633,
432
+ "step": 580
433
+ },
434
+ {
435
+ "epoch": 0.6182866125229237,
436
+ "grad_norm": 0.1335408091545105,
437
+ "learning_rate": 9.213199017124793e-05,
438
+ "loss": 0.1758212924003601,
439
+ "step": 590
440
+ },
441
+ {
442
+ "epoch": 0.6287660466334818,
443
+ "grad_norm": 0.11143071949481964,
444
+ "learning_rate": 9.182493649348447e-05,
445
+ "loss": 0.19117680788040162,
446
+ "step": 600
447
+ },
448
+ {
449
+ "epoch": 0.6392454807440399,
450
+ "grad_norm": 0.14789296686649323,
451
+ "learning_rate": 9.151253774670921e-05,
452
+ "loss": 0.184559965133667,
453
+ "step": 610
454
+ },
455
+ {
456
+ "epoch": 0.6497249148545978,
457
+ "grad_norm": 0.10541336238384247,
458
+ "learning_rate": 9.119483385430283e-05,
459
+ "loss": 0.1720304846763611,
460
+ "step": 620
461
+ },
462
+ {
463
+ "epoch": 0.6602043489651559,
464
+ "grad_norm": 0.12105975300073624,
465
+ "learning_rate": 9.087186541762358e-05,
466
+ "loss": 0.17654836177825928,
467
+ "step": 630
468
+ },
469
+ {
470
+ "epoch": 0.6706837830757139,
471
+ "grad_norm": 0.13114669919013977,
472
+ "learning_rate": 9.054367371081858e-05,
473
+ "loss": 0.1696592688560486,
474
+ "step": 640
475
+ },
476
+ {
477
+ "epoch": 0.6811632171862719,
478
+ "grad_norm": 0.13745592534542084,
479
+ "learning_rate": 9.021030067554919e-05,
480
+ "loss": 0.15404462814331055,
481
+ "step": 650
482
+ },
483
+ {
484
+ "epoch": 0.69164265129683,
485
+ "grad_norm": 0.15927442908287048,
486
+ "learning_rate": 8.987178891563094e-05,
487
+ "loss": 0.17024366855621337,
488
+ "step": 660
489
+ },
490
+ {
491
+ "epoch": 0.702122085407388,
492
+ "grad_norm": 0.13737429678440094,
493
+ "learning_rate": 8.952818169158903e-05,
494
+ "loss": 0.1602048397064209,
495
+ "step": 670
496
+ },
497
+ {
498
+ "epoch": 0.712601519517946,
499
+ "grad_norm": 0.13941751420497894,
500
+ "learning_rate": 8.91795229151297e-05,
501
+ "loss": 0.18057082891464232,
502
+ "step": 680
503
+ },
504
+ {
505
+ "epoch": 0.7230809536285041,
506
+ "grad_norm": 0.14242954552173615,
507
+ "learning_rate": 8.882585714352856e-05,
508
+ "loss": 0.14863334894180297,
509
+ "step": 690
510
+ },
511
+ {
512
+ "epoch": 0.7335603877390621,
513
+ "grad_norm": 0.15553542971611023,
514
+ "learning_rate": 8.846722957393626e-05,
515
+ "loss": 0.15701137781143187,
516
+ "step": 700
517
+ },
518
+ {
519
+ "epoch": 0.7440398218496201,
520
+ "grad_norm": 0.12901411950588226,
521
+ "learning_rate": 8.810368603760249e-05,
522
+ "loss": 0.15571318864822387,
523
+ "step": 710
524
+ },
525
+ {
526
+ "epoch": 0.7545192559601781,
527
+ "grad_norm": 0.13449430465698242,
528
+ "learning_rate": 8.773527299401902e-05,
529
+ "loss": 0.16418551206588744,
530
+ "step": 720
531
+ },
532
+ {
533
+ "epoch": 0.7649986900707362,
534
+ "grad_norm": 0.10630270838737488,
535
+ "learning_rate": 8.736203752498218e-05,
536
+ "loss": 0.16800801753997802,
537
+ "step": 730
538
+ },
539
+ {
540
+ "epoch": 0.7754781241812942,
541
+ "grad_norm": 0.11299935728311539,
542
+ "learning_rate": 8.698402732857611e-05,
543
+ "loss": 0.15700833797454833,
544
+ "step": 740
545
+ },
546
+ {
547
+ "epoch": 0.7859575582918522,
548
+ "grad_norm": 0.11920930445194244,
549
+ "learning_rate": 8.660129071307707e-05,
550
+ "loss": 0.15091001987457275,
551
+ "step": 750
552
+ },
553
+ {
554
+ "epoch": 0.7859575582918522,
555
+ "eval_loss": 0.1356429010629654,
556
+ "eval_runtime": 94.0557,
557
+ "eval_samples_per_second": 3.307,
558
+ "eval_steps_per_second": 3.307,
559
+ "step": 750
560
+ },
561
+ {
562
+ "epoch": 0.7964369924024103,
563
+ "grad_norm": 0.13870343565940857,
564
+ "learning_rate": 8.621387659077986e-05,
565
+ "loss": 0.1422027826309204,
566
+ "step": 760
567
+ },
568
+ {
569
+ "epoch": 0.8069164265129684,
570
+ "grad_norm": 0.12753477692604065,
571
+ "learning_rate": 8.582183447174697e-05,
572
+ "loss": 0.142450213432312,
573
+ "step": 770
574
+ },
575
+ {
576
+ "epoch": 0.8173958606235263,
577
+ "grad_norm": 0.11877496540546417,
578
+ "learning_rate": 8.542521445748141e-05,
579
+ "loss": 0.15361062288284302,
580
+ "step": 780
581
+ },
582
+ {
583
+ "epoch": 0.8278752947340844,
584
+ "grad_norm": 0.1200249195098877,
585
+ "learning_rate": 8.502406723452392e-05,
586
+ "loss": 0.14647477865219116,
587
+ "step": 790
588
+ },
589
+ {
590
+ "epoch": 0.8383547288446423,
591
+ "grad_norm": 0.12913794815540314,
592
+ "learning_rate": 8.461844406797543e-05,
593
+ "loss": 0.1591552734375,
594
+ "step": 800
595
+ },
596
+ {
597
+ "epoch": 0.8488341629552004,
598
+ "grad_norm": 0.17270176112651825,
599
+ "learning_rate": 8.420839679494558e-05,
600
+ "loss": 0.1495436668395996,
601
+ "step": 810
602
+ },
603
+ {
604
+ "epoch": 0.8593135970657585,
605
+ "grad_norm": 0.15545596182346344,
606
+ "learning_rate": 8.379397781792808e-05,
607
+ "loss": 0.15377395153045653,
608
+ "step": 820
609
+ },
610
+ {
611
+ "epoch": 0.8697930311763165,
612
+ "grad_norm": 0.12941111624240875,
613
+ "learning_rate": 8.337524009810395e-05,
614
+ "loss": 0.14733861684799193,
615
+ "step": 830
616
+ },
617
+ {
618
+ "epoch": 0.8802724652868745,
619
+ "grad_norm": 0.13152749836444855,
620
+ "learning_rate": 8.295223714857319e-05,
621
+ "loss": 0.13980752229690552,
622
+ "step": 840
623
+ },
624
+ {
625
+ "epoch": 0.8907518993974325,
626
+ "grad_norm": 0.11208872497081757,
627
+ "learning_rate": 8.252502302751612e-05,
628
+ "loss": 0.12019969224929809,
629
+ "step": 850
630
+ },
631
+ {
632
+ "epoch": 0.9012313335079906,
633
+ "grad_norm": 0.11118603497743607,
634
+ "learning_rate": 8.209365233128482e-05,
635
+ "loss": 0.13822466135025024,
636
+ "step": 860
637
+ },
638
+ {
639
+ "epoch": 0.9117107676185486,
640
+ "grad_norm": 0.11705653369426727,
641
+ "learning_rate": 8.165818018742605e-05,
642
+ "loss": 0.1439664840698242,
643
+ "step": 870
644
+ },
645
+ {
646
+ "epoch": 0.9221902017291066,
647
+ "grad_norm": 0.08817730098962784,
648
+ "learning_rate": 8.121866224763606e-05,
649
+ "loss": 0.13380355834960939,
650
+ "step": 880
651
+ },
652
+ {
653
+ "epoch": 0.9326696358396647,
654
+ "grad_norm": 0.1092257872223854,
655
+ "learning_rate": 8.077515468064851e-05,
656
+ "loss": 0.12982802391052245,
657
+ "step": 890
658
+ },
659
+ {
660
+ "epoch": 0.9431490699502227,
661
+ "grad_norm": 0.12680962681770325,
662
+ "learning_rate": 8.032771416505647e-05,
663
+ "loss": 0.1489071011543274,
664
+ "step": 900
665
+ },
666
+ {
667
+ "epoch": 0.9536285040607807,
668
+ "grad_norm": 0.11953219771385193,
669
+ "learning_rate": 7.987639788206888e-05,
670
+ "loss": 0.14020267724990845,
671
+ "step": 910
672
+ },
673
+ {
674
+ "epoch": 0.9641079381713388,
675
+ "grad_norm": 0.1041467934846878,
676
+ "learning_rate": 7.942126350820318e-05,
677
+ "loss": 0.1439213275909424,
678
+ "step": 920
679
+ },
680
+ {
681
+ "epoch": 0.9745873722818967,
682
+ "grad_norm": 0.1277916431427002,
683
+ "learning_rate": 7.896236920791442e-05,
684
+ "loss": 0.1468779683113098,
685
+ "step": 930
686
+ },
687
+ {
688
+ "epoch": 0.9850668063924548,
689
+ "grad_norm": 0.11245205253362656,
690
+ "learning_rate": 7.849977362616201e-05,
691
+ "loss": 0.12012372016906739,
692
+ "step": 940
693
+ },
694
+ {
695
+ "epoch": 0.9955462405030129,
696
+ "grad_norm": 0.12230483442544937,
697
+ "learning_rate": 7.803353588091522e-05,
698
+ "loss": 0.1488939881324768,
699
+ "step": 950
700
+ },
701
+ {
702
+ "epoch": 1.005239717055279,
703
+ "grad_norm": 0.14185865223407745,
704
+ "learning_rate": 7.7563715555598e-05,
705
+ "loss": 0.11488113403320313,
706
+ "step": 960
707
+ },
708
+ {
709
+ "epoch": 1.015719151165837,
710
+ "grad_norm": 0.10545773804187775,
711
+ "learning_rate": 7.709037269147459e-05,
712
+ "loss": 0.10712549686431885,
713
+ "step": 970
714
+ },
715
+ {
716
+ "epoch": 1.026198585276395,
717
+ "grad_norm": 0.10376274585723877,
718
+ "learning_rate": 7.661356777997631e-05,
719
+ "loss": 0.11428828239440918,
720
+ "step": 980
721
+ },
722
+ {
723
+ "epoch": 1.0366780193869531,
724
+ "grad_norm": 0.09950564056634903,
725
+ "learning_rate": 7.613336175497111e-05,
726
+ "loss": 0.09823058247566223,
727
+ "step": 990
728
+ },
729
+ {
730
+ "epoch": 1.0471574534975112,
731
+ "grad_norm": 0.10412753373384476,
732
+ "learning_rate": 7.564981598497643e-05,
733
+ "loss": 0.1106558084487915,
734
+ "step": 1000
735
+ },
736
+ {
737
+ "epoch": 1.0471574534975112,
738
+ "eval_loss": 0.11185819655656815,
739
+ "eval_runtime": 93.808,
740
+ "eval_samples_per_second": 3.315,
741
+ "eval_steps_per_second": 3.315,
742
+ "step": 1000
743
+ },
744
+ {
745
+ "epoch": 1.057636887608069,
746
+ "grad_norm": 0.10430868715047836,
747
+ "learning_rate": 7.516299226531645e-05,
748
+ "loss": 0.11168640851974487,
749
+ "step": 1010
750
+ },
751
+ {
752
+ "epoch": 1.0681163217186271,
753
+ "grad_norm": 0.09646806865930557,
754
+ "learning_rate": 7.467295281022501e-05,
755
+ "loss": 0.10711305141448975,
756
+ "step": 1020
757
+ },
758
+ {
759
+ "epoch": 1.0785957558291852,
760
+ "grad_norm": 0.13060614466667175,
761
+ "learning_rate": 7.417976024489474e-05,
762
+ "loss": 0.10001810789108276,
763
+ "step": 1030
764
+ },
765
+ {
766
+ "epoch": 1.0890751899397433,
767
+ "grad_norm": 0.10389085114002228,
768
+ "learning_rate": 7.368347759747393e-05,
769
+ "loss": 0.11893858909606933,
770
+ "step": 1040
771
+ },
772
+ {
773
+ "epoch": 1.0995546240503014,
774
+ "grad_norm": 0.11291550099849701,
775
+ "learning_rate": 7.318416829101164e-05,
776
+ "loss": 0.1079628586769104,
777
+ "step": 1050
778
+ },
779
+ {
780
+ "epoch": 1.1100340581608594,
781
+ "grad_norm": 0.10372598469257355,
782
+ "learning_rate": 7.268189613535255e-05,
783
+ "loss": 0.10332397222518921,
784
+ "step": 1060
785
+ },
786
+ {
787
+ "epoch": 1.1205134922714173,
788
+ "grad_norm": 0.12971536815166473,
789
+ "learning_rate": 7.217672531898225e-05,
790
+ "loss": 0.10804877281188965,
791
+ "step": 1070
792
+ },
793
+ {
794
+ "epoch": 1.1309929263819753,
795
+ "grad_norm": 0.10902425646781921,
796
+ "learning_rate": 7.166872040082431e-05,
797
+ "loss": 0.09947454929351807,
798
+ "step": 1080
799
+ },
800
+ {
801
+ "epoch": 1.1414723604925334,
802
+ "grad_norm": 0.09305932372808456,
803
+ "learning_rate": 7.11579463019897e-05,
804
+ "loss": 0.09406971335411071,
805
+ "step": 1090
806
+ },
807
+ {
808
+ "epoch": 1.1519517946030915,
809
+ "grad_norm": 0.11485275626182556,
810
+ "learning_rate": 7.064446829748034e-05,
811
+ "loss": 0.09943979978561401,
812
+ "step": 1100
813
+ },
814
+ {
815
+ "epoch": 1.1624312287136496,
816
+ "grad_norm": 0.09556467831134796,
817
+ "learning_rate": 7.0128352007847e-05,
818
+ "loss": 0.10862170457839966,
819
+ "step": 1110
820
+ },
821
+ {
822
+ "epoch": 1.1729106628242074,
823
+ "grad_norm": 0.11937833577394485,
824
+ "learning_rate": 6.96096633908034e-05,
825
+ "loss": 0.10385221242904663,
826
+ "step": 1120
827
+ },
828
+ {
829
+ "epoch": 1.1833900969347655,
830
+ "grad_norm": 0.11560507863759995,
831
+ "learning_rate": 6.908846873279691e-05,
832
+ "loss": 0.09252402186393738,
833
+ "step": 1130
834
+ },
835
+ {
836
+ "epoch": 1.1938695310453236,
837
+ "grad_norm": 0.11119654029607773,
838
+ "learning_rate": 6.856483464053758e-05,
839
+ "loss": 0.09637172818183899,
840
+ "step": 1140
841
+ },
842
+ {
843
+ "epoch": 1.2043489651558816,
844
+ "grad_norm": 0.11722644418478012,
845
+ "learning_rate": 6.803882803248585e-05,
846
+ "loss": 0.09078751802444458,
847
+ "step": 1150
848
+ },
849
+ {
850
+ "epoch": 1.2148283992664397,
851
+ "grad_norm": 0.10487739741802216,
852
+ "learning_rate": 6.751051613030082e-05,
853
+ "loss": 0.10334972143173218,
854
+ "step": 1160
855
+ },
856
+ {
857
+ "epoch": 1.2253078333769976,
858
+ "grad_norm": 0.10202383995056152,
859
+ "learning_rate": 6.697996645024937e-05,
860
+ "loss": 0.08661433458328247,
861
+ "step": 1170
862
+ },
863
+ {
864
+ "epoch": 1.2357872674875556,
865
+ "grad_norm": 0.11801143735647202,
866
+ "learning_rate": 6.644724679457804e-05,
867
+ "loss": 0.0997927188873291,
868
+ "step": 1180
869
+ },
870
+ {
871
+ "epoch": 1.2462667015981137,
872
+ "grad_norm": 0.10949107259511948,
873
+ "learning_rate": 6.591242524284802e-05,
874
+ "loss": 0.0977592945098877,
875
+ "step": 1190
876
+ },
877
+ {
878
+ "epoch": 1.2567461357086718,
879
+ "grad_norm": 0.10221222043037415,
880
+ "learning_rate": 6.537557014323487e-05,
881
+ "loss": 0.0970361053943634,
882
+ "step": 1200
883
+ },
884
+ {
885
+ "epoch": 1.2672255698192298,
886
+ "grad_norm": 0.10554748773574829,
887
+ "learning_rate": 6.483675010379393e-05,
888
+ "loss": 0.09007551074028015,
889
+ "step": 1210
890
+ },
891
+ {
892
+ "epoch": 1.2777050039297877,
893
+ "grad_norm": 0.11625627428293228,
894
+ "learning_rate": 6.429603398369242e-05,
895
+ "loss": 0.08734490275382996,
896
+ "step": 1220
897
+ },
898
+ {
899
+ "epoch": 1.2881844380403458,
900
+ "grad_norm": 0.10624277591705322,
901
+ "learning_rate": 6.37534908844095e-05,
902
+ "loss": 0.09858485460281372,
903
+ "step": 1230
904
+ },
905
+ {
906
+ "epoch": 1.2986638721509038,
907
+ "grad_norm": 0.10184557735919952,
908
+ "learning_rate": 6.320919014090534e-05,
909
+ "loss": 0.09335023164749146,
910
+ "step": 1240
911
+ },
912
+ {
913
+ "epoch": 1.309143306261462,
914
+ "grad_norm": 0.10787283629179001,
915
+ "learning_rate": 6.266320131276051e-05,
916
+ "loss": 0.08665563464164734,
917
+ "step": 1250
918
+ },
919
+ {
920
+ "epoch": 1.309143306261462,
921
+ "eval_loss": 0.08951585739850998,
922
+ "eval_runtime": 94.0567,
923
+ "eval_samples_per_second": 3.307,
924
+ "eval_steps_per_second": 3.307,
925
+ "step": 1250
926
+ },
927
+ {
928
+ "epoch": 1.31962274037202,
929
+ "grad_norm": 0.10836981981992722,
930
+ "learning_rate": 6.211559417528631e-05,
931
+ "loss": 0.0933380126953125,
932
+ "step": 1260
933
+ },
934
+ {
935
+ "epoch": 1.3301021744825778,
936
+ "grad_norm": 0.1397171914577484,
937
+ "learning_rate": 6.156643871060795e-05,
938
+ "loss": 0.09835371971130372,
939
+ "step": 1270
940
+ },
941
+ {
942
+ "epoch": 1.340581608593136,
943
+ "grad_norm": 0.11242218315601349,
944
+ "learning_rate": 6.101580509872097e-05,
945
+ "loss": 0.09398673176765442,
946
+ "step": 1280
947
+ },
948
+ {
949
+ "epoch": 1.351061042703694,
950
+ "grad_norm": 0.10235017538070679,
951
+ "learning_rate": 6.0463763708522536e-05,
952
+ "loss": 0.10350929498672486,
953
+ "step": 1290
954
+ },
955
+ {
956
+ "epoch": 1.361540476814252,
957
+ "grad_norm": 0.09327106177806854,
958
+ "learning_rate": 5.99103850888186e-05,
959
+ "loss": 0.09580238461494446,
960
+ "step": 1300
961
+ },
962
+ {
963
+ "epoch": 1.3720199109248101,
964
+ "grad_norm": 0.12995658814907074,
965
+ "learning_rate": 5.9355739959307976e-05,
966
+ "loss": 0.08437412977218628,
967
+ "step": 1310
968
+ },
969
+ {
970
+ "epoch": 1.382499345035368,
971
+ "grad_norm": 0.11962983757257462,
972
+ "learning_rate": 5.879989920154466e-05,
973
+ "loss": 0.08409937620162963,
974
+ "step": 1320
975
+ },
976
+ {
977
+ "epoch": 1.392978779145926,
978
+ "grad_norm": 0.09431737661361694,
979
+ "learning_rate": 5.824293384987941e-05,
980
+ "loss": 0.09504773020744324,
981
+ "step": 1330
982
+ },
983
+ {
984
+ "epoch": 1.4034582132564841,
985
+ "grad_norm": 0.13824374973773956,
986
+ "learning_rate": 5.768491508238188e-05,
987
+ "loss": 0.09193333983421326,
988
+ "step": 1340
989
+ },
990
+ {
991
+ "epoch": 1.4139376473670422,
992
+ "grad_norm": 0.10595858097076416,
993
+ "learning_rate": 5.712591421174422e-05,
994
+ "loss": 0.08976472616195678,
995
+ "step": 1350
996
+ },
997
+ {
998
+ "epoch": 1.4244170814776003,
999
+ "grad_norm": 0.09911809861660004,
1000
+ "learning_rate": 5.6566002676167725e-05,
1001
+ "loss": 0.07597061395645141,
1002
+ "step": 1360
1003
+ },
1004
+ {
1005
+ "epoch": 1.4348965155881581,
1006
+ "grad_norm": 0.09723466634750366,
1007
+ "learning_rate": 5.60052520302332e-05,
1008
+ "loss": 0.10513757467269898,
1009
+ "step": 1370
1010
+ },
1011
+ {
1012
+ "epoch": 1.4453759496987162,
1013
+ "grad_norm": 0.11331687867641449,
1014
+ "learning_rate": 5.5443733935756615e-05,
1015
+ "loss": 0.09019948840141297,
1016
+ "step": 1380
1017
+ },
1018
+ {
1019
+ "epoch": 1.4558553838092743,
1020
+ "grad_norm": 0.13363589346408844,
1021
+ "learning_rate": 5.4881520152630886e-05,
1022
+ "loss": 0.08314153552055359,
1023
+ "step": 1390
1024
+ },
1025
+ {
1026
+ "epoch": 1.4663348179198323,
1027
+ "grad_norm": 0.14111892879009247,
1028
+ "learning_rate": 5.4318682529655404e-05,
1029
+ "loss": 0.07892010807991028,
1030
+ "step": 1400
1031
+ },
1032
+ {
1033
+ "epoch": 1.4768142520303904,
1034
+ "grad_norm": 0.13948485255241394,
1035
+ "learning_rate": 5.3755292995353913e-05,
1036
+ "loss": 0.0840128481388092,
1037
+ "step": 1410
1038
+ },
1039
+ {
1040
+ "epoch": 1.4872936861409483,
1041
+ "grad_norm": 0.12535949051380157,
1042
+ "learning_rate": 5.31914235487823e-05,
1043
+ "loss": 0.07869629859924317,
1044
+ "step": 1420
1045
+ },
1046
+ {
1047
+ "epoch": 1.4977731202515066,
1048
+ "grad_norm": 0.10041694343090057,
1049
+ "learning_rate": 5.2627146250327484e-05,
1050
+ "loss": 0.08074848055839538,
1051
+ "step": 1430
1052
+ },
1053
+ {
1054
+ "epoch": 1.5082525543620644,
1055
+ "grad_norm": 0.10112891346216202,
1056
+ "learning_rate": 5.2062533212498275e-05,
1057
+ "loss": 0.0860810935497284,
1058
+ "step": 1440
1059
+ },
1060
+ {
1061
+ "epoch": 1.5187319884726225,
1062
+ "grad_norm": 0.11297477036714554,
1063
+ "learning_rate": 5.149765659070973e-05,
1064
+ "loss": 0.08794642686843872,
1065
+ "step": 1450
1066
+ },
1067
+ {
1068
+ "epoch": 1.5292114225831805,
1069
+ "grad_norm": 0.10511091351509094,
1070
+ "learning_rate": 5.0932588574061945e-05,
1071
+ "loss": 0.07854819297790527,
1072
+ "step": 1460
1073
+ },
1074
+ {
1075
+ "epoch": 1.5396908566937384,
1076
+ "grad_norm": 0.09333530068397522,
1077
+ "learning_rate": 5.036740137611453e-05,
1078
+ "loss": 0.08821435570716858,
1079
+ "step": 1470
1080
+ },
1081
+ {
1082
+ "epoch": 1.5501702908042967,
1083
+ "grad_norm": 0.11480343341827393,
1084
+ "learning_rate": 4.980216722565804e-05,
1085
+ "loss": 0.08062278628349304,
1086
+ "step": 1480
1087
+ },
1088
+ {
1089
+ "epoch": 1.5606497249148545,
1090
+ "grad_norm": 0.08406255394220352,
1091
+ "learning_rate": 4.923695835748338e-05,
1092
+ "loss": 0.0940588355064392,
1093
+ "step": 1490
1094
+ },
1095
+ {
1096
+ "epoch": 1.5711291590254126,
1097
+ "grad_norm": 0.12927693128585815,
1098
+ "learning_rate": 4.8671847003150447e-05,
1099
+ "loss": 0.0775177538394928,
1100
+ "step": 1500
1101
+ },
1102
+ {
1103
+ "epoch": 1.5711291590254126,
1104
+ "eval_loss": 0.07877222448587418,
1105
+ "eval_runtime": 34.4389,
1106
+ "eval_samples_per_second": 9.03,
1107
+ "eval_steps_per_second": 9.03,
1108
+ "step": 1500
1109
+ },
1110
+ {
1111
+ "epoch": 1.5816085931359707,
1112
+ "grad_norm": 0.1255076378583908,
1113
+ "learning_rate": 4.810690538175728e-05,
1114
+ "loss": 0.09362970590591431,
1115
+ "step": 1510
1116
+ },
1117
+ {
1118
+ "epoch": 1.5920880272465285,
1119
+ "grad_norm": 0.1326853185892105,
1120
+ "learning_rate": 4.754220569071068e-05,
1121
+ "loss": 0.08364834189414978,
1122
+ "step": 1520
1123
+ },
1124
+ {
1125
+ "epoch": 1.6025674613570868,
1126
+ "grad_norm": 0.10229979455471039,
1127
+ "learning_rate": 4.697782009649962e-05,
1128
+ "loss": 0.0725843846797943,
1129
+ "step": 1530
1130
+ },
1131
+ {
1132
+ "epoch": 1.6130468954676447,
1133
+ "grad_norm": 0.11407258361577988,
1134
+ "learning_rate": 4.641382072547272e-05,
1135
+ "loss": 0.07566151022911072,
1136
+ "step": 1540
1137
+ },
1138
+ {
1139
+ "epoch": 1.6235263295782028,
1140
+ "grad_norm": 0.09398165345191956,
1141
+ "learning_rate": 4.585027965462075e-05,
1142
+ "loss": 0.087736576795578,
1143
+ "step": 1550
1144
+ },
1145
+ {
1146
+ "epoch": 1.6340057636887608,
1147
+ "grad_norm": 0.11289424449205399,
1148
+ "learning_rate": 4.528726890236544e-05,
1149
+ "loss": 0.08366051316261292,
1150
+ "step": 1560
1151
+ },
1152
+ {
1153
+ "epoch": 1.6444851977993187,
1154
+ "grad_norm": 0.09478718787431717,
1155
+ "learning_rate": 4.4724860419355746e-05,
1156
+ "loss": 0.0885531723499298,
1157
+ "step": 1570
1158
+ },
1159
+ {
1160
+ "epoch": 1.654964631909877,
1161
+ "grad_norm": 0.09163404256105423,
1162
+ "learning_rate": 4.416312607927295e-05,
1163
+ "loss": 0.08392030596733094,
1164
+ "step": 1580
1165
+ },
1166
+ {
1167
+ "epoch": 1.6654440660204348,
1168
+ "grad_norm": 0.11422222852706909,
1169
+ "learning_rate": 4.360213766964542e-05,
1170
+ "loss": 0.08059985041618348,
1171
+ "step": 1590
1172
+ },
1173
+ {
1174
+ "epoch": 1.675923500130993,
1175
+ "grad_norm": 0.08131479471921921,
1176
+ "learning_rate": 4.304196688267438e-05,
1177
+ "loss": 0.07613803148269653,
1178
+ "step": 1600
1179
+ },
1180
+ {
1181
+ "epoch": 1.686402934241551,
1182
+ "grad_norm": 0.09615079313516617,
1183
+ "learning_rate": 4.248268530607199e-05,
1184
+ "loss": 0.07764078378677368,
1185
+ "step": 1610
1186
+ },
1187
+ {
1188
+ "epoch": 1.696882368352109,
1189
+ "grad_norm": 0.09730526059865952,
1190
+ "learning_rate": 4.192436441391271e-05,
1191
+ "loss": 0.07644452452659607,
1192
+ "step": 1620
1193
+ },
1194
+ {
1195
+ "epoch": 1.707361802462667,
1196
+ "grad_norm": 0.09649327397346497,
1197
+ "learning_rate": 4.136707555749907e-05,
1198
+ "loss": 0.07866159081459045,
1199
+ "step": 1630
1200
+ },
1201
+ {
1202
+ "epoch": 1.717841236573225,
1203
+ "grad_norm": 0.11804413050413132,
1204
+ "learning_rate": 4.0810889956243415e-05,
1205
+ "loss": 0.06996130347251892,
1206
+ "step": 1640
1207
+ },
1208
+ {
1209
+ "epoch": 1.728320670683783,
1210
+ "grad_norm": 0.09874672442674637,
1211
+ "learning_rate": 4.025587868856622e-05,
1212
+ "loss": 0.07877404093742371,
1213
+ "step": 1650
1214
+ },
1215
+ {
1216
+ "epoch": 1.738800104794341,
1217
+ "grad_norm": 0.11149467527866364,
1218
+ "learning_rate": 3.9702112682812544e-05,
1219
+ "loss": 0.07241421341896057,
1220
+ "step": 1660
1221
+ },
1222
+ {
1223
+ "epoch": 1.7492795389048992,
1224
+ "grad_norm": 0.08748896420001984,
1225
+ "learning_rate": 3.914966270818766e-05,
1226
+ "loss": 0.07336459755897522,
1227
+ "step": 1670
1228
+ },
1229
+ {
1230
+ "epoch": 1.7597589730154573,
1231
+ "grad_norm": 0.1172696202993393,
1232
+ "learning_rate": 3.859859936571307e-05,
1233
+ "loss": 0.07742337584495544,
1234
+ "step": 1680
1235
+ },
1236
+ {
1237
+ "epoch": 1.770238407126015,
1238
+ "grad_norm": 0.0719197615981102,
1239
+ "learning_rate": 3.8048993079203925e-05,
1240
+ "loss": 0.06242966651916504,
1241
+ "step": 1690
1242
+ },
1243
+ {
1244
+ "epoch": 1.7807178412365732,
1245
+ "grad_norm": 0.12380168586969376,
1246
+ "learning_rate": 3.750091408626907e-05,
1247
+ "loss": 0.07270430326461792,
1248
+ "step": 1700
1249
+ },
1250
+ {
1251
+ "epoch": 1.7911972753471312,
1252
+ "grad_norm": 0.1587221622467041,
1253
+ "learning_rate": 3.6954432429335015e-05,
1254
+ "loss": 0.06409866213798524,
1255
+ "step": 1710
1256
+ },
1257
+ {
1258
+ "epoch": 1.8016767094576893,
1259
+ "grad_norm": 0.10983912646770477,
1260
+ "learning_rate": 3.640961794669482e-05,
1261
+ "loss": 0.06610031127929687,
1262
+ "step": 1720
1263
+ },
1264
+ {
1265
+ "epoch": 1.8121561435682474,
1266
+ "grad_norm": 0.11023026704788208,
1267
+ "learning_rate": 3.586654026358287e-05,
1268
+ "loss": 0.06866579055786133,
1269
+ "step": 1730
1270
+ },
1271
+ {
1272
+ "epoch": 1.8226355776788052,
1273
+ "grad_norm": 0.11857719719409943,
1274
+ "learning_rate": 3.532526878327719e-05,
1275
+ "loss": 0.06734356880187989,
1276
+ "step": 1740
1277
+ },
1278
+ {
1279
+ "epoch": 1.8331150117893635,
1280
+ "grad_norm": 0.09280339628458023,
1281
+ "learning_rate": 3.478587267822987e-05,
1282
+ "loss": 0.06897796392440796,
1283
+ "step": 1750
1284
+ },
1285
+ {
1286
+ "epoch": 1.8331150117893635,
1287
+ "eval_loss": 0.06596127897500992,
1288
+ "eval_runtime": 35.5001,
1289
+ "eval_samples_per_second": 8.761,
1290
+ "eval_steps_per_second": 8.761,
1291
+ "step": 1750
1292
+ },
1293
+ {
1294
+ "epoch": 1.8435944458999214,
1295
+ "grad_norm": 0.1175367683172226,
1296
+ "learning_rate": 3.424842088122716e-05,
1297
+ "loss": 0.08288194537162781,
1298
+ "step": 1760
1299
+ },
1300
+ {
1301
+ "epoch": 1.8540738800104795,
1302
+ "grad_norm": 0.10271462798118591,
1303
+ "learning_rate": 3.371298207658003e-05,
1304
+ "loss": 0.05643013119697571,
1305
+ "step": 1770
1306
+ },
1307
+ {
1308
+ "epoch": 1.8645533141210375,
1309
+ "grad_norm": 0.11965195834636688,
1310
+ "learning_rate": 3.3179624691346654e-05,
1311
+ "loss": 0.07403092980384826,
1312
+ "step": 1780
1313
+ },
1314
+ {
1315
+ "epoch": 1.8750327482315954,
1316
+ "grad_norm": 0.09981680661439896,
1317
+ "learning_rate": 3.2648416886587686e-05,
1318
+ "loss": 0.07118859887123108,
1319
+ "step": 1790
1320
+ },
1321
+ {
1322
+ "epoch": 1.8855121823421537,
1323
+ "grad_norm": 0.07787375897169113,
1324
+ "learning_rate": 3.2119426548655435e-05,
1325
+ "loss": 0.07219682335853576,
1326
+ "step": 1800
1327
+ },
1328
+ {
1329
+ "epoch": 1.8959916164527115,
1330
+ "grad_norm": 0.1303507387638092,
1331
+ "learning_rate": 3.1592721280518404e-05,
1332
+ "loss": 0.07636030912399291,
1333
+ "step": 1810
1334
+ },
1335
+ {
1336
+ "epoch": 1.9064710505632696,
1337
+ "grad_norm": 0.09162267297506332,
1338
+ "learning_rate": 3.106836839312175e-05,
1339
+ "loss": 0.06230143308639526,
1340
+ "step": 1820
1341
+ },
1342
+ {
1343
+ "epoch": 1.9169504846738277,
1344
+ "grad_norm": 0.11375878751277924,
1345
+ "learning_rate": 3.054643489678526e-05,
1346
+ "loss": 0.060506826639175414,
1347
+ "step": 1830
1348
+ },
1349
+ {
1350
+ "epoch": 1.9274299187843855,
1351
+ "grad_norm": 0.1377716213464737,
1352
+ "learning_rate": 3.0026987492639668e-05,
1353
+ "loss": 0.08148540854454041,
1354
+ "step": 1840
1355
+ },
1356
+ {
1357
+ "epoch": 1.9379093528949438,
1358
+ "grad_norm": 0.10483554750680923,
1359
+ "learning_rate": 2.951009256410255e-05,
1360
+ "loss": 0.07040726542472839,
1361
+ "step": 1850
1362
+ },
1363
+ {
1364
+ "epoch": 1.9483887870055017,
1365
+ "grad_norm": 0.08736151456832886,
1366
+ "learning_rate": 2.8995816168394702e-05,
1367
+ "loss": 0.04931557774543762,
1368
+ "step": 1860
1369
+ },
1370
+ {
1371
+ "epoch": 1.9588682211160597,
1372
+ "grad_norm": 0.11461569368839264,
1373
+ "learning_rate": 2.848422402809828e-05,
1374
+ "loss": 0.057559752464294435,
1375
+ "step": 1870
1376
+ },
1377
+ {
1378
+ "epoch": 1.9693476552266178,
1379
+ "grad_norm": 0.09060918539762497,
1380
+ "learning_rate": 2.7975381522757803e-05,
1381
+ "loss": 0.06379705667495728,
1382
+ "step": 1880
1383
+ },
1384
+ {
1385
+ "epoch": 1.9798270893371757,
1386
+ "grad_norm": 0.07104971259832382,
1387
+ "learning_rate": 2.746935368052477e-05,
1388
+ "loss": 0.05813115239143372,
1389
+ "step": 1890
1390
+ },
1391
+ {
1392
+ "epoch": 1.990306523447734,
1393
+ "grad_norm": 0.10802938044071198,
1394
+ "learning_rate": 2.696620516984733e-05,
1395
+ "loss": 0.07732833027839661,
1396
+ "step": 1900
1397
+ },
1398
+ {
1399
+ "epoch": 2.0,
1400
+ "grad_norm": 0.16884952783584595,
1401
+ "learning_rate": 2.6466000291206004e-05,
1402
+ "loss": 0.06166202425956726,
1403
+ "step": 1910
1404
+ },
1405
+ {
1406
+ "epoch": 2.010479434110558,
1407
+ "grad_norm": 0.08582179993391037,
1408
+ "learning_rate": 2.5968802968896228e-05,
1409
+ "loss": 0.04766199886798859,
1410
+ "step": 1920
1411
+ },
1412
+ {
1413
+ "epoch": 2.020958868221116,
1414
+ "grad_norm": 0.1457364708185196,
1415
+ "learning_rate": 2.5474676742859048e-05,
1416
+ "loss": 0.03826354146003723,
1417
+ "step": 1930
1418
+ },
1419
+ {
1420
+ "epoch": 2.031438302331674,
1421
+ "grad_norm": 0.09275342524051666,
1422
+ "learning_rate": 2.4983684760561023e-05,
1423
+ "loss": 0.045059433579444884,
1424
+ "step": 1940
1425
+ },
1426
+ {
1427
+ "epoch": 2.0419177364422323,
1428
+ "grad_norm": 0.09085927903652191,
1429
+ "learning_rate": 2.44958897689242e-05,
1430
+ "loss": 0.04904903173446655,
1431
+ "step": 1950
1432
+ },
1433
+ {
1434
+ "epoch": 2.05239717055279,
1435
+ "grad_norm": 0.11733179539442062,
1436
+ "learning_rate": 2.401135410630731e-05,
1437
+ "loss": 0.05008396506309509,
1438
+ "step": 1960
1439
+ },
1440
+ {
1441
+ "epoch": 2.062876604663348,
1442
+ "grad_norm": 0.0894237607717514,
1443
+ "learning_rate": 2.3530139694539095e-05,
1444
+ "loss": 0.04057626128196716,
1445
+ "step": 1970
1446
+ },
1447
+ {
1448
+ "epoch": 2.0733560387739063,
1449
+ "grad_norm": 0.08560927212238312,
1450
+ "learning_rate": 2.305230803100496e-05,
1451
+ "loss": 0.04843136668205261,
1452
+ "step": 1980
1453
+ },
1454
+ {
1455
+ "epoch": 2.083835472884464,
1456
+ "grad_norm": 0.07991836220026016,
1457
+ "learning_rate": 2.257792018078793e-05,
1458
+ "loss": 0.0544127106666565,
1459
+ "step": 1990
1460
+ },
1461
+ {
1462
+ "epoch": 2.0943149069950224,
1463
+ "grad_norm": 0.08846250921487808,
1464
+ "learning_rate": 2.210703676886461e-05,
1465
+ "loss": 0.0459000825881958,
1466
+ "step": 2000
1467
+ },
1468
+ {
1469
+ "epoch": 2.0943149069950224,
1470
+ "eval_loss": 0.060011014342308044,
1471
+ "eval_runtime": 36.3755,
1472
+ "eval_samples_per_second": 8.55,
1473
+ "eval_steps_per_second": 8.55,
1474
+ "step": 2000
1475
+ },
1476
+ {
1477
+ "epoch": 2.1047943411055803,
1478
+ "grad_norm": 0.10082945972681046,
1479
+ "learning_rate": 2.1639717972357678e-05,
1480
+ "loss": 0.038090622425079344,
1481
+ "step": 2010
1482
+ },
1483
+ {
1484
+ "epoch": 2.115273775216138,
1485
+ "grad_norm": 0.05712248757481575,
1486
+ "learning_rate": 2.1176023512845376e-05,
1487
+ "loss": 0.04598597884178161,
1488
+ "step": 2020
1489
+ },
1490
+ {
1491
+ "epoch": 2.1257532093266964,
1492
+ "grad_norm": 0.11628362536430359,
1493
+ "learning_rate": 2.0716012648729353e-05,
1494
+ "loss": 0.04984880685806274,
1495
+ "step": 2030
1496
+ },
1497
+ {
1498
+ "epoch": 2.1362326434372543,
1499
+ "grad_norm": 0.10635484755039215,
1500
+ "learning_rate": 2.025974416766171e-05,
1501
+ "loss": 0.04293925166130066,
1502
+ "step": 2040
1503
+ },
1504
+ {
1505
+ "epoch": 2.1467120775478126,
1506
+ "grad_norm": 0.1017381027340889,
1507
+ "learning_rate": 1.9807276379032113e-05,
1508
+ "loss": 0.04305694401264191,
1509
+ "step": 2050
1510
+ },
1511
+ {
1512
+ "epoch": 2.1571915116583704,
1513
+ "grad_norm": 0.13550882041454315,
1514
+ "learning_rate": 1.9358667106516055e-05,
1515
+ "loss": 0.04478869140148163,
1516
+ "step": 2060
1517
+ },
1518
+ {
1519
+ "epoch": 2.1676709457689283,
1520
+ "grad_norm": 0.08526366949081421,
1521
+ "learning_rate": 1.8913973680685226e-05,
1522
+ "loss": 0.036646312475204466,
1523
+ "step": 2070
1524
+ },
1525
+ {
1526
+ "epoch": 2.1781503798794866,
1527
+ "grad_norm": 0.10932011157274246,
1528
+ "learning_rate": 1.8473252931680928e-05,
1529
+ "loss": 0.042200219631195066,
1530
+ "step": 2080
1531
+ },
1532
+ {
1533
+ "epoch": 2.1886298139900444,
1534
+ "grad_norm": 0.08768360316753387,
1535
+ "learning_rate": 1.803656118195136e-05,
1536
+ "loss": 0.0437488317489624,
1537
+ "step": 2090
1538
+ },
1539
+ {
1540
+ "epoch": 2.1991092481006027,
1541
+ "grad_norm": 0.08362651616334915,
1542
+ "learning_rate": 1.760395423905379e-05,
1543
+ "loss": 0.04669668078422547,
1544
+ "step": 2100
1545
+ },
1546
+ {
1547
+ "epoch": 2.2095886822111606,
1548
+ "grad_norm": 0.08554034680128098,
1549
+ "learning_rate": 1.7175487388522588e-05,
1550
+ "loss": 0.034989356994628906,
1551
+ "step": 2110
1552
+ },
1553
+ {
1554
+ "epoch": 2.220068116321719,
1555
+ "grad_norm": 0.08215561509132385,
1556
+ "learning_rate": 1.6751215386803986e-05,
1557
+ "loss": 0.040298929810523985,
1558
+ "step": 2120
1559
+ },
1560
+ {
1561
+ "epoch": 2.2305475504322767,
1562
+ "grad_norm": 0.0840689167380333,
1563
+ "learning_rate": 1.6331192454258337e-05,
1564
+ "loss": 0.041704925894737246,
1565
+ "step": 2130
1566
+ },
1567
+ {
1568
+ "epoch": 2.2410269845428346,
1569
+ "grad_norm": 0.06530614197254181,
1570
+ "learning_rate": 1.5915472268231018e-05,
1571
+ "loss": 0.03651900887489319,
1572
+ "step": 2140
1573
+ },
1574
+ {
1575
+ "epoch": 2.251506418653393,
1576
+ "grad_norm": 0.12431822717189789,
1577
+ "learning_rate": 1.550410795619261e-05,
1578
+ "loss": 0.04806804955005646,
1579
+ "step": 2150
1580
+ },
1581
+ {
1582
+ "epoch": 2.2619858527639507,
1583
+ "grad_norm": 0.09592410176992416,
1584
+ "learning_rate": 1.509715208894949e-05,
1585
+ "loss": 0.0454313725233078,
1586
+ "step": 2160
1587
+ },
1588
+ {
1589
+ "epoch": 2.2724652868745085,
1590
+ "grad_norm": 0.07589780539274216,
1591
+ "learning_rate": 1.469465667392536e-05,
1592
+ "loss": 0.03574602603912354,
1593
+ "step": 2170
1594
+ },
1595
+ {
1596
+ "epoch": 2.282944720985067,
1597
+ "grad_norm": 0.09734483063220978,
1598
+ "learning_rate": 1.4296673148515038e-05,
1599
+ "loss": 0.04358702301979065,
1600
+ "step": 2180
1601
+ },
1602
+ {
1603
+ "epoch": 2.2934241550956247,
1604
+ "grad_norm": 0.0974339172244072,
1605
+ "learning_rate": 1.3903252373510838e-05,
1606
+ "loss": 0.04603351950645447,
1607
+ "step": 2190
1608
+ },
1609
+ {
1610
+ "epoch": 2.303903589206183,
1611
+ "grad_norm": 0.09025271981954575,
1612
+ "learning_rate": 1.3514444626602773e-05,
1613
+ "loss": 0.040065237879753114,
1614
+ "step": 2200
1615
+ },
1616
+ {
1617
+ "epoch": 2.314383023316741,
1618
+ "grad_norm": 0.07625086605548859,
1619
+ "learning_rate": 1.3130299595953338e-05,
1620
+ "loss": 0.044061675667762756,
1621
+ "step": 2210
1622
+ },
1623
+ {
1624
+ "epoch": 2.324862457427299,
1625
+ "grad_norm": 0.07306221127510071,
1626
+ "learning_rate": 1.2750866373847465e-05,
1627
+ "loss": 0.03366467654705048,
1628
+ "step": 2220
1629
+ },
1630
+ {
1631
+ "epoch": 2.335341891537857,
1632
+ "grad_norm": 0.08357638120651245,
1633
+ "learning_rate": 1.2376193450418715e-05,
1634
+ "loss": 0.041424044966697694,
1635
+ "step": 2230
1636
+ },
1637
+ {
1638
+ "epoch": 2.345821325648415,
1639
+ "grad_norm": 0.09153921157121658,
1640
+ "learning_rate": 1.2006328707452459e-05,
1641
+ "loss": 0.03938372135162353,
1642
+ "step": 2240
1643
+ },
1644
+ {
1645
+ "epoch": 2.356300759758973,
1646
+ "grad_norm": 0.09109660983085632,
1647
+ "learning_rate": 1.1641319412266765e-05,
1648
+ "loss": 0.04015985131263733,
1649
+ "step": 2250
1650
+ },
1651
+ {
1652
+ "epoch": 2.356300759758973,
1653
+ "eval_loss": 0.05486458167433739,
1654
+ "eval_runtime": 36.8119,
1655
+ "eval_samples_per_second": 8.448,
1656
+ "eval_steps_per_second": 8.448,
1657
+ "step": 2250
1658
+ },
1659
+ {
1660
+ "epoch": 2.366780193869531,
1661
+ "grad_norm": 0.052502721548080444,
1662
+ "learning_rate": 1.1281212211671822e-05,
1663
+ "loss": 0.0270554780960083,
1664
+ "step": 2260
1665
+ },
1666
+ {
1667
+ "epoch": 2.377259627980089,
1668
+ "grad_norm": 0.07931812107563019,
1669
+ "learning_rate": 1.0926053126008584e-05,
1670
+ "loss": 0.0417300134897232,
1671
+ "step": 2270
1672
+ },
1673
+ {
1674
+ "epoch": 2.387739062090647,
1675
+ "grad_norm": 0.08996254205703735,
1676
+ "learning_rate": 1.0575887543267609e-05,
1677
+ "loss": 0.037659955024719236,
1678
+ "step": 2280
1679
+ },
1680
+ {
1681
+ "epoch": 2.398218496201205,
1682
+ "grad_norm": 0.08800788223743439,
1683
+ "learning_rate": 1.023076021328867e-05,
1684
+ "loss": 0.048437944054603575,
1685
+ "step": 2290
1686
+ },
1687
+ {
1688
+ "epoch": 2.4086979303117633,
1689
+ "grad_norm": 0.10572271049022675,
1690
+ "learning_rate": 9.890715242041787e-06,
1691
+ "loss": 0.04166909456253052,
1692
+ "step": 2300
1693
+ },
1694
+ {
1695
+ "epoch": 2.419177364422321,
1696
+ "grad_norm": 0.10573071986436844,
1697
+ "learning_rate": 9.555796085990781e-06,
1698
+ "loss": 0.03919607996940613,
1699
+ "step": 2310
1700
+ },
1701
+ {
1702
+ "epoch": 2.4296567985328794,
1703
+ "grad_norm": 0.09714583307504654,
1704
+ "learning_rate": 9.226045546539608e-06,
1705
+ "loss": 0.03530588150024414,
1706
+ "step": 2320
1707
+ },
1708
+ {
1709
+ "epoch": 2.4401362326434373,
1710
+ "grad_norm": 0.09436199069023132,
1711
+ "learning_rate": 8.901505764562518e-06,
1712
+ "loss": 0.05111382007598877,
1713
+ "step": 2330
1714
+ },
1715
+ {
1716
+ "epoch": 2.450615666753995,
1717
+ "grad_norm": 0.06353961676359177,
1718
+ "learning_rate": 8.582218215018656e-06,
1719
+ "loss": 0.03805697858333588,
1720
+ "step": 2340
1721
+ },
1722
+ {
1723
+ "epoch": 2.4610951008645534,
1724
+ "grad_norm": 0.08853815495967865,
1725
+ "learning_rate": 8.268223701651684e-06,
1726
+ "loss": 0.04815975427627563,
1727
+ "step": 2350
1728
+ },
1729
+ {
1730
+ "epoch": 2.4715745349751113,
1731
+ "grad_norm": 0.07472016662359238,
1732
+ "learning_rate": 7.959562351775196e-06,
1733
+ "loss": 0.042247459292411804,
1734
+ "step": 2360
1735
+ },
1736
+ {
1737
+ "epoch": 2.4820539690856696,
1738
+ "grad_norm": 0.12121549248695374,
1739
+ "learning_rate": 7.656273611144632e-06,
1740
+ "loss": 0.040102115273475646,
1741
+ "step": 2370
1742
+ },
1743
+ {
1744
+ "epoch": 2.4925334031962274,
1745
+ "grad_norm": 0.08667747676372528,
1746
+ "learning_rate": 7.358396238916254e-06,
1747
+ "loss": 0.03656341433525086,
1748
+ "step": 2380
1749
+ },
1750
+ {
1751
+ "epoch": 2.5030128373067857,
1752
+ "grad_norm": 0.1162872165441513,
1753
+ "learning_rate": 7.065968302693882e-06,
1754
+ "loss": 0.04052766263484955,
1755
+ "step": 2390
1756
+ },
1757
+ {
1758
+ "epoch": 2.5134922714173435,
1759
+ "grad_norm": 0.07924140989780426,
1760
+ "learning_rate": 6.7790271736639595e-06,
1761
+ "loss": 0.03394221067428589,
1762
+ "step": 2400
1763
+ },
1764
+ {
1765
+ "epoch": 2.5239717055279014,
1766
+ "grad_norm": 0.09523408859968185,
1767
+ "learning_rate": 6.497609521819681e-06,
1768
+ "loss": 0.04119439423084259,
1769
+ "step": 2410
1770
+ },
1771
+ {
1772
+ "epoch": 2.5344511396384597,
1773
+ "grad_norm": 0.12182598561048508,
1774
+ "learning_rate": 6.221751311274731e-06,
1775
+ "loss": 0.05154783725738525,
1776
+ "step": 2420
1777
+ },
1778
+ {
1779
+ "epoch": 2.5449305737490175,
1780
+ "grad_norm": 0.09359873831272125,
1781
+ "learning_rate": 5.951487795667149e-06,
1782
+ "loss": 0.035483264923095705,
1783
+ "step": 2430
1784
+ },
1785
+ {
1786
+ "epoch": 2.5554100078595754,
1787
+ "grad_norm": 0.08514095097780228,
1788
+ "learning_rate": 5.686853513654117e-06,
1789
+ "loss": 0.03830339312553406,
1790
+ "step": 2440
1791
+ },
1792
+ {
1793
+ "epoch": 2.5658894419701337,
1794
+ "grad_norm": 0.10625084489583969,
1795
+ "learning_rate": 5.4278822844979705e-06,
1796
+ "loss": 0.034111028909683226,
1797
+ "step": 2450
1798
+ },
1799
+ {
1800
+ "epoch": 2.5763688760806915,
1801
+ "grad_norm": 0.1004003956913948,
1802
+ "learning_rate": 5.174607203744286e-06,
1803
+ "loss": 0.04465605318546295,
1804
+ "step": 2460
1805
+ },
1806
+ {
1807
+ "epoch": 2.58684831019125,
1808
+ "grad_norm": 0.0962519720196724,
1809
+ "learning_rate": 4.927060638992382e-06,
1810
+ "loss": 0.041056016087532045,
1811
+ "step": 2470
1812
+ },
1813
+ {
1814
+ "epoch": 2.5973277443018077,
1815
+ "grad_norm": 0.06380607187747955,
1816
+ "learning_rate": 4.685274225758846e-06,
1817
+ "loss": 0.03880062401294708,
1818
+ "step": 2480
1819
+ },
1820
+ {
1821
+ "epoch": 2.607807178412366,
1822
+ "grad_norm": 0.07326535880565643,
1823
+ "learning_rate": 4.449278863434647e-06,
1824
+ "loss": 0.03194461762905121,
1825
+ "step": 2490
1826
+ },
1827
+ {
1828
+ "epoch": 2.618286612522924,
1829
+ "grad_norm": 0.12218596786260605,
1830
+ "learning_rate": 4.2191047113362854e-06,
1831
+ "loss": 0.04258840978145599,
1832
+ "step": 2500
1833
+ },
1834
+ {
1835
+ "epoch": 2.618286612522924,
1836
+ "eval_loss": 0.05223666876554489,
1837
+ "eval_runtime": 37.7234,
1838
+ "eval_samples_per_second": 8.244,
1839
+ "eval_steps_per_second": 8.244,
1840
+ "step": 2500
1841
+ },
1842
+ {
1843
+ "epoch": 2.6287660466334817,
1844
+ "grad_norm": 0.08594664931297302,
1845
+ "learning_rate": 3.994781184851598e-06,
1846
+ "loss": 0.04302787780761719,
1847
+ "step": 2510
1848
+ },
1849
+ {
1850
+ "epoch": 2.63924548074404,
1851
+ "grad_norm": 0.08187596499919891,
1852
+ "learning_rate": 3.776336951680548e-06,
1853
+ "loss": 0.0341387003660202,
1854
+ "step": 2520
1855
+ },
1856
+ {
1857
+ "epoch": 2.649724914854598,
1858
+ "grad_norm": 0.10216796398162842,
1859
+ "learning_rate": 3.563799928171596e-06,
1860
+ "loss": 0.04289879500865936,
1861
+ "step": 2530
1862
+ },
1863
+ {
1864
+ "epoch": 2.6602043489651557,
1865
+ "grad_norm": 0.11215174198150635,
1866
+ "learning_rate": 3.3571972757540814e-06,
1867
+ "loss": 0.04055049121379852,
1868
+ "step": 2540
1869
+ },
1870
+ {
1871
+ "epoch": 2.670683783075714,
1872
+ "grad_norm": 0.07941269129514694,
1873
+ "learning_rate": 3.156555397467176e-06,
1874
+ "loss": 0.04118689000606537,
1875
+ "step": 2550
1876
+ },
1877
+ {
1878
+ "epoch": 2.681163217186272,
1879
+ "grad_norm": 0.09404437988996506,
1880
+ "learning_rate": 2.9618999345855547e-06,
1881
+ "loss": 0.03079705536365509,
1882
+ "step": 2560
1883
+ },
1884
+ {
1885
+ "epoch": 2.69164265129683,
1886
+ "grad_norm": 0.1109817698597908,
1887
+ "learning_rate": 2.773255763342647e-06,
1888
+ "loss": 0.038885954022407535,
1889
+ "step": 2570
1890
+ },
1891
+ {
1892
+ "epoch": 2.702122085407388,
1893
+ "grad_norm": 0.09431962668895721,
1894
+ "learning_rate": 2.590646991751472e-06,
1895
+ "loss": 0.043543145060539246,
1896
+ "step": 2580
1897
+ },
1898
+ {
1899
+ "epoch": 2.7126015195179463,
1900
+ "grad_norm": 0.08184763044118881,
1901
+ "learning_rate": 2.414096956523776e-06,
1902
+ "loss": 0.03256987631320953,
1903
+ "step": 2590
1904
+ },
1905
+ {
1906
+ "epoch": 2.723080953628504,
1907
+ "grad_norm": 0.08390141278505325,
1908
+ "learning_rate": 2.2436282200876458e-06,
1909
+ "loss": 0.03908055424690247,
1910
+ "step": 2600
1911
+ },
1912
+ {
1913
+ "epoch": 2.733560387739062,
1914
+ "grad_norm": 0.0762532502412796,
1915
+ "learning_rate": 2.07926256770416e-06,
1916
+ "loss": 0.04899201393127441,
1917
+ "step": 2610
1918
+ },
1919
+ {
1920
+ "epoch": 2.7440398218496203,
1921
+ "grad_norm": 0.08239631354808807,
1922
+ "learning_rate": 1.9210210046832768e-06,
1923
+ "loss": 0.048707082867622375,
1924
+ "step": 2620
1925
+ },
1926
+ {
1927
+ "epoch": 2.754519255960178,
1928
+ "grad_norm": 0.09619107842445374,
1929
+ "learning_rate": 1.7689237536994364e-06,
1930
+ "loss": 0.0372231125831604,
1931
+ "step": 2630
1932
+ },
1933
+ {
1934
+ "epoch": 2.764998690070736,
1935
+ "grad_norm": 0.07099667191505432,
1936
+ "learning_rate": 1.6229902522072293e-06,
1937
+ "loss": 0.03421170711517334,
1938
+ "step": 2640
1939
+ },
1940
+ {
1941
+ "epoch": 2.7754781241812942,
1942
+ "grad_norm": 0.10154753923416138,
1943
+ "learning_rate": 1.4832391499572996e-06,
1944
+ "loss": 0.03656705319881439,
1945
+ "step": 2650
1946
+ },
1947
+ {
1948
+ "epoch": 2.785957558291852,
1949
+ "grad_norm": 0.09349387139081955,
1950
+ "learning_rate": 1.3496883066130173e-06,
1951
+ "loss": 0.03710306882858276,
1952
+ "step": 2660
1953
+ },
1954
+ {
1955
+ "epoch": 2.7964369924024104,
1956
+ "grad_norm": 0.061091430485248566,
1957
+ "learning_rate": 1.2223547894680443e-06,
1958
+ "loss": 0.0308389812707901,
1959
+ "step": 2670
1960
+ },
1961
+ {
1962
+ "epoch": 2.8069164265129682,
1963
+ "grad_norm": 0.09838075935840607,
1964
+ "learning_rate": 1.101254871265256e-06,
1965
+ "loss": 0.03703555166721344,
1966
+ "step": 2680
1967
+ },
1968
+ {
1969
+ "epoch": 2.8173958606235265,
1970
+ "grad_norm": 0.10046928375959396,
1971
+ "learning_rate": 9.864040281170938e-07,
1972
+ "loss": 0.04500553905963898,
1973
+ "step": 2690
1974
+ },
1975
+ {
1976
+ "epoch": 2.8278752947340844,
1977
+ "grad_norm": 0.06770773977041245,
1978
+ "learning_rate": 8.778169375277978e-07,
1979
+ "loss": 0.03823737502098083,
1980
+ "step": 2700
1981
+ },
1982
+ {
1983
+ "epoch": 2.8383547288446422,
1984
+ "grad_norm": 0.08373535424470901,
1985
+ "learning_rate": 7.755074765176618e-07,
1986
+ "loss": 0.03961678743362427,
1987
+ "step": 2710
1988
+ },
1989
+ {
1990
+ "epoch": 2.8488341629552005,
1991
+ "grad_norm": 0.07590050995349884,
1992
+ "learning_rate": 6.794887198496413e-07,
1993
+ "loss": 0.03221273124217987,
1994
+ "step": 2720
1995
+ },
1996
+ {
1997
+ "epoch": 2.8593135970657584,
1998
+ "grad_norm": 0.08507678657770157,
1999
+ "learning_rate": 5.897729383583906e-07,
2000
+ "loss": 0.04571912884712219,
2001
+ "step": 2730
2002
+ },
2003
+ {
2004
+ "epoch": 2.8697930311763162,
2005
+ "grad_norm": 0.06584763526916504,
2006
+ "learning_rate": 5.063715973821659e-07,
2007
+ "loss": 0.03794914484024048,
2008
+ "step": 2740
2009
+ },
2010
+ {
2011
+ "epoch": 2.8802724652868745,
2012
+ "grad_norm": 0.07312892377376556,
2013
+ "learning_rate": 4.292953552975154e-07,
2014
+ "loss": 0.036365586519241336,
2015
+ "step": 2750
2016
+ },
2017
+ {
2018
+ "epoch": 2.8802724652868745,
2019
+ "eval_loss": 0.05090421438217163,
2020
+ "eval_runtime": 85.293,
2021
+ "eval_samples_per_second": 3.646,
2022
+ "eval_steps_per_second": 3.646,
2023
+ "step": 2750
2024
+ }
2025
+ ],
2026
+ "logging_steps": 10,
2027
+ "max_steps": 2865,
2028
+ "num_input_tokens_seen": 0,
2029
+ "num_train_epochs": 3,
2030
+ "save_steps": 250,
2031
+ "stateful_callbacks": {
2032
+ "TrainerControl": {
2033
+ "args": {
2034
+ "should_epoch_stop": false,
2035
+ "should_evaluate": false,
2036
+ "should_log": false,
2037
+ "should_save": true,
2038
+ "should_training_stop": false
2039
+ },
2040
+ "attributes": {}
2041
+ }
2042
+ },
2043
+ "total_flos": 8.668152022199163e+17,
2044
+ "train_batch_size": 2,
2045
+ "trial_name": null,
2046
+ "trial_params": null
2047
+ }
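The `learning_rate` values logged above (for example 4.29e-07 at step 2750) look consistent with a linear warmup followed by cosine decay. The sketch below reproduces them under assumptions inferred from the log itself, not read from the run's training arguments: a peak of 1e-4, 86 warmup steps, and a one-step offset between the logged step and the scheduler position. `logged_lr` is a hypothetical helper name.

```python
import math

# Assumed values, inferred from the logged learning rates rather than taken
# from the actual training arguments of this run.
PEAK_LR = 1e-4       # the logged rate tops out just below 1e-4
WARMUP_STEPS = 86    # assumption: step 10 logs 9/86 of the peak
MAX_STEPS = 2865     # "max_steps" in trainer_state.json


def logged_lr(step: int) -> float:
    """Learning rate expected in the log at `step` under warmup + cosine decay."""
    s = step - 1  # one-step offset, chosen because it matches the logged values
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    progress = (s - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (2650, 2700, 2750):
    print(step, logged_lr(step))
```

If the assumptions hold, the printed values should track the `learning_rate` entries at the same steps; a mismatch would point to a different schedule or warmup length.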
checkpoint-2750/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
3
+ size 5201
checkpoint-2865/README.md ADDED
@@ -0,0 +1,207 @@
1
+ ---
2
+ base_model: Qwen/Qwen3.5-2B
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ tags:
6
+ - base_model:adapter:Qwen/Qwen3.5-2B
7
+ - lora
8
+ - transformers
9
+ ---
10
+
11
+ # Model Card for Model ID
12
+
13
+ <!-- Provide a quick summary of what the model is/does. -->
14
+
15
+
16
+
17
+ ## Model Details
18
+
19
+ ### Model Description
20
+
21
+ <!-- Provide a longer summary of what this model is. -->
22
+
23
+
24
+
25
+ - **Developed by:** [More Information Needed]
26
+ - **Funded by [optional]:** [More Information Needed]
27
+ - **Shared by [optional]:** [More Information Needed]
28
+ - **Model type:** [More Information Needed]
29
+ - **Language(s) (NLP):** [More Information Needed]
30
+ - **License:** [More Information Needed]
31
+ - **Finetuned from model [optional]:** [More Information Needed]
32
+
33
+ ### Model Sources [optional]
34
+
35
+ <!-- Provide the basic links for the model. -->
36
+
37
+ - **Repository:** [More Information Needed]
38
+ - **Paper [optional]:** [More Information Needed]
39
+ - **Demo [optional]:** [More Information Needed]
40
+
41
+ ## Uses
42
+
43
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
44
+
45
+ ### Direct Use
46
+
47
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
48
+
49
+ [More Information Needed]
50
+
51
+ ### Downstream Use [optional]
52
+
53
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
54
+
55
+ [More Information Needed]
56
+
57
+ ### Out-of-Scope Use
58
+
59
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
60
+
61
+ [More Information Needed]
62
+
63
+ ## Bias, Risks, and Limitations
64
+
65
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
66
+
67
+ [More Information Needed]
68
+
69
+ ### Recommendations
70
+
71
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
+
73
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
74
+
75
+ ## How to Get Started with the Model
76
+
77
+ Use the code below to get started with the model.
78
+
79
+ [More Information Needed]
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
86
+
87
+ [More Information Needed]
88
+
89
+ ### Training Procedure
90
+
91
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
92
+
93
+ #### Preprocessing [optional]
94
+
95
+ [More Information Needed]
96
+
97
+
98
+ #### Training Hyperparameters
99
+
100
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
101
+
102
+ #### Speeds, Sizes, Times [optional]
103
+
104
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
105
+
106
+ [More Information Needed]
107
+
108
+ ## Evaluation
109
+
110
+ <!-- This section describes the evaluation protocols and provides the results. -->
111
+
112
+ ### Testing Data, Factors & Metrics
113
+
114
+ #### Testing Data
115
+
116
+ <!-- This should link to a Dataset Card if possible. -->
117
+
118
+ [More Information Needed]
119
+
120
+ #### Factors
121
+
122
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
123
+
124
+ [More Information Needed]
125
+
126
+ #### Metrics
127
+
128
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
129
+
130
+ [More Information Needed]
131
+
132
+ ### Results
133
+
134
+ [More Information Needed]
135
+
136
+ #### Summary
137
+
138
+
139
+
140
+ ## Model Examination [optional]
141
+
142
+ <!-- Relevant interpretability work for the model goes here -->
143
+
144
+ [More Information Needed]
145
+
146
+ ## Environmental Impact
147
+
148
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
149
+
150
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
151
+
152
+ - **Hardware Type:** [More Information Needed]
153
+ - **Hours used:** [More Information Needed]
154
+ - **Cloud Provider:** [More Information Needed]
155
+ - **Compute Region:** [More Information Needed]
156
+ - **Carbon Emitted:** [More Information Needed]
157
+
158
+ ## Technical Specifications [optional]
159
+
160
+ ### Model Architecture and Objective
161
+
162
+ [More Information Needed]
163
+
164
+ ### Compute Infrastructure
165
+
166
+ [More Information Needed]
167
+
168
+ #### Hardware
169
+
170
+ [More Information Needed]
171
+
172
+ #### Software
173
+
174
+ [More Information Needed]
175
+
176
+ ## Citation [optional]
177
+
178
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
179
+
180
+ **BibTeX:**
181
+
182
+ [More Information Needed]
183
+
184
+ **APA:**
185
+
186
+ [More Information Needed]
187
+
188
+ ## Glossary [optional]
189
+
190
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
191
+
192
+ [More Information Needed]
193
+
194
+ ## More Information [optional]
195
+
196
+ [More Information Needed]
197
+
198
+ ## Model Card Authors [optional]
199
+
200
+ [More Information Needed]
201
+
202
+ ## Model Card Contact
203
+
204
+ [More Information Needed]
205
+ ### Framework versions
206
+
207
+ - PEFT 0.18.0
checkpoint-2865/adapter_config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "alora_invocation_tokens": null,
3
+ "alpha_pattern": {},
4
+ "arrow_config": null,
5
+ "auto_mapping": null,
6
+ "base_model_name_or_path": "Qwen/Qwen3.5-2B",
7
+ "bias": "none",
8
+ "corda_config": null,
9
+ "ensure_weight_tying": false,
10
+ "eva_config": null,
11
+ "exclude_modules": null,
12
+ "fan_in_fan_out": false,
13
+ "inference_mode": true,
14
+ "init_lora_weights": true,
15
+ "layer_replication": null,
16
+ "layers_pattern": null,
17
+ "layers_to_transform": null,
18
+ "loftq_config": {},
19
+ "lora_alpha": 64,
20
+ "lora_bias": false,
21
+ "lora_dropout": 0.05,
22
+ "megatron_config": null,
23
+ "megatron_core": "megatron.core",
24
+ "modules_to_save": null,
25
+ "peft_type": "LORA",
26
+ "peft_version": "0.18.0",
27
+ "qalora_group_size": 16,
28
+ "r": 32,
29
+ "rank_pattern": {},
30
+ "revision": null,
31
+ "target_modules": [
32
+ "v_proj",
33
+ "k_proj",
34
+ "gate_proj",
35
+ "o_proj",
36
+ "down_proj",
37
+ "up_proj",
38
+ "q_proj"
39
+ ],
40
+ "target_parameters": null,
41
+ "task_type": "CAUSAL_LM",
42
+ "trainable_token_indices": null,
43
+ "use_dora": false,
44
+ "use_qalora": false,
45
+ "use_rslora": false
46
+ }
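As a reading aid for the adapter config above: in standard LoRA the low-rank update is scaled by `lora_alpha / r`, and by `lora_alpha / sqrt(r)` when rank-stabilised LoRA is enabled. A minimal sketch, with the three relevant values copied from the config and `lora_scaling` as a hypothetical helper name:

```python
import math

# Values copied from adapter_config.json above.
adapter_config = {"r": 32, "lora_alpha": 64, "use_rslora": False}


def lora_scaling(cfg: dict) -> float:
    """Scale applied to the low-rank update before it is added to the frozen weight."""
    if cfg.get("use_rslora", False):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])  # rank-stabilised variant
    return cfg["lora_alpha"] / cfg["r"]  # standard LoRA


print(lora_scaling(adapter_config))  # 2.0
```

With `r = 32` and `lora_alpha = 64` the update is applied at twice its raw magnitude.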
checkpoint-2865/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
3
+ size 87319256
checkpoint-2865/chat_template.jinja ADDED
@@ -0,0 +1,154 @@
1
+ {%- set image_count = namespace(value=0) %}
2
+ {%- set video_count = namespace(value=0) %}
3
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
4
+ {%- if content is string %}
5
+ {{- content }}
6
+ {%- elif content is iterable and content is not mapping %}
7
+ {%- for item in content %}
8
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
9
+ {%- if is_system_content %}
10
+ {{- raise_exception('System message cannot contain images.') }}
11
+ {%- endif %}
12
+ {%- if do_vision_count %}
13
+ {%- set image_count.value = image_count.value + 1 %}
14
+ {%- endif %}
15
+ {%- if add_vision_id %}
16
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
17
+ {%- endif %}
18
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
19
+ {%- elif 'video' in item or item.type == 'video' %}
20
+ {%- if is_system_content %}
21
+ {{- raise_exception('System message cannot contain videos.') }}
22
+ {%- endif %}
23
+ {%- if do_vision_count %}
24
+ {%- set video_count.value = video_count.value + 1 %}
25
+ {%- endif %}
26
+ {%- if add_vision_id %}
27
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
28
+ {%- endif %}
29
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
30
+ {%- elif 'text' in item %}
31
+ {{- item.text }}
32
+ {%- else %}
33
+ {{- raise_exception('Unexpected item type in content.') }}
34
+ {%- endif %}
35
+ {%- endfor %}
36
+ {%- elif content is none or content is undefined %}
37
+ {{- '' }}
38
+ {%- else %}
39
+ {{- raise_exception('Unexpected content type.') }}
40
+ {%- endif %}
41
+ {%- endmacro %}
42
+ {%- if not messages %}
43
+ {{- raise_exception('No messages provided.') }}
44
+ {%- endif %}
45
+ {%- if tools and tools is iterable and tools is not mapping %}
46
+ {{- '<|im_start|>system\n' }}
47
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
48
+ {%- for tool in tools %}
49
+ {{- "\n" }}
50
+ {{- tool | tojson }}
51
+ {%- endfor %}
52
+ {{- "\n</tools>" }}
53
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
54
+ {%- if messages[0].role == 'system' %}
55
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
56
+ {%- if content %}
57
+ {{- '\n\n' + content }}
58
+ {%- endif %}
59
+ {%- endif %}
60
+ {{- '<|im_end|>\n' }}
61
+ {%- else %}
62
+ {%- if messages[0].role == 'system' %}
63
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
64
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
65
+ {%- endif %}
66
+ {%- endif %}
67
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
68
+ {%- for message in messages[::-1] %}
69
+ {%- set index = (messages|length - 1) - loop.index0 %}
70
+ {%- if ns.multi_step_tool and message.role == "user" %}
71
+ {%- set content = render_content(message.content, false)|trim %}
72
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
73
+ {%- set ns.multi_step_tool = false %}
74
+ {%- set ns.last_query_index = index %}
75
+ {%- endif %}
76
+ {%- endif %}
77
+ {%- endfor %}
78
+ {%- if ns.multi_step_tool %}
79
+ {{- raise_exception('No user query found in messages.') }}
80
+ {%- endif %}
81
+ {%- for message in messages %}
82
+ {%- set content = render_content(message.content, true)|trim %}
83
+ {%- if message.role == "system" %}
84
+ {%- if not loop.first %}
85
+ {{- raise_exception('System message must be at the beginning.') }}
86
+ {%- endif %}
87
+ {%- elif message.role == "user" %}
88
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
89
+ {%- elif message.role == "assistant" %}
90
+ {%- set reasoning_content = '' %}
91
+ {%- if message.reasoning_content is string %}
92
+ {%- set reasoning_content = message.reasoning_content %}
93
+ {%- else %}
94
+ {%- if '</think>' in content %}
95
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
96
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
97
+ {%- endif %}
98
+ {%- endif %}
99
+ {%- set reasoning_content = reasoning_content|trim %}
100
+ {%- if loop.index0 > ns.last_query_index %}
101
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
102
+ {%- else %}
103
+ {{- '<|im_start|>' + message.role + '\n' + content }}
104
+ {%- endif %}
105
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
106
+ {%- for tool_call in message.tool_calls %}
107
+ {%- if tool_call.function is defined %}
108
+ {%- set tool_call = tool_call.function %}
109
+ {%- endif %}
110
+ {%- if loop.first %}
111
+ {%- if content|trim %}
112
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
113
+ {%- else %}
114
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
115
+ {%- endif %}
116
+ {%- else %}
117
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
118
+ {%- endif %}
119
+ {%- if tool_call.arguments is defined %}
120
+ {%- for args_name, args_value in tool_call.arguments|items %}
121
+ {{- '<parameter=' + args_name + '>\n' }}
122
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
123
+ {{- args_value }}
124
+ {{- '\n</parameter>\n' }}
125
+ {%- endfor %}
126
+ {%- endif %}
127
+ {{- '</function>\n</tool_call>' }}
128
+ {%- endfor %}
129
+ {%- endif %}
130
+ {{- '<|im_end|>\n' }}
131
+ {%- elif message.role == "tool" %}
132
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
133
+ {{- '<|im_start|>user' }}
134
+ {%- endif %}
135
+ {{- '\n<tool_response>\n' }}
136
+ {{- content }}
137
+ {{- '\n</tool_response>' }}
138
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
139
+ {{- '<|im_end|>\n' }}
140
+ {%- elif loop.last %}
141
+ {{- '<|im_end|>\n' }}
142
+ {%- endif %}
143
+ {%- else %}
144
+ {{- raise_exception('Unexpected message role.') }}
145
+ {%- endif %}
146
+ {%- endfor %}
147
+ {%- if add_generation_prompt %}
148
+ {{- '<|im_start|>assistant\n' }}
149
+ {%- if enable_thinking is defined and enable_thinking is true %}
150
+ {{- '<think>\n' }}
151
+ {%- else %}
152
+ {{- '<think>\n\n</think>\n\n' }}
153
+ {%- endif %}
154
+ {%- endif %}
checkpoint-2865/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a4e94fd092ed2523d6e2e9f17a72149ce8dc0997b192119b210e5713146e635f
3
+ size 174750283
checkpoint-2865/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cad28a71806e0eabf48ed08b2dec44bd87e88427c15cd75dc56fa5f7a84126dd
3
+ size 14645
checkpoint-2865/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8d5623d9d5ab3d6dfaf03bedc8ed63928ffd6ae34c2b75efbaf2c90b81268293
3
+ size 1465
checkpoint-2865/tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
3
+ size 19989343
checkpoint-2865/tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "audio_bos_token": "<|audio_start|>",
4
+ "audio_eos_token": "<|audio_end|>",
5
+ "audio_token": "<|audio_pad|>",
6
+ "backend": "tokenizers",
7
+ "bos_token": null,
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|im_end|>",
10
+ "errors": "replace",
11
+ "image_token": "<|image_pad|>",
12
+ "is_local": true,
13
+ "model_max_length": 262144,
14
+ "model_specific_special_tokens": {
15
+ "audio_bos_token": "<|audio_start|>",
16
+ "audio_eos_token": "<|audio_end|>",
17
+ "audio_token": "<|audio_pad|>",
18
+ "image_token": "<|image_pad|>",
19
+ "video_token": "<|video_pad|>",
20
+ "vision_bos_token": "<|vision_start|>",
21
+ "vision_eos_token": "<|vision_end|>"
22
+ },
23
+ "pad_token": "<|endoftext|>",
24
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
25
+ "split_special_tokens": false,
26
+ "tokenizer_class": "TokenizersBackend",
27
+ "unk_token": null,
28
+ "video_token": "<|video_pad|>",
29
+ "vision_bos_token": "<|vision_start|>",
30
+ "vision_eos_token": "<|vision_end|>"
31
+ }
checkpoint-2865/trainer_state.json ADDED
@@ -0,0 +1,2124 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 3.0,
6
+ "eval_steps": 250,
7
+ "global_step": 2865,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.010479434110558029,
14
+ "grad_norm": 0.19915591180324554,
15
+ "learning_rate": 1.0465116279069768e-05,
16
+ "loss": 1.1350045204162598,
17
+ "step": 10
18
+ },
19
+ {
20
+ "epoch": 0.020958868221116058,
21
+ "grad_norm": 0.18158815801143646,
22
+ "learning_rate": 2.2093023255813955e-05,
23
+ "loss": 1.0580164909362793,
24
+ "step": 20
25
+ },
26
+ {
27
+ "epoch": 0.03143830233167409,
28
+ "grad_norm": 0.16481591761112213,
29
+ "learning_rate": 3.372093023255814e-05,
30
+ "loss": 0.9252842903137207,
31
+ "step": 30
32
+ },
33
+ {
34
+ "epoch": 0.041917736442232116,
35
+ "grad_norm": 0.15599584579467773,
36
+ "learning_rate": 4.5348837209302326e-05,
37
+ "loss": 0.8342072486877441,
38
+ "step": 40
39
+ },
40
+ {
41
+ "epoch": 0.05239717055279015,
42
+ "grad_norm": 0.1804327368736267,
43
+ "learning_rate": 5.697674418604652e-05,
44
+ "loss": 0.7955524921417236,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.06287660466334818,
49
+ "grad_norm": 0.16934047639369965,
50
+ "learning_rate": 6.86046511627907e-05,
51
+ "loss": 0.7358035087585449,
52
+ "step": 60
53
+ },
54
+ {
55
+ "epoch": 0.07335603877390622,
56
+ "grad_norm": 0.2234930843114853,
57
+ "learning_rate": 8.023255813953489e-05,
58
+ "loss": 0.6985861301422119,
59
+ "step": 70
60
+ },
61
+ {
62
+ "epoch": 0.08383547288446423,
63
+ "grad_norm": 0.16290400922298431,
64
+ "learning_rate": 9.186046511627907e-05,
65
+ "loss": 0.599607515335083,
66
+ "step": 80
67
+ },
68
+ {
69
+ "epoch": 0.09431490699502226,
70
+ "grad_norm": 0.1660464107990265,
71
+ "learning_rate": 9.999971245570617e-05,
72
+ "loss": 0.5886398315429687,
73
+ "step": 90
74
+ },
75
+ {
76
+ "epoch": 0.1047943411055803,
77
+ "grad_norm": 0.16978025436401367,
78
+ "learning_rate": 9.999460064915317e-05,
79
+ "loss": 0.5450529098510742,
80
+ "step": 100
81
+ },
82
+ {
83
+ "epoch": 0.11527377521613832,
84
+ "grad_norm": 0.21447990834712982,
85
+ "learning_rate": 9.998309972134645e-05,
86
+ "loss": 0.5072262287139893,
87
+ "step": 110
88
+ },
89
+ {
90
+ "epoch": 0.12575320932669637,
91
+ "grad_norm": 0.17418669164180756,
92
+ "learning_rate": 9.996521114206116e-05,
93
+ "loss": 0.49445347785949706,
94
+ "step": 120
95
+ },
96
+ {
97
+ "epoch": 0.13623264343725439,
98
+ "grad_norm": 0.22226351499557495,
99
+ "learning_rate": 9.994093719739023e-05,
100
+ "loss": 0.47142682075500486,
101
+ "step": 130
102
+ },
103
+ {
104
+ "epoch": 0.14671207754781243,
105
+ "grad_norm": 0.1745530068874359,
106
+ "learning_rate": 9.991028098945215e-05,
107
+ "loss": 0.46663532257080076,
108
+ "step": 140
109
+ },
110
+ {
111
+ "epoch": 0.15719151165837045,
112
+ "grad_norm": 0.17074695229530334,
113
+ "learning_rate": 9.987324643599459e-05,
114
+ "loss": 0.4508847236633301,
115
+ "step": 150
116
+ },
117
+ {
118
+ "epoch": 0.16767094576892846,
119
+ "grad_norm": 0.13428406417369843,
120
+ "learning_rate": 9.982983826989367e-05,
121
+ "loss": 0.40740265846252444,
122
+ "step": 160
123
+ },
124
+ {
125
+ "epoch": 0.1781503798794865,
126
+ "grad_norm": 0.17766578495502472,
127
+ "learning_rate": 9.978006203854918e-05,
128
+ "loss": 0.3998516321182251,
129
+ "step": 170
130
+ },
131
+ {
132
+ "epoch": 0.18862981399004453,
133
+ "grad_norm": 0.1672629565000534,
134
+ "learning_rate": 9.972392410317562e-05,
135
+ "loss": 0.41658673286437986,
136
+ "step": 180
137
+ },
138
+ {
139
+ "epoch": 0.19910924810060257,
140
+ "grad_norm": 0.1333673745393753,
141
+ "learning_rate": 9.96614316379892e-05,
142
+ "loss": 0.37024455070495604,
143
+ "step": 190
144
+ },
145
+ {
146
+ "epoch": 0.2095886822111606,
147
+ "grad_norm": 0.18037110567092896,
148
+ "learning_rate": 9.959259262929113e-05,
149
+ "loss": 0.35086841583251954,
150
+ "step": 200
151
+ },
152
+ {
153
+ "epoch": 0.22006811632171863,
154
+ "grad_norm": 0.14616410434246063,
155
+ "learning_rate": 9.951741587444683e-05,
156
+ "loss": 0.37918968200683595,
157
+ "step": 210
158
+ },
159
+ {
160
+ "epoch": 0.23054755043227665,
161
+ "grad_norm": 0.14523574709892273,
162
+ "learning_rate": 9.943591098076184e-05,
163
+ "loss": 0.32804527282714846,
164
+ "step": 220
165
+ },
166
+ {
167
+ "epoch": 0.2410269845428347,
168
+ "grad_norm": 0.14667049050331116,
169
+ "learning_rate": 9.934808836425393e-05,
170
+ "loss": 0.3480507850646973,
171
+ "step": 230
172
+ },
173
+ {
174
+ "epoch": 0.25150641865339274,
175
+ "grad_norm": 0.18156558275222778,
176
+ "learning_rate": 9.925395924832198e-05,
177
+ "loss": 0.3300448179244995,
178
+ "step": 240
179
+ },
180
+ {
181
+ "epoch": 0.26198585276395076,
182
+ "grad_norm": 0.13806430995464325,
183
+ "learning_rate": 9.91535356623117e-05,
184
+ "loss": 0.3127591609954834,
185
+ "step": 250
186
+ },
187
+ {
188
+ "epoch": 0.26198585276395076,
189
+ "eval_loss": 0.3132782578468323,
190
+ "eval_runtime": 94.8848,
191
+ "eval_samples_per_second": 3.278,
192
+ "eval_steps_per_second": 3.278,
193
+ "step": 250
194
+ },
195
+ {
196
+ "epoch": 0.27246528687450877,
197
+ "grad_norm": 0.17205959558486938,
198
+ "learning_rate": 9.904683043997835e-05,
199
+ "loss": 0.3306673288345337,
200
+ "step": 260
201
+ },
202
+ {
203
+ "epoch": 0.2829447209850668,
204
+ "grad_norm": 0.12620031833648682,
205
+ "learning_rate": 9.893385721784656e-05,
206
+ "loss": 0.3011106729507446,
207
+ "step": 270
208
+ },
209
+ {
210
+ "epoch": 0.29342415509562486,
211
+ "grad_norm": 0.11466006934642792,
212
+ "learning_rate": 9.881463043346768e-05,
213
+ "loss": 0.2951968669891357,
214
+ "step": 280
215
+ },
216
+ {
217
+ "epoch": 0.3039035892061829,
218
+ "grad_norm": 0.1671207845211029,
219
+ "learning_rate": 9.868916532357475e-05,
220
+ "loss": 0.2910990953445435,
221
+ "step": 290
222
+ },
223
+ {
224
+ "epoch": 0.3143830233167409,
225
+ "grad_norm": 0.1683349907398224,
226
+ "learning_rate": 9.855747792213521e-05,
227
+ "loss": 0.31409192085266113,
228
+ "step": 300
229
+ },
230
+ {
231
+ "epoch": 0.3248624574272989,
232
+ "grad_norm": 0.12934699654579163,
233
+ "learning_rate": 9.84195850583019e-05,
234
+ "loss": 0.27755858898162844,
235
+ "step": 310
236
+ },
237
+ {
238
+ "epoch": 0.33534189153785693,
239
+ "grad_norm": 0.13784605264663696,
240
+ "learning_rate": 9.827550435426234e-05,
241
+ "loss": 0.2809821605682373,
242
+ "step": 320
243
+ },
244
+ {
245
+ "epoch": 0.345821325648415,
246
+ "grad_norm": 0.18590271472930908,
247
+ "learning_rate": 9.812525422298664e-05,
248
+ "loss": 0.28698866367340087,
249
+ "step": 330
250
+ },
251
+ {
252
+ "epoch": 0.356300759758973,
253
+ "grad_norm": 0.1704522967338562,
254
+ "learning_rate": 9.796885386587447e-05,
255
+ "loss": 0.250814414024353,
256
+ "step": 340
257
+ },
258
+ {
259
+ "epoch": 0.36678019386953103,
260
+ "grad_norm": 0.1316167265176773,
261
+ "learning_rate": 9.780632327030112e-05,
262
+ "loss": 0.25458922386169436,
263
+ "step": 350
264
+ },
265
+ {
266
+ "epoch": 0.37725962798008905,
267
+ "grad_norm": 0.16226200759410858,
268
+ "learning_rate": 9.763768320706319e-05,
269
+ "loss": 0.26563262939453125,
270
+ "step": 360
271
+ },
272
+ {
273
+ "epoch": 0.3877390620906471,
274
+ "grad_norm": 0.1297195851802826,
275
+ "learning_rate": 9.746295522772424e-05,
276
+ "loss": 0.2632328748703003,
277
+ "step": 370
278
+ },
279
+ {
280
+ "epoch": 0.39821849620120514,
281
+ "grad_norm": 0.1286139190196991,
282
+ "learning_rate": 9.728216166186049e-05,
283
+ "loss": 0.2624588251113892,
284
+ "step": 380
285
+ },
286
+ {
287
+ "epoch": 0.40869793031176316,
288
+ "grad_norm": 0.1587965339422226,
289
+ "learning_rate": 9.709532561420725e-05,
290
+ "loss": 0.24741590023040771,
291
+ "step": 390
292
+ },
293
+ {
294
+ "epoch": 0.4191773644223212,
295
+ "grad_norm": 0.11963177472352982,
296
+ "learning_rate": 9.690247096170615e-05,
297
+ "loss": 0.22777397632598878,
298
+ "step": 400
299
+ },
300
+ {
301
+ "epoch": 0.42965679853287925,
302
+ "grad_norm": 0.13638927042484283,
303
+ "learning_rate": 9.670362235045387e-05,
304
+ "loss": 0.23324952125549317,
305
+ "step": 410
306
+ },
307
+ {
308
+ "epoch": 0.44013623264343726,
309
+ "grad_norm": 0.1514088362455368,
310
+ "learning_rate": 9.649880519255232e-05,
311
+ "loss": 0.2505915880203247,
312
+ "step": 420
313
+ },
314
+ {
315
+ "epoch": 0.4506156667539953,
316
+ "grad_norm": 0.10994207113981247,
317
+ "learning_rate": 9.62880456628612e-05,
318
+ "loss": 0.2078850269317627,
319
+ "step": 430
320
+ },
321
+ {
322
+ "epoch": 0.4610951008645533,
323
+ "grad_norm": 0.11983369290828705,
324
+ "learning_rate": 9.607137069565288e-05,
325
+ "loss": 0.21452484130859376,
326
+ "step": 440
327
+ },
328
+ {
329
+ "epoch": 0.47157453497511137,
330
+ "grad_norm": 0.12684305012226105,
331
+ "learning_rate": 9.58488079811703e-05,
332
+ "loss": 0.22002685070037842,
333
+ "step": 450
334
+ },
335
+ {
336
+ "epoch": 0.4820539690856694,
337
+ "grad_norm": 0.16841623187065125,
338
+ "learning_rate": 9.562038596208828e-05,
339
+ "loss": 0.21405396461486817,
340
+ "step": 460
341
+ },
342
+ {
343
+ "epoch": 0.4925334031962274,
344
+ "grad_norm": 0.1498555839061737,
345
+ "learning_rate": 9.538613382987865e-05,
346
+ "loss": 0.20534911155700683,
347
+ "step": 470
348
+ },
349
+ {
350
+ "epoch": 0.5030128373067855,
351
+ "grad_norm": 0.13913628458976746,
352
+ "learning_rate": 9.514608152107974e-05,
353
+ "loss": 0.22248730659484864,
354
+ "step": 480
355
+ },
356
+ {
357
+ "epoch": 0.5134922714173434,
358
+ "grad_norm": 0.14408951997756958,
359
+ "learning_rate": 9.490025971347047e-05,
360
+ "loss": 0.214866042137146,
361
+ "step": 490
362
+ },
363
+ {
364
+ "epoch": 0.5239717055279015,
365
+ "grad_norm": 0.1649770438671112,
366
+ "learning_rate": 9.464869982215001e-05,
367
+ "loss": 0.19965900182724,
368
+ "step": 500
369
+ },
370
+ {
371
+ "epoch": 0.5239717055279015,
372
+ "eval_loss": 0.19267401099205017,
373
+ "eval_runtime": 95.3374,
374
+ "eval_samples_per_second": 3.262,
375
+ "eval_steps_per_second": 3.262,
376
+ "step": 500
377
+ },
378
+ {
379
+ "epoch": 0.5344511396384595,
380
+ "grad_norm": 0.1305568665266037,
381
+ "learning_rate": 9.439143399552291e-05,
382
+ "loss": 0.21112546920776368,
383
+ "step": 510
384
+ },
385
+ {
386
+ "epoch": 0.5449305737490175,
387
+ "grad_norm": 0.11998175084590912,
388
+ "learning_rate": 9.412849511119074e-05,
389
+ "loss": 0.21422922611236572,
390
+ "step": 520
391
+ },
392
+ {
393
+ "epoch": 0.5554100078595756,
394
+ "grad_norm": 0.15220341086387634,
395
+ "learning_rate": 9.385991677175046e-05,
396
+ "loss": 0.20999882221221924,
397
+ "step": 530
398
+ },
399
+ {
400
+ "epoch": 0.5658894419701336,
401
+ "grad_norm": 0.13170023262500763,
402
+ "learning_rate": 9.358573330050004e-05,
403
+ "loss": 0.20208392143249512,
404
+ "step": 540
405
+ },
406
+ {
407
+ "epoch": 0.5763688760806917,
408
+ "grad_norm": 0.10457764565944672,
409
+ "learning_rate": 9.330597973705219e-05,
410
+ "loss": 0.1908803701400757,
411
+ "step": 550
412
+ },
413
+ {
414
+ "epoch": 0.5868483101912497,
415
+ "grad_norm": 0.12568537890911102,
416
+ "learning_rate": 9.302069183285637e-05,
417
+ "loss": 0.19316340684890748,
418
+ "step": 560
419
+ },
420
+ {
421
+ "epoch": 0.5973277443018077,
422
+ "grad_norm": 0.14824528992176056,
423
+ "learning_rate": 9.272990604662988e-05,
424
+ "loss": 0.18987581729888917,
425
+ "step": 570
426
+ },
427
+ {
428
+ "epoch": 0.6078071784123658,
429
+ "grad_norm": 0.14521734416484833,
430
+ "learning_rate": 9.243365953969861e-05,
431
+ "loss": 0.19232832193374633,
432
+ "step": 580
433
+ },
434
+ {
435
+ "epoch": 0.6182866125229237,
436
+ "grad_norm": 0.1335408091545105,
437
+ "learning_rate": 9.213199017124793e-05,
438
+ "loss": 0.1758212924003601,
439
+ "step": 590
440
+ },
441
+ {
442
+ "epoch": 0.6287660466334818,
443
+ "grad_norm": 0.11143071949481964,
444
+ "learning_rate": 9.182493649348447e-05,
445
+ "loss": 0.19117680788040162,
446
+ "step": 600
447
+ },
448
+ {
449
+ "epoch": 0.6392454807440399,
450
+ "grad_norm": 0.14789296686649323,
451
+ "learning_rate": 9.151253774670921e-05,
452
+ "loss": 0.184559965133667,
453
+ "step": 610
454
+ },
455
+ {
456
+ "epoch": 0.6497249148545978,
457
+ "grad_norm": 0.10541336238384247,
458
+ "learning_rate": 9.119483385430283e-05,
459
+ "loss": 0.1720304846763611,
460
+ "step": 620
461
+ },
462
+ {
463
+ "epoch": 0.6602043489651559,
464
+ "grad_norm": 0.12105975300073624,
465
+ "learning_rate": 9.087186541762358e-05,
466
+ "loss": 0.17654836177825928,
467
+ "step": 630
468
+ },
469
+ {
470
+ "epoch": 0.6706837830757139,
471
+ "grad_norm": 0.13114669919013977,
472
+ "learning_rate": 9.054367371081858e-05,
473
+ "loss": 0.1696592688560486,
474
+ "step": 640
475
+ },
476
+ {
477
+ "epoch": 0.6811632171862719,
478
+ "grad_norm": 0.13745592534542084,
479
+ "learning_rate": 9.021030067554919e-05,
480
+ "loss": 0.15404462814331055,
481
+ "step": 650
482
+ },
483
+ {
484
+ "epoch": 0.69164265129683,
485
+ "grad_norm": 0.15927442908287048,
486
+ "learning_rate": 8.987178891563094e-05,
487
+ "loss": 0.17024366855621337,
488
+ "step": 660
489
+ },
490
+ {
491
+ "epoch": 0.702122085407388,
492
+ "grad_norm": 0.13737429678440094,
493
+ "learning_rate": 8.952818169158903e-05,
494
+ "loss": 0.1602048397064209,
495
+ "step": 670
496
+ },
497
+ {
498
+ "epoch": 0.712601519517946,
499
+ "grad_norm": 0.13941751420497894,
500
+ "learning_rate": 8.91795229151297e-05,
501
+ "loss": 0.18057082891464232,
502
+ "step": 680
503
+ },
504
+ {
505
+ "epoch": 0.7230809536285041,
506
+ "grad_norm": 0.14242954552173615,
507
+ "learning_rate": 8.882585714352856e-05,
508
+ "loss": 0.14863334894180297,
509
+ "step": 690
510
+ },
511
+ {
512
+ "epoch": 0.7335603877390621,
513
+ "grad_norm": 0.15553542971611023,
514
+ "learning_rate": 8.846722957393626e-05,
515
+ "loss": 0.15701137781143187,
516
+ "step": 700
517
+ },
518
+ {
519
+ "epoch": 0.7440398218496201,
520
+ "grad_norm": 0.12901411950588226,
521
+ "learning_rate": 8.810368603760249e-05,
522
+ "loss": 0.15571318864822387,
523
+ "step": 710
524
+ },
525
+ {
526
+ "epoch": 0.7545192559601781,
527
+ "grad_norm": 0.13449430465698242,
528
+ "learning_rate": 8.773527299401902e-05,
529
+ "loss": 0.16418551206588744,
530
+ "step": 720
531
+ },
532
+ {
533
+ "epoch": 0.7649986900707362,
534
+ "grad_norm": 0.10630270838737488,
535
+ "learning_rate": 8.736203752498218e-05,
536
+ "loss": 0.16800801753997802,
537
+ "step": 730
538
+ },
539
+ {
540
+ "epoch": 0.7754781241812942,
541
+ "grad_norm": 0.11299935728311539,
542
+ "learning_rate": 8.698402732857611e-05,
543
+ "loss": 0.15700833797454833,
544
+ "step": 740
545
+ },
546
+ {
547
+ "epoch": 0.7859575582918522,
548
+ "grad_norm": 0.11920930445194244,
549
+ "learning_rate": 8.660129071307707e-05,
550
+ "loss": 0.15091001987457275,
551
+ "step": 750
552
+ },
553
+ {
554
+ "epoch": 0.7859575582918522,
555
+ "eval_loss": 0.1356429010629654,
556
+ "eval_runtime": 94.0557,
557
+ "eval_samples_per_second": 3.307,
558
+ "eval_steps_per_second": 3.307,
559
+ "step": 750
560
+ },
561
+ {
562
+ "epoch": 0.7964369924024103,
563
+ "grad_norm": 0.13870343565940857,
564
+ "learning_rate": 8.621387659077986e-05,
565
+ "loss": 0.1422027826309204,
566
+ "step": 760
567
+ },
568
+ {
569
+ "epoch": 0.8069164265129684,
570
+ "grad_norm": 0.12753477692604065,
571
+ "learning_rate": 8.582183447174697e-05,
572
+ "loss": 0.142450213432312,
573
+ "step": 770
574
+ },
575
+ {
576
+ "epoch": 0.8173958606235263,
577
+ "grad_norm": 0.11877496540546417,
578
+ "learning_rate": 8.542521445748141e-05,
579
+ "loss": 0.15361062288284302,
580
+ "step": 780
581
+ },
582
+ {
583
+ "epoch": 0.8278752947340844,
584
+ "grad_norm": 0.1200249195098877,
585
+ "learning_rate": 8.502406723452392e-05,
586
+ "loss": 0.14647477865219116,
587
+ "step": 790
588
+ },
589
+ {
590
+ "epoch": 0.8383547288446423,
591
+ "grad_norm": 0.12913794815540314,
592
+ "learning_rate": 8.461844406797543e-05,
593
+ "loss": 0.1591552734375,
594
+ "step": 800
595
+ },
596
+ {
597
+ "epoch": 0.8488341629552004,
598
+ "grad_norm": 0.17270176112651825,
599
+ "learning_rate": 8.420839679494558e-05,
600
+ "loss": 0.1495436668395996,
601
+ "step": 810
602
+ },
603
+ {
604
+ "epoch": 0.8593135970657585,
605
+ "grad_norm": 0.15545596182346344,
606
+ "learning_rate": 8.379397781792808e-05,
607
+ "loss": 0.15377395153045653,
608
+ "step": 820
609
+ },
610
+ {
611
+ "epoch": 0.8697930311763165,
612
+ "grad_norm": 0.12941111624240875,
613
+ "learning_rate": 8.337524009810395e-05,
614
+ "loss": 0.14733861684799193,
615
+ "step": 830
616
+ },
617
+ {
618
+ "epoch": 0.8802724652868745,
619
+ "grad_norm": 0.13152749836444855,
620
+ "learning_rate": 8.295223714857319e-05,
621
+ "loss": 0.13980752229690552,
622
+ "step": 840
623
+ },
624
+ {
625
+ "epoch": 0.8907518993974325,
626
+ "grad_norm": 0.11208872497081757,
627
+ "learning_rate": 8.252502302751612e-05,
628
+ "loss": 0.12019969224929809,
629
+ "step": 850
630
+ },
631
+ {
632
+ "epoch": 0.9012313335079906,
633
+ "grad_norm": 0.11118603497743607,
634
+ "learning_rate": 8.209365233128482e-05,
635
+ "loss": 0.13822466135025024,
636
+ "step": 860
637
+ },
638
+ {
639
+ "epoch": 0.9117107676185486,
640
+ "grad_norm": 0.11705653369426727,
641
+ "learning_rate": 8.165818018742605e-05,
642
+ "loss": 0.1439664840698242,
643
+ "step": 870
644
+ },
645
+ {
646
+ "epoch": 0.9221902017291066,
647
+ "grad_norm": 0.08817730098962784,
648
+ "learning_rate": 8.121866224763606e-05,
649
+ "loss": 0.13380355834960939,
650
+ "step": 880
651
+ },
652
+ {
653
+ "epoch": 0.9326696358396647,
654
+ "grad_norm": 0.1092257872223854,
655
+ "learning_rate": 8.077515468064851e-05,
656
+ "loss": 0.12982802391052245,
657
+ "step": 890
658
+ },
659
+ {
660
+ "epoch": 0.9431490699502227,
661
+ "grad_norm": 0.12680962681770325,
662
+ "learning_rate": 8.032771416505647e-05,
663
+ "loss": 0.1489071011543274,
664
+ "step": 900
665
+ },
666
+ {
667
+ "epoch": 0.9536285040607807,
668
+ "grad_norm": 0.11953219771385193,
669
+ "learning_rate": 7.987639788206888e-05,
670
+ "loss": 0.14020267724990845,
671
+ "step": 910
672
+ },
673
+ {
674
+ "epoch": 0.9641079381713388,
675
+ "grad_norm": 0.1041467934846878,
676
+ "learning_rate": 7.942126350820318e-05,
677
+ "loss": 0.1439213275909424,
678
+ "step": 920
679
+ },
680
+ {
681
+ "epoch": 0.9745873722818967,
682
+ "grad_norm": 0.1277916431427002,
683
+ "learning_rate": 7.896236920791442e-05,
684
+ "loss": 0.1468779683113098,
685
+ "step": 930
686
+ },
687
+ {
688
+ "epoch": 0.9850668063924548,
689
+ "grad_norm": 0.11245205253362656,
690
+ "learning_rate": 7.849977362616201e-05,
691
+ "loss": 0.12012372016906739,
692
+ "step": 940
693
+ },
694
+ {
695
+ "epoch": 0.9955462405030129,
696
+ "grad_norm": 0.12230483442544937,
697
+ "learning_rate": 7.803353588091522e-05,
698
+ "loss": 0.1488939881324768,
699
+ "step": 950
700
+ },
701
+ {
702
+ "epoch": 1.005239717055279,
703
+ "grad_norm": 0.14185865223407745,
704
+ "learning_rate": 7.7563715555598e-05,
705
+ "loss": 0.11488113403320313,
706
+ "step": 960
707
+ },
708
+ {
709
+ "epoch": 1.015719151165837,
710
+ "grad_norm": 0.10545773804187775,
711
+ "learning_rate": 7.709037269147459e-05,
712
+ "loss": 0.10712549686431885,
713
+ "step": 970
714
+ },
715
+ {
716
+ "epoch": 1.026198585276395,
717
+ "grad_norm": 0.10376274585723877,
718
+ "learning_rate": 7.661356777997631e-05,
719
+ "loss": 0.11428828239440918,
720
+ "step": 980
721
+ },
722
+ {
723
+ "epoch": 1.0366780193869531,
724
+ "grad_norm": 0.09950564056634903,
725
+ "learning_rate": 7.613336175497111e-05,
726
+ "loss": 0.09823058247566223,
727
+ "step": 990
728
+ },
729
+ {
730
+ "epoch": 1.0471574534975112,
731
+ "grad_norm": 0.10412753373384476,
732
+ "learning_rate": 7.564981598497643e-05,
733
+ "loss": 0.1106558084487915,
734
+ "step": 1000
735
+ },
736
+ {
737
+ "epoch": 1.0471574534975112,
738
+ "eval_loss": 0.11185819655656815,
739
+ "eval_runtime": 93.808,
740
+ "eval_samples_per_second": 3.315,
741
+ "eval_steps_per_second": 3.315,
742
+ "step": 1000
743
+ },
744
+ {
745
+ "epoch": 1.057636887608069,
746
+ "grad_norm": 0.10430868715047836,
747
+ "learning_rate": 7.516299226531645e-05,
748
+ "loss": 0.11168640851974487,
749
+ "step": 1010
750
+ },
751
+ {
752
+ "epoch": 1.0681163217186271,
753
+ "grad_norm": 0.09646806865930557,
754
+ "learning_rate": 7.467295281022501e-05,
755
+ "loss": 0.10711305141448975,
756
+ "step": 1020
757
+ },
758
+ {
759
+ "epoch": 1.0785957558291852,
760
+ "grad_norm": 0.13060614466667175,
761
+ "learning_rate": 7.417976024489474e-05,
762
+ "loss": 0.10001810789108276,
763
+ "step": 1030
764
+ },
765
+ {
766
+ "epoch": 1.0890751899397433,
767
+ "grad_norm": 0.10389085114002228,
768
+ "learning_rate": 7.368347759747393e-05,
769
+ "loss": 0.11893858909606933,
770
+ "step": 1040
771
+ },
772
+ {
773
+ "epoch": 1.0995546240503014,
774
+ "grad_norm": 0.11291550099849701,
775
+ "learning_rate": 7.318416829101164e-05,
776
+ "loss": 0.1079628586769104,
777
+ "step": 1050
778
+ },
779
+ {
780
+ "epoch": 1.1100340581608594,
781
+ "grad_norm": 0.10372598469257355,
782
+ "learning_rate": 7.268189613535255e-05,
783
+ "loss": 0.10332397222518921,
784
+ "step": 1060
785
+ },
786
+ {
787
+ "epoch": 1.1205134922714173,
788
+ "grad_norm": 0.12971536815166473,
789
+ "learning_rate": 7.217672531898225e-05,
790
+ "loss": 0.10804877281188965,
791
+ "step": 1070
792
+ },
793
+ {
794
+ "epoch": 1.1309929263819753,
795
+ "grad_norm": 0.10902425646781921,
796
+ "learning_rate": 7.166872040082431e-05,
797
+ "loss": 0.09947454929351807,
798
+ "step": 1080
799
+ },
800
+ {
801
+ "epoch": 1.1414723604925334,
802
+ "grad_norm": 0.09305932372808456,
803
+ "learning_rate": 7.11579463019897e-05,
804
+ "loss": 0.09406971335411071,
805
+ "step": 1090
806
+ },
807
+ {
808
+ "epoch": 1.1519517946030915,
809
+ "grad_norm": 0.11485275626182556,
810
+ "learning_rate": 7.064446829748034e-05,
811
+ "loss": 0.09943979978561401,
812
+ "step": 1100
813
+ },
814
+ {
815
+ "epoch": 1.1624312287136496,
816
+ "grad_norm": 0.09556467831134796,
817
+ "learning_rate": 7.0128352007847e-05,
818
+ "loss": 0.10862170457839966,
819
+ "step": 1110
820
+ },
821
+ {
822
+ "epoch": 1.1729106628242074,
823
+ "grad_norm": 0.11937833577394485,
824
+ "learning_rate": 6.96096633908034e-05,
825
+ "loss": 0.10385221242904663,
826
+ "step": 1120
827
+ },
828
+ {
829
+ "epoch": 1.1833900969347655,
830
+ "grad_norm": 0.11560507863759995,
831
+ "learning_rate": 6.908846873279691e-05,
832
+ "loss": 0.09252402186393738,
833
+ "step": 1130
834
+ },
835
+ {
836
+ "epoch": 1.1938695310453236,
837
+ "grad_norm": 0.11119654029607773,
838
+ "learning_rate": 6.856483464053758e-05,
839
+ "loss": 0.09637172818183899,
840
+ "step": 1140
841
+ },
842
+ {
843
+ "epoch": 1.2043489651558816,
844
+ "grad_norm": 0.11722644418478012,
845
+ "learning_rate": 6.803882803248585e-05,
846
+ "loss": 0.09078751802444458,
847
+ "step": 1150
848
+ },
849
+ {
850
+ "epoch": 1.2148283992664397,
851
+ "grad_norm": 0.10487739741802216,
852
+ "learning_rate": 6.751051613030082e-05,
853
+ "loss": 0.10334972143173218,
854
+ "step": 1160
855
+ },
856
+ {
857
+ "epoch": 1.2253078333769976,
858
+ "grad_norm": 0.10202383995056152,
859
+ "learning_rate": 6.697996645024937e-05,
860
+ "loss": 0.08661433458328247,
861
+ "step": 1170
862
+ },
863
+ {
864
+ "epoch": 1.2357872674875556,
865
+ "grad_norm": 0.11801143735647202,
866
+ "learning_rate": 6.644724679457804e-05,
867
+ "loss": 0.0997927188873291,
868
+ "step": 1180
869
+ },
870
+ {
871
+ "epoch": 1.2462667015981137,
872
+ "grad_norm": 0.10949107259511948,
873
+ "learning_rate": 6.591242524284802e-05,
874
+ "loss": 0.0977592945098877,
875
+ "step": 1190
876
+ },
877
+ {
878
+ "epoch": 1.2567461357086718,
879
+ "grad_norm": 0.10221222043037415,
880
+ "learning_rate": 6.537557014323487e-05,
881
+ "loss": 0.0970361053943634,
882
+ "step": 1200
883
+ },
884
+ {
885
+ "epoch": 1.2672255698192298,
886
+ "grad_norm": 0.10554748773574829,
887
+ "learning_rate": 6.483675010379393e-05,
888
+ "loss": 0.09007551074028015,
889
+ "step": 1210
890
+ },
891
+ {
892
+ "epoch": 1.2777050039297877,
893
+ "grad_norm": 0.11625627428293228,
894
+ "learning_rate": 6.429603398369242e-05,
895
+ "loss": 0.08734490275382996,
896
+ "step": 1220
897
+ },
898
+ {
899
+ "epoch": 1.2881844380403458,
900
+ "grad_norm": 0.10624277591705322,
901
+ "learning_rate": 6.37534908844095e-05,
902
+ "loss": 0.09858485460281372,
903
+ "step": 1230
904
+ },
905
+ {
906
+ "epoch": 1.2986638721509038,
907
+ "grad_norm": 0.10184557735919952,
908
+ "learning_rate": 6.320919014090534e-05,
909
+ "loss": 0.09335023164749146,
910
+ "step": 1240
911
+ },
912
+ {
913
+ "epoch": 1.309143306261462,
914
+ "grad_norm": 0.10787283629179001,
915
+ "learning_rate": 6.266320131276051e-05,
916
+ "loss": 0.08665563464164734,
917
+ "step": 1250
918
+ },
919
+ {
920
+ "epoch": 1.309143306261462,
921
+ "eval_loss": 0.08951585739850998,
922
+ "eval_runtime": 94.0567,
923
+ "eval_samples_per_second": 3.307,
924
+ "eval_steps_per_second": 3.307,
925
+ "step": 1250
926
+ },
927
+ {
928
+ "epoch": 1.31962274037202,
929
+ "grad_norm": 0.10836981981992722,
930
+ "learning_rate": 6.211559417528631e-05,
931
+ "loss": 0.0933380126953125,
932
+ "step": 1260
933
+ },
934
+ {
935
+ "epoch": 1.3301021744825778,
936
+ "grad_norm": 0.1397171914577484,
937
+ "learning_rate": 6.156643871060795e-05,
938
+ "loss": 0.09835371971130372,
939
+ "step": 1270
940
+ },
941
+ {
942
+ "epoch": 1.340581608593136,
943
+ "grad_norm": 0.11242218315601349,
944
+ "learning_rate": 6.101580509872097e-05,
945
+ "loss": 0.09398673176765442,
946
+ "step": 1280
947
+ },
948
+ {
949
+ "epoch": 1.351061042703694,
950
+ "grad_norm": 0.10235017538070679,
951
+ "learning_rate": 6.0463763708522536e-05,
952
+ "loss": 0.10350929498672486,
953
+ "step": 1290
954
+ },
955
+ {
956
+ "epoch": 1.361540476814252,
957
+ "grad_norm": 0.09327106177806854,
958
+ "learning_rate": 5.99103850888186e-05,
959
+ "loss": 0.09580238461494446,
960
+ "step": 1300
961
+ },
962
+ {
963
+ "epoch": 1.3720199109248101,
964
+ "grad_norm": 0.12995658814907074,
965
+ "learning_rate": 5.9355739959307976e-05,
966
+ "loss": 0.08437412977218628,
967
+ "step": 1310
968
+ },
969
+ {
970
+ "epoch": 1.382499345035368,
971
+ "grad_norm": 0.11962983757257462,
972
+ "learning_rate": 5.879989920154466e-05,
973
+ "loss": 0.08409937620162963,
974
+ "step": 1320
975
+ },
976
+ {
977
+ "epoch": 1.392978779145926,
978
+ "grad_norm": 0.09431737661361694,
979
+ "learning_rate": 5.824293384987941e-05,
980
+ "loss": 0.09504773020744324,
981
+ "step": 1330
982
+ },
983
+ {
984
+ "epoch": 1.4034582132564841,
985
+ "grad_norm": 0.13824374973773956,
986
+ "learning_rate": 5.768491508238188e-05,
987
+ "loss": 0.09193333983421326,
988
+ "step": 1340
989
+ },
990
+ {
991
+ "epoch": 1.4139376473670422,
992
+ "grad_norm": 0.10595858097076416,
993
+ "learning_rate": 5.712591421174422e-05,
994
+ "loss": 0.08976472616195678,
995
+ "step": 1350
996
+ },
997
+ {
998
+ "epoch": 1.4244170814776003,
999
+ "grad_norm": 0.09911809861660004,
1000
+ "learning_rate": 5.6566002676167725e-05,
1001
+ "loss": 0.07597061395645141,
1002
+ "step": 1360
1003
+ },
1004
+ {
1005
+ "epoch": 1.4348965155881581,
1006
+ "grad_norm": 0.09723466634750366,
1007
+ "learning_rate": 5.60052520302332e-05,
1008
+ "loss": 0.10513757467269898,
1009
+ "step": 1370
1010
+ },
1011
+ {
1012
+ "epoch": 1.4453759496987162,
1013
+ "grad_norm": 0.11331687867641449,
1014
+ "learning_rate": 5.5443733935756615e-05,
1015
+ "loss": 0.09019948840141297,
1016
+ "step": 1380
1017
+ },
1018
+ {
1019
+ "epoch": 1.4558553838092743,
1020
+ "grad_norm": 0.13363589346408844,
1021
+ "learning_rate": 5.4881520152630886e-05,
1022
+ "loss": 0.08314153552055359,
1023
+ "step": 1390
1024
+ },
1025
+ {
1026
+ "epoch": 1.4663348179198323,
1027
+ "grad_norm": 0.14111892879009247,
1028
+ "learning_rate": 5.4318682529655404e-05,
1029
+ "loss": 0.07892010807991028,
1030
+ "step": 1400
1031
+ },
1032
+ {
1033
+ "epoch": 1.4768142520303904,
1034
+ "grad_norm": 0.13948485255241394,
1035
+ "learning_rate": 5.3755292995353913e-05,
1036
+ "loss": 0.0840128481388092,
1037
+ "step": 1410
1038
+ },
1039
+ {
1040
+ "epoch": 1.4872936861409483,
1041
+ "grad_norm": 0.12535949051380157,
1042
+ "learning_rate": 5.31914235487823e-05,
1043
+ "loss": 0.07869629859924317,
1044
+ "step": 1420
1045
+ },
1046
+ {
1047
+ "epoch": 1.4977731202515066,
1048
+ "grad_norm": 0.10041694343090057,
1049
+ "learning_rate": 5.2627146250327484e-05,
1050
+ "loss": 0.08074848055839538,
1051
+ "step": 1430
1052
+ },
1053
+ {
1054
+ "epoch": 1.5082525543620644,
1055
+ "grad_norm": 0.10112891346216202,
1056
+ "learning_rate": 5.2062533212498275e-05,
1057
+ "loss": 0.0860810935497284,
1058
+ "step": 1440
1059
+ },
1060
+ {
1061
+ "epoch": 1.5187319884726225,
1062
+ "grad_norm": 0.11297477036714554,
1063
+ "learning_rate": 5.149765659070973e-05,
1064
+ "loss": 0.08794642686843872,
1065
+ "step": 1450
1066
+ },
1067
+ {
1068
+ "epoch": 1.5292114225831805,
1069
+ "grad_norm": 0.10511091351509094,
1070
+ "learning_rate": 5.0932588574061945e-05,
1071
+ "loss": 0.07854819297790527,
1072
+ "step": 1460
1073
+ },
1074
+ {
1075
+ "epoch": 1.5396908566937384,
1076
+ "grad_norm": 0.09333530068397522,
1077
+ "learning_rate": 5.036740137611453e-05,
1078
+ "loss": 0.08821435570716858,
1079
+ "step": 1470
1080
+ },
1081
+ {
1082
+ "epoch": 1.5501702908042967,
1083
+ "grad_norm": 0.11480343341827393,
1084
+ "learning_rate": 4.980216722565804e-05,
1085
+ "loss": 0.08062278628349304,
1086
+ "step": 1480
1087
+ },
1088
+ {
1089
+ "epoch": 1.5606497249148545,
1090
+ "grad_norm": 0.08406255394220352,
1091
+ "learning_rate": 4.923695835748338e-05,
1092
+ "loss": 0.0940588355064392,
1093
+ "step": 1490
1094
+ },
1095
+ {
1096
+ "epoch": 1.5711291590254126,
1097
+ "grad_norm": 0.12927693128585815,
1098
+ "learning_rate": 4.8671847003150447e-05,
1099
+ "loss": 0.0775177538394928,
1100
+ "step": 1500
1101
+ },
1102
+ {
1103
+ "epoch": 1.5711291590254126,
1104
+ "eval_loss": 0.07877222448587418,
1105
+ "eval_runtime": 34.4389,
1106
+ "eval_samples_per_second": 9.03,
1107
+ "eval_steps_per_second": 9.03,
1108
+ "step": 1500
1109
+ },
1110
+ {
1111
+ "epoch": 1.5816085931359707,
1112
+ "grad_norm": 0.1255076378583908,
1113
+ "learning_rate": 4.810690538175728e-05,
1114
+ "loss": 0.09362970590591431,
1115
+ "step": 1510
1116
+ },
1117
+ {
1118
+ "epoch": 1.5920880272465285,
1119
+ "grad_norm": 0.1326853185892105,
1120
+ "learning_rate": 4.754220569071068e-05,
1121
+ "loss": 0.08364834189414978,
1122
+ "step": 1520
1123
+ },
1124
+ {
1125
+ "epoch": 1.6025674613570868,
1126
+ "grad_norm": 0.10229979455471039,
1127
+ "learning_rate": 4.697782009649962e-05,
1128
+ "loss": 0.0725843846797943,
1129
+ "step": 1530
1130
+ },
1131
+ {
1132
+ "epoch": 1.6130468954676447,
1133
+ "grad_norm": 0.11407258361577988,
1134
+ "learning_rate": 4.641382072547272e-05,
1135
+ "loss": 0.07566151022911072,
1136
+ "step": 1540
1137
+ },
1138
+ {
1139
+ "epoch": 1.6235263295782028,
1140
+ "grad_norm": 0.09398165345191956,
1141
+ "learning_rate": 4.585027965462075e-05,
1142
+ "loss": 0.087736576795578,
1143
+ "step": 1550
1144
+ },
1145
+ {
1146
+ "epoch": 1.6340057636887608,
1147
+ "grad_norm": 0.11289424449205399,
1148
+ "learning_rate": 4.528726890236544e-05,
1149
+ "loss": 0.08366051316261292,
1150
+ "step": 1560
1151
+ },
1152
+ {
1153
+ "epoch": 1.6444851977993187,
1154
+ "grad_norm": 0.09478718787431717,
1155
+ "learning_rate": 4.4724860419355746e-05,
1156
+ "loss": 0.0885531723499298,
1157
+ "step": 1570
1158
+ },
1159
+ {
1160
+ "epoch": 1.654964631909877,
1161
+ "grad_norm": 0.09163404256105423,
1162
+ "learning_rate": 4.416312607927295e-05,
1163
+ "loss": 0.08392030596733094,
1164
+ "step": 1580
1165
+ },
1166
+ {
1167
+ "epoch": 1.6654440660204348,
1168
+ "grad_norm": 0.11422222852706909,
1169
+ "learning_rate": 4.360213766964542e-05,
1170
+ "loss": 0.08059985041618348,
1171
+ "step": 1590
1172
+ },
1173
+ {
1174
+ "epoch": 1.675923500130993,
1175
+ "grad_norm": 0.08131479471921921,
1176
+ "learning_rate": 4.304196688267438e-05,
1177
+ "loss": 0.07613803148269653,
1178
+ "step": 1600
1179
+ },
1180
+ {
1181
+ "epoch": 1.686402934241551,
1182
+ "grad_norm": 0.09615079313516617,
1183
+ "learning_rate": 4.248268530607199e-05,
1184
+ "loss": 0.07764078378677368,
1185
+ "step": 1610
1186
+ },
1187
+ {
1188
+ "epoch": 1.696882368352109,
1189
+ "grad_norm": 0.09730526059865952,
1190
+ "learning_rate": 4.192436441391271e-05,
1191
+ "loss": 0.07644452452659607,
1192
+ "step": 1620
1193
+ },
1194
+ {
1195
+ "epoch": 1.707361802462667,
1196
+ "grad_norm": 0.09649327397346497,
1197
+ "learning_rate": 4.136707555749907e-05,
1198
+ "loss": 0.07866159081459045,
1199
+ "step": 1630
1200
+ },
1201
+ {
1202
+ "epoch": 1.717841236573225,
1203
+ "grad_norm": 0.11804413050413132,
1204
+ "learning_rate": 4.0810889956243415e-05,
1205
+ "loss": 0.06996130347251892,
1206
+ "step": 1640
1207
+ },
1208
+ {
1209
+ "epoch": 1.728320670683783,
1210
+ "grad_norm": 0.09874672442674637,
1211
+ "learning_rate": 4.025587868856622e-05,
1212
+ "loss": 0.07877404093742371,
1213
+ "step": 1650
1214
+ },
1215
+ {
1216
+ "epoch": 1.738800104794341,
1217
+ "grad_norm": 0.11149467527866364,
1218
+ "learning_rate": 3.9702112682812544e-05,
1219
+ "loss": 0.07241421341896057,
1220
+ "step": 1660
1221
+ },
1222
+ {
1223
+ "epoch": 1.7492795389048992,
1224
+ "grad_norm": 0.08748896420001984,
1225
+ "learning_rate": 3.914966270818766e-05,
1226
+ "loss": 0.07336459755897522,
1227
+ "step": 1670
1228
+ },
1229
+ {
1230
+ "epoch": 1.7597589730154573,
1231
+ "grad_norm": 0.1172696202993393,
1232
+ "learning_rate": 3.859859936571307e-05,
1233
+ "loss": 0.07742337584495544,
1234
+ "step": 1680
1235
+ },
1236
+ {
1237
+ "epoch": 1.770238407126015,
1238
+ "grad_norm": 0.0719197615981102,
1239
+ "learning_rate": 3.8048993079203925e-05,
1240
+ "loss": 0.06242966651916504,
1241
+ "step": 1690
1242
+ },
1243
+ {
1244
+ "epoch": 1.7807178412365732,
1245
+ "grad_norm": 0.12380168586969376,
1246
+ "learning_rate": 3.750091408626907e-05,
1247
+ "loss": 0.07270430326461792,
1248
+ "step": 1700
1249
+ },
1250
+ {
1251
+ "epoch": 1.7911972753471312,
1252
+ "grad_norm": 0.1587221622467041,
1253
+ "learning_rate": 3.6954432429335015e-05,
1254
+ "loss": 0.06409866213798524,
1255
+ "step": 1710
1256
+ },
1257
+ {
1258
+ "epoch": 1.8016767094576893,
1259
+ "grad_norm": 0.10983912646770477,
1260
+ "learning_rate": 3.640961794669482e-05,
1261
+ "loss": 0.06610031127929687,
1262
+ "step": 1720
1263
+ },
1264
+ {
1265
+ "epoch": 1.8121561435682474,
1266
+ "grad_norm": 0.11023026704788208,
1267
+ "learning_rate": 3.586654026358287e-05,
1268
+ "loss": 0.06866579055786133,
1269
+ "step": 1730
1270
+ },
1271
+ {
1272
+ "epoch": 1.8226355776788052,
1273
+ "grad_norm": 0.11857719719409943,
1274
+ "learning_rate": 3.532526878327719e-05,
1275
+ "loss": 0.06734356880187989,
1276
+ "step": 1740
1277
+ },
1278
+ {
1279
+ "epoch": 1.8331150117893635,
1280
+ "grad_norm": 0.09280339628458023,
1281
+ "learning_rate": 3.478587267822987e-05,
1282
+ "loss": 0.06897796392440796,
1283
+ "step": 1750
1284
+ },
1285
+ {
1286
+ "epoch": 1.8331150117893635,
1287
+ "eval_loss": 0.06596127897500992,
1288
+ "eval_runtime": 35.5001,
1289
+ "eval_samples_per_second": 8.761,
1290
+ "eval_steps_per_second": 8.761,
1291
+ "step": 1750
1292
+ },
1293
+ {
1294
+ "epoch": 1.8435944458999214,
1295
+ "grad_norm": 0.1175367683172226,
1296
+ "learning_rate": 3.424842088122716e-05,
1297
+ "loss": 0.08288194537162781,
1298
+ "step": 1760
1299
+ },
1300
+ {
1301
+ "epoch": 1.8540738800104795,
1302
+ "grad_norm": 0.10271462798118591,
1303
+ "learning_rate": 3.371298207658003e-05,
1304
+ "loss": 0.05643013119697571,
1305
+ "step": 1770
1306
+ },
1307
+ {
1308
+ "epoch": 1.8645533141210375,
1309
+ "grad_norm": 0.11965195834636688,
1310
+ "learning_rate": 3.3179624691346654e-05,
1311
+ "loss": 0.07403092980384826,
1312
+ "step": 1780
1313
+ },
1314
+ {
1315
+ "epoch": 1.8750327482315954,
1316
+ "grad_norm": 0.09981680661439896,
1317
+ "learning_rate": 3.2648416886587686e-05,
1318
+ "loss": 0.07118859887123108,
1319
+ "step": 1790
1320
+ },
1321
+ {
1322
+ "epoch": 1.8855121823421537,
1323
+ "grad_norm": 0.07787375897169113,
1324
+ "learning_rate": 3.2119426548655435e-05,
1325
+ "loss": 0.07219682335853576,
1326
+ "step": 1800
1327
+ },
1328
+ {
1329
+ "epoch": 1.8959916164527115,
1330
+ "grad_norm": 0.1303507387638092,
1331
+ "learning_rate": 3.1592721280518404e-05,
1332
+ "loss": 0.07636030912399291,
1333
+ "step": 1810
1334
+ },
1335
+ {
1336
+ "epoch": 1.9064710505632696,
1337
+ "grad_norm": 0.09162267297506332,
1338
+ "learning_rate": 3.106836839312175e-05,
1339
+ "loss": 0.06230143308639526,
1340
+ "step": 1820
1341
+ },
1342
+ {
1343
+ "epoch": 1.9169504846738277,
1344
+ "grad_norm": 0.11375878751277924,
1345
+ "learning_rate": 3.054643489678526e-05,
1346
+ "loss": 0.060506826639175414,
1347
+ "step": 1830
1348
+ },
1349
+ {
1350
+ "epoch": 1.9274299187843855,
1351
+ "grad_norm": 0.1377716213464737,
1352
+ "learning_rate": 3.0026987492639668e-05,
1353
+ "loss": 0.08148540854454041,
1354
+ "step": 1840
1355
+ },
1356
+ {
1357
+ "epoch": 1.9379093528949438,
1358
+ "grad_norm": 0.10483554750680923,
1359
+ "learning_rate": 2.951009256410255e-05,
1360
+ "loss": 0.07040726542472839,
1361
+ "step": 1850
1362
+ },
1363
+ {
1364
+ "epoch": 1.9483887870055017,
1365
+ "grad_norm": 0.08736151456832886,
1366
+ "learning_rate": 2.8995816168394702e-05,
1367
+ "loss": 0.04931557774543762,
1368
+ "step": 1860
1369
+ },
1370
+ {
1371
+ "epoch": 1.9588682211160597,
1372
+ "grad_norm": 0.11461569368839264,
1373
+ "learning_rate": 2.848422402809828e-05,
1374
+ "loss": 0.057559752464294435,
1375
+ "step": 1870
1376
+ },
1377
+ {
1378
+ "epoch": 1.9693476552266178,
1379
+ "grad_norm": 0.09060918539762497,
1380
+ "learning_rate": 2.7975381522757803e-05,
1381
+ "loss": 0.06379705667495728,
1382
+ "step": 1880
1383
+ },
1384
+ {
1385
+ "epoch": 1.9798270893371757,
1386
+ "grad_norm": 0.07104971259832382,
1387
+ "learning_rate": 2.746935368052477e-05,
1388
+ "loss": 0.05813115239143372,
1389
+ "step": 1890
1390
+ },
1391
+ {
1392
+ "epoch": 1.990306523447734,
1393
+ "grad_norm": 0.10802938044071198,
1394
+ "learning_rate": 2.696620516984733e-05,
1395
+ "loss": 0.07732833027839661,
1396
+ "step": 1900
1397
+ },
1398
+ {
1399
+ "epoch": 2.0,
1400
+ "grad_norm": 0.16884952783584595,
1401
+ "learning_rate": 2.6466000291206004e-05,
1402
+ "loss": 0.06166202425956726,
1403
+ "step": 1910
1404
+ },
1405
+ {
1406
+ "epoch": 2.010479434110558,
1407
+ "grad_norm": 0.08582179993391037,
1408
+ "learning_rate": 2.5968802968896228e-05,
1409
+ "loss": 0.04766199886798859,
1410
+ "step": 1920
1411
+ },
1412
+ {
1413
+ "epoch": 2.020958868221116,
1414
+ "grad_norm": 0.1457364708185196,
1415
+ "learning_rate": 2.5474676742859048e-05,
1416
+ "loss": 0.03826354146003723,
1417
+ "step": 1930
1418
+ },
1419
+ {
1420
+ "epoch": 2.031438302331674,
1421
+ "grad_norm": 0.09275342524051666,
1422
+ "learning_rate": 2.4983684760561023e-05,
1423
+ "loss": 0.045059433579444884,
1424
+ "step": 1940
1425
+ },
1426
+ {
1427
+ "epoch": 2.0419177364422323,
1428
+ "grad_norm": 0.09085927903652191,
1429
+ "learning_rate": 2.44958897689242e-05,
1430
+ "loss": 0.04904903173446655,
1431
+ "step": 1950
1432
+ },
1433
+ {
1434
+ "epoch": 2.05239717055279,
1435
+ "grad_norm": 0.11733179539442062,
1436
+ "learning_rate": 2.401135410630731e-05,
1437
+ "loss": 0.05008396506309509,
1438
+ "step": 1960
1439
+ },
1440
+ {
1441
+ "epoch": 2.062876604663348,
1442
+ "grad_norm": 0.0894237607717514,
1443
+ "learning_rate": 2.3530139694539095e-05,
1444
+ "loss": 0.04057626128196716,
1445
+ "step": 1970
1446
+ },
1447
+ {
1448
+ "epoch": 2.0733560387739063,
1449
+ "grad_norm": 0.08560927212238312,
1450
+ "learning_rate": 2.305230803100496e-05,
1451
+ "loss": 0.04843136668205261,
1452
+ "step": 1980
1453
+ },
1454
+ {
1455
+ "epoch": 2.083835472884464,
1456
+ "grad_norm": 0.07991836220026016,
1457
+ "learning_rate": 2.257792018078793e-05,
1458
+ "loss": 0.0544127106666565,
1459
+ "step": 1990
1460
+ },
1461
+ {
1462
+ "epoch": 2.0943149069950224,
1463
+ "grad_norm": 0.08846250921487808,
1464
+ "learning_rate": 2.210703676886461e-05,
1465
+ "loss": 0.0459000825881958,
1466
+ "step": 2000
1467
+ },
1468
+ {
1469
+ "epoch": 2.0943149069950224,
1470
+ "eval_loss": 0.060011014342308044,
1471
+ "eval_runtime": 36.3755,
1472
+ "eval_samples_per_second": 8.55,
1473
+ "eval_steps_per_second": 8.55,
1474
+ "step": 2000
1475
+ },
+ {
+ "epoch": 2.1047943411055803,
+ "grad_norm": 0.10082945972681046,
+ "learning_rate": 2.1639717972357678e-05,
+ "loss": 0.038090622425079344,
+ "step": 2010
+ },
+ {
+ "epoch": 2.115273775216138,
+ "grad_norm": 0.05712248757481575,
+ "learning_rate": 2.1176023512845376e-05,
+ "loss": 0.04598597884178161,
+ "step": 2020
+ },
+ {
+ "epoch": 2.1257532093266964,
+ "grad_norm": 0.11628362536430359,
+ "learning_rate": 2.0716012648729353e-05,
+ "loss": 0.04984880685806274,
+ "step": 2030
+ },
+ {
+ "epoch": 2.1362326434372543,
+ "grad_norm": 0.10635484755039215,
+ "learning_rate": 2.025974416766171e-05,
+ "loss": 0.04293925166130066,
+ "step": 2040
+ },
+ {
+ "epoch": 2.1467120775478126,
+ "grad_norm": 0.1017381027340889,
+ "learning_rate": 1.9807276379032113e-05,
+ "loss": 0.04305694401264191,
+ "step": 2050
+ },
+ {
+ "epoch": 2.1571915116583704,
+ "grad_norm": 0.13550882041454315,
+ "learning_rate": 1.9358667106516055e-05,
+ "loss": 0.04478869140148163,
+ "step": 2060
+ },
+ {
+ "epoch": 2.1676709457689283,
+ "grad_norm": 0.08526366949081421,
+ "learning_rate": 1.8913973680685226e-05,
+ "loss": 0.036646312475204466,
+ "step": 2070
+ },
+ {
+ "epoch": 2.1781503798794866,
+ "grad_norm": 0.10932011157274246,
+ "learning_rate": 1.8473252931680928e-05,
+ "loss": 0.042200219631195066,
+ "step": 2080
+ },
+ {
+ "epoch": 2.1886298139900444,
+ "grad_norm": 0.08768360316753387,
+ "learning_rate": 1.803656118195136e-05,
+ "loss": 0.0437488317489624,
+ "step": 2090
+ },
+ {
+ "epoch": 2.1991092481006027,
+ "grad_norm": 0.08362651616334915,
+ "learning_rate": 1.760395423905379e-05,
+ "loss": 0.04669668078422547,
+ "step": 2100
+ },
+ {
+ "epoch": 2.2095886822111606,
+ "grad_norm": 0.08554034680128098,
+ "learning_rate": 1.7175487388522588e-05,
+ "loss": 0.034989356994628906,
+ "step": 2110
+ },
+ {
+ "epoch": 2.220068116321719,
+ "grad_norm": 0.08215561509132385,
+ "learning_rate": 1.6751215386803986e-05,
+ "loss": 0.040298929810523985,
+ "step": 2120
+ },
+ {
+ "epoch": 2.2305475504322767,
+ "grad_norm": 0.0840689167380333,
+ "learning_rate": 1.6331192454258337e-05,
+ "loss": 0.041704925894737246,
+ "step": 2130
+ },
+ {
+ "epoch": 2.2410269845428346,
+ "grad_norm": 0.06530614197254181,
+ "learning_rate": 1.5915472268231018e-05,
+ "loss": 0.03651900887489319,
+ "step": 2140
+ },
+ {
+ "epoch": 2.251506418653393,
+ "grad_norm": 0.12431822717189789,
+ "learning_rate": 1.550410795619261e-05,
+ "loss": 0.04806804955005646,
+ "step": 2150
+ },
+ {
+ "epoch": 2.2619858527639507,
+ "grad_norm": 0.09592410176992416,
+ "learning_rate": 1.509715208894949e-05,
+ "loss": 0.0454313725233078,
+ "step": 2160
+ },
+ {
+ "epoch": 2.2724652868745085,
+ "grad_norm": 0.07589780539274216,
+ "learning_rate": 1.469465667392536e-05,
+ "loss": 0.03574602603912354,
+ "step": 2170
+ },
+ {
+ "epoch": 2.282944720985067,
+ "grad_norm": 0.09734483063220978,
+ "learning_rate": 1.4296673148515038e-05,
+ "loss": 0.04358702301979065,
+ "step": 2180
+ },
+ {
+ "epoch": 2.2934241550956247,
+ "grad_norm": 0.0974339172244072,
+ "learning_rate": 1.3903252373510838e-05,
+ "loss": 0.04603351950645447,
+ "step": 2190
+ },
+ {
+ "epoch": 2.303903589206183,
+ "grad_norm": 0.09025271981954575,
+ "learning_rate": 1.3514444626602773e-05,
+ "loss": 0.040065237879753114,
+ "step": 2200
+ },
+ {
+ "epoch": 2.314383023316741,
+ "grad_norm": 0.07625086605548859,
+ "learning_rate": 1.3130299595953338e-05,
+ "loss": 0.044061675667762756,
+ "step": 2210
+ },
+ {
+ "epoch": 2.324862457427299,
+ "grad_norm": 0.07306221127510071,
+ "learning_rate": 1.2750866373847465e-05,
+ "loss": 0.03366467654705048,
+ "step": 2220
+ },
+ {
+ "epoch": 2.335341891537857,
+ "grad_norm": 0.08357638120651245,
+ "learning_rate": 1.2376193450418715e-05,
+ "loss": 0.041424044966697694,
+ "step": 2230
+ },
+ {
+ "epoch": 2.345821325648415,
+ "grad_norm": 0.09153921157121658,
+ "learning_rate": 1.2006328707452459e-05,
+ "loss": 0.03938372135162353,
+ "step": 2240
+ },
+ {
+ "epoch": 2.356300759758973,
+ "grad_norm": 0.09109660983085632,
+ "learning_rate": 1.1641319412266765e-05,
+ "loss": 0.04015985131263733,
+ "step": 2250
+ },
+ {
+ "epoch": 2.356300759758973,
+ "eval_loss": 0.05486458167433739,
+ "eval_runtime": 36.8119,
+ "eval_samples_per_second": 8.448,
+ "eval_steps_per_second": 8.448,
+ "step": 2250
+ },
+ {
+ "epoch": 2.366780193869531,
+ "grad_norm": 0.052502721548080444,
+ "learning_rate": 1.1281212211671822e-05,
+ "loss": 0.0270554780960083,
+ "step": 2260
+ },
+ {
+ "epoch": 2.377259627980089,
+ "grad_norm": 0.07931812107563019,
+ "learning_rate": 1.0926053126008584e-05,
+ "loss": 0.0417300134897232,
+ "step": 2270
+ },
+ {
+ "epoch": 2.387739062090647,
+ "grad_norm": 0.08996254205703735,
+ "learning_rate": 1.0575887543267609e-05,
+ "loss": 0.037659955024719236,
+ "step": 2280
+ },
+ {
+ "epoch": 2.398218496201205,
+ "grad_norm": 0.08800788223743439,
+ "learning_rate": 1.023076021328867e-05,
+ "loss": 0.048437944054603575,
+ "step": 2290
+ },
+ {
+ "epoch": 2.4086979303117633,
+ "grad_norm": 0.10572271049022675,
+ "learning_rate": 9.890715242041787e-06,
+ "loss": 0.04166909456253052,
+ "step": 2300
+ },
+ {
+ "epoch": 2.419177364422321,
+ "grad_norm": 0.10573071986436844,
+ "learning_rate": 9.555796085990781e-06,
+ "loss": 0.03919607996940613,
+ "step": 2310
+ },
+ {
+ "epoch": 2.4296567985328794,
+ "grad_norm": 0.09714583307504654,
+ "learning_rate": 9.226045546539608e-06,
+ "loss": 0.03530588150024414,
+ "step": 2320
+ },
+ {
+ "epoch": 2.4401362326434373,
+ "grad_norm": 0.09436199069023132,
+ "learning_rate": 8.901505764562518e-06,
+ "loss": 0.05111382007598877,
+ "step": 2330
+ },
+ {
+ "epoch": 2.450615666753995,
+ "grad_norm": 0.06353961676359177,
+ "learning_rate": 8.582218215018656e-06,
+ "loss": 0.03805697858333588,
+ "step": 2340
+ },
+ {
+ "epoch": 2.4610951008645534,
+ "grad_norm": 0.08853815495967865,
+ "learning_rate": 8.268223701651684e-06,
+ "loss": 0.04815975427627563,
+ "step": 2350
+ },
+ {
+ "epoch": 2.4715745349751113,
+ "grad_norm": 0.07472016662359238,
+ "learning_rate": 7.959562351775196e-06,
+ "loss": 0.042247459292411804,
+ "step": 2360
+ },
+ {
+ "epoch": 2.4820539690856696,
+ "grad_norm": 0.12121549248695374,
+ "learning_rate": 7.656273611144632e-06,
+ "loss": 0.040102115273475646,
+ "step": 2370
+ },
+ {
+ "epoch": 2.4925334031962274,
+ "grad_norm": 0.08667747676372528,
+ "learning_rate": 7.358396238916254e-06,
+ "loss": 0.03656341433525086,
+ "step": 2380
+ },
+ {
+ "epoch": 2.5030128373067857,
+ "grad_norm": 0.1162872165441513,
+ "learning_rate": 7.065968302693882e-06,
+ "loss": 0.04052766263484955,
+ "step": 2390
+ },
+ {
+ "epoch": 2.5134922714173435,
+ "grad_norm": 0.07924140989780426,
+ "learning_rate": 6.7790271736639595e-06,
+ "loss": 0.03394221067428589,
+ "step": 2400
+ },
+ {
+ "epoch": 2.5239717055279014,
+ "grad_norm": 0.09523408859968185,
+ "learning_rate": 6.497609521819681e-06,
+ "loss": 0.04119439423084259,
+ "step": 2410
+ },
+ {
+ "epoch": 2.5344511396384597,
+ "grad_norm": 0.12182598561048508,
+ "learning_rate": 6.221751311274731e-06,
+ "loss": 0.05154783725738525,
+ "step": 2420
+ },
+ {
+ "epoch": 2.5449305737490175,
+ "grad_norm": 0.09359873831272125,
+ "learning_rate": 5.951487795667149e-06,
+ "loss": 0.035483264923095705,
+ "step": 2430
+ },
+ {
+ "epoch": 2.5554100078595754,
+ "grad_norm": 0.08514095097780228,
+ "learning_rate": 5.686853513654117e-06,
+ "loss": 0.03830339312553406,
+ "step": 2440
+ },
+ {
+ "epoch": 2.5658894419701337,
+ "grad_norm": 0.10625084489583969,
+ "learning_rate": 5.4278822844979705e-06,
+ "loss": 0.034111028909683226,
+ "step": 2450
+ },
+ {
+ "epoch": 2.5763688760806915,
+ "grad_norm": 0.1004003956913948,
+ "learning_rate": 5.174607203744286e-06,
+ "loss": 0.04465605318546295,
+ "step": 2460
+ },
+ {
+ "epoch": 2.58684831019125,
+ "grad_norm": 0.0962519720196724,
+ "learning_rate": 4.927060638992382e-06,
+ "loss": 0.041056016087532045,
+ "step": 2470
+ },
+ {
+ "epoch": 2.5973277443018077,
+ "grad_norm": 0.06380607187747955,
+ "learning_rate": 4.685274225758846e-06,
+ "loss": 0.03880062401294708,
+ "step": 2480
+ },
+ {
+ "epoch": 2.607807178412366,
+ "grad_norm": 0.07326535880565643,
+ "learning_rate": 4.449278863434647e-06,
+ "loss": 0.03194461762905121,
+ "step": 2490
+ },
+ {
+ "epoch": 2.618286612522924,
+ "grad_norm": 0.12218596786260605,
+ "learning_rate": 4.2191047113362854e-06,
+ "loss": 0.04258840978145599,
+ "step": 2500
+ },
+ {
+ "epoch": 2.618286612522924,
+ "eval_loss": 0.05223666876554489,
+ "eval_runtime": 37.7234,
+ "eval_samples_per_second": 8.244,
+ "eval_steps_per_second": 8.244,
+ "step": 2500
+ },
+ {
+ "epoch": 2.6287660466334817,
+ "grad_norm": 0.08594664931297302,
+ "learning_rate": 3.994781184851598e-06,
+ "loss": 0.04302787780761719,
+ "step": 2510
+ },
+ {
+ "epoch": 2.63924548074404,
+ "grad_norm": 0.08187596499919891,
+ "learning_rate": 3.776336951680548e-06,
+ "loss": 0.0341387003660202,
+ "step": 2520
+ },
+ {
+ "epoch": 2.649724914854598,
+ "grad_norm": 0.10216796398162842,
+ "learning_rate": 3.563799928171596e-06,
+ "loss": 0.04289879500865936,
+ "step": 2530
+ },
+ {
+ "epoch": 2.6602043489651557,
+ "grad_norm": 0.11215174198150635,
+ "learning_rate": 3.3571972757540814e-06,
+ "loss": 0.04055049121379852,
+ "step": 2540
+ },
+ {
+ "epoch": 2.670683783075714,
+ "grad_norm": 0.07941269129514694,
+ "learning_rate": 3.156555397467176e-06,
+ "loss": 0.04118689000606537,
+ "step": 2550
+ },
+ {
+ "epoch": 2.681163217186272,
+ "grad_norm": 0.09404437988996506,
+ "learning_rate": 2.9618999345855547e-06,
+ "loss": 0.03079705536365509,
+ "step": 2560
+ },
+ {
+ "epoch": 2.69164265129683,
+ "grad_norm": 0.1109817698597908,
+ "learning_rate": 2.773255763342647e-06,
+ "loss": 0.038885954022407535,
+ "step": 2570
+ },
+ {
+ "epoch": 2.702122085407388,
+ "grad_norm": 0.09431962668895721,
+ "learning_rate": 2.590646991751472e-06,
+ "loss": 0.043543145060539246,
+ "step": 2580
+ },
+ {
+ "epoch": 2.7126015195179463,
+ "grad_norm": 0.08184763044118881,
+ "learning_rate": 2.414096956523776e-06,
+ "loss": 0.03256987631320953,
+ "step": 2590
+ },
+ {
+ "epoch": 2.723080953628504,
+ "grad_norm": 0.08390141278505325,
+ "learning_rate": 2.2436282200876458e-06,
+ "loss": 0.03908055424690247,
+ "step": 2600
+ },
+ {
+ "epoch": 2.733560387739062,
+ "grad_norm": 0.0762532502412796,
+ "learning_rate": 2.07926256770416e-06,
+ "loss": 0.04899201393127441,
+ "step": 2610
+ },
+ {
+ "epoch": 2.7440398218496203,
+ "grad_norm": 0.08239631354808807,
+ "learning_rate": 1.9210210046832768e-06,
+ "loss": 0.048707082867622375,
+ "step": 2620
+ },
+ {
+ "epoch": 2.754519255960178,
+ "grad_norm": 0.09619107842445374,
+ "learning_rate": 1.7689237536994364e-06,
+ "loss": 0.0372231125831604,
+ "step": 2630
+ },
+ {
+ "epoch": 2.764998690070736,
+ "grad_norm": 0.07099667191505432,
+ "learning_rate": 1.6229902522072293e-06,
+ "loss": 0.03421170711517334,
+ "step": 2640
+ },
+ {
+ "epoch": 2.7754781241812942,
+ "grad_norm": 0.10154753923416138,
+ "learning_rate": 1.4832391499572996e-06,
+ "loss": 0.03656705319881439,
+ "step": 2650
+ },
+ {
+ "epoch": 2.785957558291852,
+ "grad_norm": 0.09349387139081955,
+ "learning_rate": 1.3496883066130173e-06,
+ "loss": 0.03710306882858276,
+ "step": 2660
+ },
+ {
+ "epoch": 2.7964369924024104,
+ "grad_norm": 0.061091430485248566,
+ "learning_rate": 1.2223547894680443e-06,
+ "loss": 0.0308389812707901,
+ "step": 2670
+ },
+ {
+ "epoch": 2.8069164265129682,
+ "grad_norm": 0.09838075935840607,
+ "learning_rate": 1.101254871265256e-06,
+ "loss": 0.03703555166721344,
+ "step": 2680
+ },
+ {
+ "epoch": 2.8173958606235265,
+ "grad_norm": 0.10046928375959396,
+ "learning_rate": 9.864040281170938e-07,
+ "loss": 0.04500553905963898,
+ "step": 2690
+ },
+ {
+ "epoch": 2.8278752947340844,
+ "grad_norm": 0.06770773977041245,
+ "learning_rate": 8.778169375277978e-07,
+ "loss": 0.03823737502098083,
+ "step": 2700
+ },
+ {
+ "epoch": 2.8383547288446422,
+ "grad_norm": 0.08373535424470901,
+ "learning_rate": 7.755074765176618e-07,
+ "loss": 0.03961678743362427,
+ "step": 2710
+ },
+ {
+ "epoch": 2.8488341629552005,
+ "grad_norm": 0.07590050995349884,
+ "learning_rate": 6.794887198496413e-07,
+ "loss": 0.03221273124217987,
+ "step": 2720
+ },
+ {
+ "epoch": 2.8593135970657584,
+ "grad_norm": 0.08507678657770157,
+ "learning_rate": 5.897729383583906e-07,
+ "loss": 0.04571912884712219,
+ "step": 2730
+ },
+ {
+ "epoch": 2.8697930311763162,
+ "grad_norm": 0.06584763526916504,
+ "learning_rate": 5.063715973821659e-07,
+ "loss": 0.03794914484024048,
+ "step": 2740
+ },
+ {
+ "epoch": 2.8802724652868745,
+ "grad_norm": 0.07312892377376556,
+ "learning_rate": 4.292953552975154e-07,
+ "loss": 0.036365586519241336,
+ "step": 2750
+ },
+ {
+ "epoch": 2.8802724652868745,
+ "eval_loss": 0.05090421438217163,
+ "eval_runtime": 85.293,
+ "eval_samples_per_second": 3.646,
+ "eval_steps_per_second": 3.646,
+ "step": 2750
+ },
+ {
+ "epoch": 2.8907518993974324,
+ "grad_norm": 0.08459606021642685,
+ "learning_rate": 3.5855406215725697e-07,
+ "loss": 0.03068857192993164,
+ "step": 2760
+ },
+ {
+ "epoch": 2.9012313335079907,
+ "grad_norm": 0.06866376101970673,
+ "learning_rate": 2.9415675843163515e-07,
+ "loss": 0.03265829384326935,
+ "step": 2770
+ },
+ {
+ "epoch": 2.9117107676185485,
+ "grad_norm": 0.09082643687725067,
+ "learning_rate": 2.361116738529956e-07,
+ "loss": 0.03418546915054321,
+ "step": 2780
+ },
+ {
+ "epoch": 2.922190201729107,
+ "grad_norm": 0.10772739350795746,
+ "learning_rate": 1.8442622636404284e-07,
+ "loss": 0.03810786008834839,
+ "step": 2790
+ },
+ {
+ "epoch": 2.9326696358396647,
+ "grad_norm": 0.08321297913789749,
+ "learning_rate": 1.391070211698764e-07,
+ "loss": 0.04068491756916046,
+ "step": 2800
+ },
+ {
+ "epoch": 2.9431490699502225,
+ "grad_norm": 0.11239277571439743,
+ "learning_rate": 1.0015984989385496e-07,
+ "loss": 0.041029155254364014,
+ "step": 2810
+ },
+ {
+ "epoch": 2.953628504060781,
+ "grad_norm": 0.07199843227863312,
+ "learning_rate": 6.758968983747171e-08,
+ "loss": 0.037902483344078065,
+ "step": 2820
+ },
+ {
+ "epoch": 2.9641079381713387,
+ "grad_norm": 0.08249279856681824,
+ "learning_rate": 4.140070334422985e-08,
+ "loss": 0.03996126651763916,
+ "step": 2830
+ },
+ {
+ "epoch": 2.9745873722818965,
+ "grad_norm": 0.0852220207452774,
+ "learning_rate": 2.1596237267751396e-08,
+ "loss": 0.04228667616844177,
+ "step": 2840
+ },
+ {
+ "epoch": 2.985066806392455,
+ "grad_norm": 0.0858582928776741,
+ "learning_rate": 8.178822544052666e-09,
+ "loss": 0.03813594281673431,
+ "step": 2850
+ },
+ {
+ "epoch": 2.995546240503013,
+ "grad_norm": 0.06642451137304306,
+ "learning_rate": 1.1501738680919084e-09,
+ "loss": 0.033472076058387756,
+ "step": 2860
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 2865,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 250,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 9.031737271887514e+17,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-2865/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
+ size 5201
merged/chat_template.jinja ADDED
@@ -0,0 +1,154 @@
+ {%- set image_count = namespace(value=0) %}
+ {%- set video_count = namespace(value=0) %}
+ {%- macro render_content(content, do_vision_count, is_system_content=false) %}
+ {%- if content is string %}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping %}
+ {%- for item in content %}
+ {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain images.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set image_count.value = image_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Picture ' ~ image_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+ {%- elif 'video' in item or item.type == 'video' %}
+ {%- if is_system_content %}
+ {{- raise_exception('System message cannot contain videos.') }}
+ {%- endif %}
+ {%- if do_vision_count %}
+ {%- set video_count.value = video_count.value + 1 %}
+ {%- endif %}
+ {%- if add_vision_id %}
+ {{- 'Video ' ~ video_count.value ~ ': ' }}
+ {%- endif %}
+ {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+ {%- elif 'text' in item %}
+ {{- item.text }}
+ {%- else %}
+ {{- raise_exception('Unexpected item type in content.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif content is none or content is undefined %}
+ {{- '' }}
+ {%- else %}
+ {{- raise_exception('Unexpected content type.') }}
+ {%- endif %}
+ {%- endmacro %}
+ {%- if not messages %}
+ {{- raise_exception('No messages provided.') }}
+ {%- endif %}
+ {%- if tools and tools is iterable and tools is not mapping %}
+ {{- '<|im_start|>system\n' }}
+ {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+ {%- for tool in tools %}
+ {{- "\n" }}
+ {{- tool | tojson }}
+ {%- endfor %}
+ {{- "\n</tools>" }}
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {%- if content %}
+ {{- '\n\n' + content }}
+ {%- endif %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- else %}
+ {%- if messages[0].role == 'system' %}
+ {%- set content = render_content(messages[0].content, false, true)|trim %}
+ {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+ {%- endif %}
+ {%- endif %}
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+ {%- for message in messages[::-1] %}
+ {%- set index = (messages|length - 1) - loop.index0 %}
+ {%- if ns.multi_step_tool and message.role == "user" %}
+ {%- set content = render_content(message.content, false)|trim %}
+ {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+ {%- set ns.multi_step_tool = false %}
+ {%- set ns.last_query_index = index %}
+ {%- endif %}
+ {%- endif %}
+ {%- endfor %}
+ {%- if ns.multi_step_tool %}
+ {{- raise_exception('No user query found in messages.') }}
+ {%- endif %}
+ {%- for message in messages %}
+ {%- set content = render_content(message.content, true)|trim %}
+ {%- if message.role == "system" %}
+ {%- if not loop.first %}
+ {{- raise_exception('System message must be at the beginning.') }}
+ {%- endif %}
+ {%- elif message.role == "user" %}
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+ {%- elif message.role == "assistant" %}
+ {%- set reasoning_content = '' %}
+ {%- if message.reasoning_content is string %}
+ {%- set reasoning_content = message.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- set reasoning_content = reasoning_content|trim %}
+ {%- if loop.index0 > ns.last_query_index %}
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+ {%- else %}
+ {{- '<|im_start|>' + message.role + '\n' + content }}
+ {%- endif %}
+ {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+ {%- for tool_call in message.tool_calls %}
+ {%- if tool_call.function is defined %}
+ {%- set tool_call = tool_call.function %}
+ {%- endif %}
+ {%- if loop.first %}
+ {%- if content|trim %}
+ {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- else %}
+ {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+ {%- endif %}
+ {%- if tool_call.arguments is defined %}
+ {%- for args_name, args_value in tool_call.arguments|items %}
+ {{- '<parameter=' + args_name + '>\n' }}
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+ {{- args_value }}
+ {{- '\n</parameter>\n' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '</function>\n</tool_call>' }}
+ {%- endfor %}
+ {%- endif %}
+ {{- '<|im_end|>\n' }}
+ {%- elif message.role == "tool" %}
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
+ {{- '<|im_start|>user' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- content }}
+ {{- '\n</tool_response>' }}
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
+ {{- '<|im_end|>\n' }}
+ {%- elif loop.last %}
+ {{- '<|im_end|>\n' }}
+ {%- endif %}
+ {%- else %}
+ {{- raise_exception('Unexpected message role.') }}
+ {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+ {{- '<|im_start|>assistant\n' }}
+ {%- if enable_thinking is defined and enable_thinking is true %}
+ {{- '<think>\n' }}
+ {%- else %}
+ {{- '<think>\n\n</think>\n\n' }}
+ {%- endif %}
+ {%- endif %}
merged/config.json ADDED
@@ -0,0 +1,75 @@
+ {
+ "architectures": [
+ "Qwen3_5ForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "attn_output_gate": true,
+ "bos_token_id": null,
+ "dtype": "bfloat16",
+ "eos_token_id": 248046,
+ "full_attention_interval": 4,
+ "head_dim": 256,
+ "hidden_act": "silu",
+ "hidden_size": 2048,
+ "initializer_range": 0.02,
+ "intermediate_size": 6144,
+ "layer_types": [
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention",
+ "linear_attention",
+ "linear_attention",
+ "linear_attention",
+ "full_attention"
+ ],
+ "linear_conv_kernel_dim": 4,
+ "linear_key_head_dim": 128,
+ "linear_num_key_heads": 16,
+ "linear_num_value_heads": 16,
+ "linear_value_head_dim": 128,
+ "mamba_ssm_dtype": "float32",
+ "max_position_embeddings": 262144,
+ "mlp_only_layers": [],
+ "model_type": "qwen3_5_text",
+ "mtp_num_hidden_layers": 1,
+ "mtp_use_dedicated_embeddings": false,
+ "num_attention_heads": 8,
+ "num_hidden_layers": 24,
+ "num_key_value_heads": 2,
+ "pad_token_id": 248044,
+ "partial_rotary_factor": 0.25,
+ "rms_norm_eps": 1e-06,
+ "rope_parameters": {
+ "mrope_interleaved": true,
+ "mrope_section": [
+ 11,
+ 11,
+ 10
+ ],
+ "partial_rotary_factor": 0.25,
+ "rope_theta": 10000000,
+ "rope_type": "default"
+ },
+ "tie_word_embeddings": true,
+ "transformers_version": "5.3.0",
+ "use_cache": false,
+ "vocab_size": 248320
+ }
merged/generation_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "_from_model_config": true,
+ "eos_token_id": [
+ 248046,
+ 248044
+ ],
+ "pad_token_id": 248044,
+ "transformers_version": "5.3.0",
+ "use_cache": true
+ }
merged/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6cd7701ff97b7561e0dc158635941426e2a6b9fce824390e6995c9485605b2b7
+ size 3763692048
merged/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+ size 19989343
merged/tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "add_prefix_space": false,
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "backend": "tokenizers",
+ "bos_token": null,
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|im_end|>",
+ "errors": "replace",
+ "image_token": "<|image_pad|>",
+ "is_local": true,
+ "model_max_length": 262144,
+ "model_specific_special_tokens": {
+ "audio_bos_token": "<|audio_start|>",
+ "audio_eos_token": "<|audio_end|>",
+ "audio_token": "<|audio_pad|>",
+ "image_token": "<|image_pad|>",
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ },
+ "pad_token": "<|endoftext|>",
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+ "split_special_tokens": false,
+ "tokenizer_class": "TokenizersBackend",
+ "unk_token": null,
+ "video_token": "<|video_pad|>",
+ "vision_bos_token": "<|vision_start|>",
+ "vision_eos_token": "<|vision_end|>"
+ }
run-config.txt ADDED
@@ -0,0 +1,12 @@
+ model=/data/pretrained_models/Qwen3.5-2B
+ data=/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000
+ max_length=6144
+ micro_batch_size=2
+ gradient_accumulation_steps=8
+ effective_batch_size=16
+ learning_rate=1e-4
+ num_train_epochs=3
+ direct_ref_mode=same-file-distractors
+ same_file_distractor_direct_refs=4
+ distractor_seed=0
+ gpu_ids=1
skipped-tokenization.jsonl ADDED
@@ -0,0 +1,422 @@
+ {"label": "upgrwlkedg", "expanded_label": "upgriswlk", "source": "expanded", "tokens": 9508, "reason": "max_length"}
+ {"label": "iiconn", "expanded_label": "1re", "source": "expanded", "tokens": 7147, "reason": "max_length"}
+ {"label": "dprd2db", "expanded_label": "dprdspan", "source": "expanded", "tokens": 7426, "reason": "max_length"}
+ {"label": "pm2mpf", "expanded_label": "pm2mpval", "source": "expanded", "tokens": 8963, "reason": "max_length"}
+ {"label": "phllvec", "expanded_label": "isphl", "source": "expanded", "tokens": 10796, "reason": "max_length"}
+ {"label": "xrlelttr", "expanded_label": "xrlttr", "source": "expanded", "tokens": 6891, "reason": "max_length"}
+ {"label": "trlres", "expanded_label": "wlkres", "source": "expanded", "tokens": 17184, "reason": "max_length"}
+ {"label": "pcorev", "expanded_label": "pcorevlem", "source": "expanded", "tokens": 53452, "reason": "max_length"}
+ {"label": "clwlkclwwlkf1o", "expanded_label": "clwlkclwwlkfo", "source": "expanded", "tokens": 11298, "reason": "max_length"}
+ {"label": "divcan7", "expanded_label": "divdivdiv", "source": "expanded", "tokens": 9114, "reason": "max_length"}
+ {"label": "ghmfghm", "expanded_label": "ghmgrp", "source": "expanded", "tokens": 6287, "reason": "max_length"}
+ {"label": "zrzeroorngc", "expanded_label": "zrtermorngc", "source": "expanded", "tokens": 9359, "reason": "max_length"}
+ {"label": "srgmgp", "expanded_label": "issrg", "source": "expanded", "tokens": 12562, "reason": "max_length"}
+ {"label": "m1bits", "expanded_label": "bitscmp", "source": "expanded", "tokens": 15084, "reason": "max_length"}
+ {"label": "resinhcl", "expanded_label": "sinhval", "source": "expanded", "tokens": 6694, "reason": "max_length"}
+ {"label": "orngogrp", "expanded_label": "isorng", "source": "expanded", "tokens": 9577, "reason": "max_length"}
+ {"label": "psrasclcl", "expanded_label": "psrring", "source": "expanded", "tokens": 8539, "reason": "max_length"}
+ {"label": "usgrexmpl1", "expanded_label": "usgrexmpl1lem", "source": "expanded", "tokens": 23243, "reason": "max_length"}
+ {"label": "lssnvc", "expanded_label": "lssnlm", "source": "expanded", "tokens": 9394, "reason": "max_length"}
+ {"label": "xlemul2a", "expanded_label": "xlemul1a", "source": "expanded", "tokens": 17516, "reason": "max_length"}
+ {"label": "cphphl", "expanded_label": "iscph", "source": "expanded", "tokens": 8762, "reason": "max_length"}
+ {"label": "fclstop", "expanded_label": "isfcls", "source": "expanded", "tokens": 7205, "reason": "max_length"}
+ {"label": "lvecprop2d", "expanded_label": "lmodprop2d", "source": "expanded", "tokens": 33457, "reason": "max_length"}
+ {"label": "drhmsubcALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 13003, "reason": "max_length"}
25
+ {"label": "nsmndex1", "expanded_label": "smndex1n0mnd", "source": "expanded", "tokens": 8470, "reason": "max_length"}
26
+ {"label": "ringcidALTV", "expanded_label": "ringccatidALTV", "source": "expanded", "tokens": 31357, "reason": "max_length"}
27
+ {"label": "ghmima", "expanded_label": "ghmrn", "source": "expanded", "tokens": 8915, "reason": "max_length"}
28
+ {"label": "pmatassa", "expanded_label": "matassa", "source": "expanded", "tokens": 9716, "reason": "max_length"}
29
+ {"label": "addgt0", "expanded_label": "00id", "source": "expanded", "tokens": 8215, "reason": "max_length"}
30
+ {"label": "ser0", "expanded_label": "00id", "source": "expanded", "tokens": 8607, "reason": "max_length"}
31
+ {"label": "mulgfn", "expanded_label": "mulgfval", "source": "expanded", "tokens": 15315, "reason": "max_length"}
32
+ {"label": "usgrexmpl2", "expanded_label": "usgrexmpl2lem", "source": "expanded", "tokens": 22678, "reason": "max_length"}
33
+ {"label": "indistps2ALT", "expanded_label": "indistopon", "source": "expanded", "tokens": 6910, "reason": "max_length"}
34
+ {"label": "pfx0", "expanded_label": "swrd0", "source": "expanded", "tokens": 7535, "reason": "max_length"}
35
+ {"label": "dprd2db", "expanded_label": "dprd2da", "source": "expanded", "tokens": 71940, "reason": "max_length"}
36
+ {"label": "fmfg", "expanded_label": "fgcl", "source": "expanded", "tokens": 8599, "reason": "max_length"}
37
+ {"label": "numufl", "expanded_label": "filssufilg", "source": "expanded", "tokens": 9093, "reason": "max_length"}
38
+ {"label": "wlkiswwlkupgr", "expanded_label": "wlkiswwlksupgr2", "source": "expanded", "tokens": 12895, "reason": "max_length"}
39
+ {"label": "addcomsr", "expanded_label": "addsrpr", "source": "expanded", "tokens": 14104, "reason": "max_length"}
40
+ {"label": "mat1rhm", "expanded_label": "matring", "source": "expanded", "tokens": 18352, "reason": "max_length"}
41
+ {"label": "dpjid", "expanded_label": "dpjidcl", "source": "expanded", "tokens": 38662, "reason": "max_length"}
42
+ {"label": "metcn4", "expanded_label": "met1stc", "source": "expanded", "tokens": 13166, "reason": "max_length"}
43
+ {"label": "mhmf", "expanded_label": "ismhm", "source": "expanded", "tokens": 7423, "reason": "max_length"}
44
+ {"label": "pr01ssre", "expanded_label": "1re", "source": "expanded", "tokens": 6787, "reason": "max_length"}
45
+ {"label": "zcld2", "expanded_label": "recld2", "source": "expanded", "tokens": 7912, "reason": "max_length"}
46
+ {"label": "submgmmgm", "expanded_label": "issubmgm2", "source": "expanded", "tokens": 6349, "reason": "max_length"}
47
+ {"label": "rehaus", "expanded_label": "tgioo", "source": "expanded", "tokens": 22231, "reason": "max_length"}
48
+ {"label": "pmat1op", "expanded_label": "mat1", "source": "expanded", "tokens": 6209, "reason": "max_length"}
49
+ {"label": "sincos1sgn", "expanded_label": "1re", "source": "expanded", "tokens": 9636, "reason": "max_length"}
50
+ {"label": "wwlksnonfi", "expanded_label": "wwlksnfi", "source": "expanded", "tokens": 6363, "reason": "max_length"}
51
+ {"label": "1rp", "expanded_label": "1re", "source": "expanded", "tokens": 6599, "reason": "max_length"}
52
+ {"label": "qtopcmp", "expanded_label": "cncmp", "source": "expanded", "tokens": 16895, "reason": "max_length"}
53
+ {"label": "sqrtle", "expanded_label": "resqrtcl", "source": "expanded", "tokens": 7775, "reason": "max_length"}
54
+ {"label": "opsrsca", "expanded_label": "psrsca", "source": "expanded", "tokens": 6160, "reason": "max_length"}
55
+ {"label": "pgpfi2", "expanded_label": "pgpfi", "source": "expanded", "tokens": 18037, "reason": "max_length"}
56
+ {"label": "ere", "expanded_label": "1re", "source": "expanded", "tokens": 6736, "reason": "max_length"}
57
+ {"label": "sgrp2nmnd", "expanded_label": "sgrp2nmndlem5", "source": "expanded", "tokens": 6277, "reason": "max_length"}
58
+ {"label": "tgtopon", "expanded_label": "tgcl", "source": "expanded", "tokens": 9236, "reason": "max_length"}
59
+ {"label": "cnptop1", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9488, "reason": "max_length"}
60
+ {"label": "fprod0diag", "expanded_label": "fsum0diaglem", "source": "expanded", "tokens": 6497, "reason": "max_length"}
61
+ {"label": "zprodn0", "expanded_label": "zprod", "source": "expanded", "tokens": 18101, "reason": "max_length"}
62
+ {"label": "ishtpyd", "expanded_label": "ishtpy", "source": "expanded", "tokens": 8698, "reason": "max_length"}
63
+ {"label": "axlttrn", "expanded_label": "ltxrlt", "source": "expanded", "tokens": 8594, "reason": "max_length"}
64
+ {"label": "xrltletr", "expanded_label": "xrlttr", "source": "expanded", "tokens": 6927, "reason": "max_length"}
65
+ {"label": "m2cpmf1o", "expanded_label": "m2cpmfo", "source": "expanded", "tokens": 6756, "reason": "max_length"}
66
+ {"label": "lgricngricex", "expanded_label": "gpg5grlic", "source": "expanded", "tokens": 6365, "reason": "max_length"}
67
+ {"label": "nnrecre", "expanded_label": "1re", "source": "expanded", "tokens": 6790, "reason": "max_length"}
68
+ {"label": "grimedgi", "expanded_label": "grimedg", "source": "expanded", "tokens": 21386, "reason": "max_length"}
69
+ {"label": "neg1lt0", "expanded_label": "1re", "source": "expanded", "tokens": 6830, "reason": "max_length"}
70
+ {"label": "pi1xfrgim", "expanded_label": "pi1xfrcnv", "source": "expanded", "tokens": 19046, "reason": "max_length"}
71
+ {"label": "metreg", "expanded_label": "methaus", "source": "expanded", "tokens": 11319, "reason": "max_length"}
72
+ {"label": "fldiv2", "expanded_label": "fldiv", "source": "expanded", "tokens": 15130, "reason": "max_length"}
73
+ {"label": "erclwwlkn", "expanded_label": "erclwwlknsym", "source": "expanded", "tokens": 7474, "reason": "max_length"}
74
+ {"label": "uvtx2vtx1edg", "expanded_label": "nbgr2vtx1edg", "source": "expanded", "tokens": 8190, "reason": "max_length"}
75
+ {"label": "recms", "expanded_label": "recld2", "source": "expanded", "tokens": 7878, "reason": "max_length"}
76
+ {"label": "iccordt", "expanded_label": "letsr", "source": "expanded", "tokens": 6694, "reason": "max_length"}
77
+ {"label": "crhmsubc", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12634, "reason": "max_length"}
78
+ {"label": "xmul02", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8030, "reason": "max_length"}
79
+ {"label": "pcabs", "expanded_label": "pcneg", "source": "expanded", "tokens": 8094, "reason": "max_length"}
80
+ {"label": "indisuni", "expanded_label": "indistopon", "source": "expanded", "tokens": 7576, "reason": "max_length"}
81
+ {"label": "rngqiprngho", "expanded_label": "rngqiprngghm", "source": "expanded", "tokens": 10189, "reason": "max_length"}
82
+ {"label": "peano2re", "expanded_label": "1re", "source": "expanded", "tokens": 6658, "reason": "max_length"}
83
+ {"label": "seqabs", "expanded_label": "fsumabs", "source": "expanded", "tokens": 19216, "reason": "max_length"}
84
+ {"label": "prmrp", "expanded_label": "coprm", "source": "expanded", "tokens": 7707, "reason": "max_length"}
85
+ {"label": "addsub", "expanded_label": "addcom", "source": "expanded", "tokens": 6963, "reason": "max_length"}
86
+ {"label": "xmullid", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8165, "reason": "max_length"}
87
+ {"label": "lsslmod", "expanded_label": "islss3", "source": "expanded", "tokens": 11153, "reason": "max_length"}
88
+ {"label": "mat0dimid", "expanded_label": "matring", "source": "expanded", "tokens": 16819, "reason": "max_length"}
89
+ {"label": "1lt4", "expanded_label": "1re", "source": "expanded", "tokens": 6930, "reason": "max_length"}
90
+ {"label": "2ndctop", "expanded_label": "tgcl", "source": "expanded", "tokens": 9706, "reason": "max_length"}
91
+ {"label": "ordthmeo", "expanded_label": "isocnv", "source": "expanded", "tokens": 6246, "reason": "max_length"}
92
+ {"label": "9re", "expanded_label": "1re", "source": "expanded", "tokens": 6826, "reason": "max_length"}
93
+ {"label": "addgegt0", "expanded_label": "00id", "source": "expanded", "tokens": 8276, "reason": "max_length"}
94
+ {"label": "mdetuni", "expanded_label": "mdetuni0", "source": "expanded", "tokens": 36640, "reason": "max_length"}
95
+ {"label": "wlkswwlksen", "expanded_label": "wlkswwlksf1o", "source": "expanded", "tokens": 8077, "reason": "max_length"}
96
+ {"label": "cphnlm", "expanded_label": "iscph", "source": "expanded", "tokens": 8530, "reason": "max_length"}
97
+ {"label": "algcvgb", "expanded_label": "algcvgblem", "source": "expanded", "tokens": 7042, "reason": "max_length"}
98
+ {"label": "neglcm", "expanded_label": "lcmneg", "source": "expanded", "tokens": 8580, "reason": "max_length"}
99
+ {"label": "metflem", "expanded_label": "ismet", "source": "expanded", "tokens": 6299, "reason": "max_length"}
100
+ {"label": "nmoleub2b", "expanded_label": "nmoleub2lem2", "source": "expanded", "tokens": 13552, "reason": "max_length"}
101
+ {"label": "nbedgusgr", "expanded_label": "hasheqf1oi", "source": "expanded", "tokens": 7832, "reason": "max_length"}
102
+ {"label": "drhmsubc", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12663, "reason": "max_length"}
103
+ {"label": "nnpw2blenfzo2", "expanded_label": "elfzolborelfzop1", "source": "expanded", "tokens": 7424, "reason": "max_length"}
104
+ {"label": "lssvancl2", "expanded_label": "lmodcom", "source": "expanded", "tokens": 11067, "reason": "max_length"}
105
+ {"label": "ordthmeo", "expanded_label": "ordthmeolem", "source": "expanded", "tokens": 26266, "reason": "max_length"}
106
+ {"label": "ismhmd", "expanded_label": "ismhm", "source": "expanded", "tokens": 7438, "reason": "max_length"}
107
+ {"label": "crctcshwlk", "expanded_label": "crctcshlem4", "source": "expanded", "tokens": 6817, "reason": "max_length"}
108
+ {"label": "0reALT", "expanded_label": "1re", "source": "expanded", "tokens": 8961, "reason": "max_length"}
109
+ {"label": "fsumshft", "expanded_label": "mptfzshft", "source": "expanded", "tokens": 7431, "reason": "max_length"}
110
+ {"label": "ltm1", "expanded_label": "1re", "source": "expanded", "tokens": 7209, "reason": "max_length"}
111
+ {"label": "subcn", "expanded_label": "addcnlem", "source": "expanded", "tokens": 10706, "reason": "max_length"}
112
+ {"label": "bastop", "expanded_label": "tgcl", "source": "expanded", "tokens": 9261, "reason": "max_length"}
113
+ {"label": "frgrncvvdeqlem10", "expanded_label": "frgrncvvdeqlem9", "source": "expanded", "tokens": 9599, "reason": "max_length"}
114
+ {"label": "upgrwlkupwlkb", "expanded_label": "upgrwlkupwlk", "source": "expanded", "tokens": 12711, "reason": "max_length"}
115
+ {"label": "gexdvds2", "expanded_label": "oddvds", "source": "expanded", "tokens": 8924, "reason": "max_length"}
116
+ {"label": "mat1ov", "expanded_label": "mat1", "source": "expanded", "tokens": 7234, "reason": "max_length"}
117
+ {"label": "pgjsgr", "expanded_label": "gpgusgra", "source": "expanded", "tokens": 8100, "reason": "max_length"}
118
+ {"label": "m2cpmghm", "expanded_label": "mat2pmatghm", "source": "expanded", "tokens": 14236, "reason": "max_length"}
119
+ {"label": "cnconst", "expanded_label": "cnconst2", "source": "expanded", "tokens": 7114, "reason": "max_length"}
120
+ {"label": "4re", "expanded_label": "1re", "source": "expanded", "tokens": 6779, "reason": "max_length"}
121
+ {"label": "hausflf", "expanded_label": "hausflimi", "source": "expanded", "tokens": 6580, "reason": "max_length"}
122
+ {"label": "gcdmodi", "expanded_label": "modgcd", "source": "expanded", "tokens": 7405, "reason": "max_length"}
123
+ {"label": "xmetutop", "expanded_label": "psmetutop", "source": "expanded", "tokens": 14660, "reason": "max_length"}
124
+ {"label": "issdrg2", "expanded_label": "issubdrg", "source": "expanded", "tokens": 12046, "reason": "max_length"}
125
+ {"label": "uspgrsprf1o", "expanded_label": "uspgrsprfo", "source": "expanded", "tokens": 9662, "reason": "max_length"}
126
+ {"label": "subrgsubrng", "expanded_label": "ringrng", "source": "expanded", "tokens": 6980, "reason": "max_length"}
127
+ {"label": "symgextf1o", "expanded_label": "symgextf1", "source": "expanded", "tokens": 8019, "reason": "max_length"}
128
+ {"label": "mulgp1", "expanded_label": "mulgdir", "source": "expanded", "tokens": 11090, "reason": "max_length"}
129
+ {"label": "nbusgrvtx", "expanded_label": "nbumgrvtx", "source": "expanded", "tokens": 6729, "reason": "max_length"}
130
+ {"label": "rlimcn1b", "expanded_label": "rlimcn1", "source": "expanded", "tokens": 7000, "reason": "max_length"}
131
+ {"label": "subid", "expanded_label": "addrid", "source": "expanded", "tokens": 10979, "reason": "max_length"}
132
+ {"label": "rngridlmcl", "expanded_label": "opprrng", "source": "expanded", "tokens": 7855, "reason": "max_length"}
133
+ {"label": "fusgr1th", "expanded_label": "finsumvtxdg2size", "source": "expanded", "tokens": 15434, "reason": "max_length"}
134
+ {"label": "cnmptc", "expanded_label": "cnconst2", "source": "expanded", "tokens": 6700, "reason": "max_length"}
135
+ {"label": "divalgmodcl", "expanded_label": "divalgmod", "source": "expanded", "tokens": 7672, "reason": "max_length"}
136
+ {"label": "cpmatsrgpmat", "expanded_label": "cpmatmcl", "source": "expanded", "tokens": 7671, "reason": "max_length"}
137
+ {"label": "6re", "expanded_label": "1re", "source": "expanded", "tokens": 6977, "reason": "max_length"}
138
+ {"label": "pn0sr", "expanded_label": "1idsr", "source": "expanded", "tokens": 6286, "reason": "max_length"}
139
+ {"label": "mat1bas", "expanded_label": "matring", "source": "expanded", "tokens": 15980, "reason": "max_length"}
140
+ {"label": "grlicer", "expanded_label": "grlictr", "source": "expanded", "tokens": 11391, "reason": "max_length"}
141
+ {"label": "wlk1ewlk", "expanded_label": "wlk1walk", "source": "expanded", "tokens": 20611, "reason": "max_length"}
142
+ {"label": "gexdvds2", "expanded_label": "gexdvds", "source": "expanded", "tokens": 17221, "reason": "max_length"}
143
+ {"label": "expcncf", "expanded_label": "expcn", "source": "expanded", "tokens": 6499, "reason": "max_length"}
144
+ {"label": "reexpcl", "expanded_label": "1re", "source": "expanded", "tokens": 6871, "reason": "max_length"}
145
+ {"label": "erclwwlk", "expanded_label": "erclwwlktr", "source": "expanded", "tokens": 10470, "reason": "max_length"}
146
+ {"label": "expp1z", "expanded_label": "expaddz", "source": "expanded", "tokens": 11654, "reason": "max_length"}
147
+ {"label": "1wlkd", "expanded_label": "1wlkdlem4", "source": "expanded", "tokens": 7723, "reason": "max_length"}
148
+ {"label": "symgfixf1o", "expanded_label": "symgfixf1", "source": "expanded", "tokens": 8441, "reason": "max_length"}
149
+ {"label": "htpycn", "expanded_label": "ishtpy", "source": "expanded", "tokens": 8322, "reason": "max_length"}
150
+ {"label": "ringccatALTV", "expanded_label": "ringccatidALTV", "source": "expanded", "tokens": 36133, "reason": "max_length"}
151
+ {"label": "dfnbgrss2", "expanded_label": "dfnbgr6", "source": "expanded", "tokens": 7749, "reason": "max_length"}
152
+ {"label": "locfintop", "expanded_label": "islocfin", "source": "expanded", "tokens": 7464, "reason": "max_length"}
153
+ {"label": "rngcidALTV", "expanded_label": "rngccatidALTV", "source": "expanded", "tokens": 31264, "reason": "max_length"}
154
+ {"label": "unitabl", "expanded_label": "unitgrp", "source": "expanded", "tokens": 19083, "reason": "max_length"}
155
+ {"label": "kqt0", "expanded_label": "kqt0lem", "source": "expanded", "tokens": 9261, "reason": "max_length"}
156
+ {"label": "recmet", "expanded_label": "recld2", "source": "expanded", "tokens": 8047, "reason": "max_length"}
157
+ {"label": "gagrp", "expanded_label": "isga", "source": "expanded", "tokens": 9295, "reason": "max_length"}
158
+ {"label": "rngccatALTV", "expanded_label": "rngccatidALTV", "source": "expanded", "tokens": 35837, "reason": "max_length"}
159
+ {"label": "mpl1", "expanded_label": "mplsubrg", "source": "expanded", "tokens": 7919, "reason": "max_length"}
160
+ {"label": "vtxduhgr0edgnel", "expanded_label": "vtxd0nedgb", "source": "expanded", "tokens": 7459, "reason": "max_length"}
161
+ {"label": "pmtrfmvdn0", "expanded_label": "pmtrfrn", "source": "expanded", "tokens": 8866, "reason": "max_length"}
162
+ {"label": "psrbagev2", "expanded_label": "psrbagev1", "source": "expanded", "tokens": 7226, "reason": "max_length"}
163
+ {"label": "rnghmghm", "expanded_label": "isrnghm", "source": "expanded", "tokens": 8318, "reason": "max_length"}
164
+ {"label": "erclwwlk", "expanded_label": "erclwwlksym", "source": "expanded", "tokens": 6259, "reason": "max_length"}
165
+ {"label": "numclwwlk2lem3", "expanded_label": "numclwlk2lem2f1o", "source": "expanded", "tokens": 16930, "reason": "max_length"}
166
+ {"label": "wwlksnonfi", "expanded_label": "iswwlksnon", "source": "expanded", "tokens": 7618, "reason": "max_length"}
167
+ {"label": "erclwwlkn", "expanded_label": "erclwwlkntr", "source": "expanded", "tokens": 13187, "reason": "max_length"}
168
+ {"label": "gcdn0cl", "expanded_label": "gcdcllem3", "source": "expanded", "tokens": 15171, "reason": "max_length"}
169
+ {"label": "symggen2", "expanded_label": "symggen", "source": "expanded", "tokens": 40700, "reason": "max_length"}
170
+ {"label": "mdet0f1o", "expanded_label": "mdet0pr", "source": "expanded", "tokens": 9891, "reason": "max_length"}
171
+ {"label": "ringgrp", "expanded_label": "isring", "source": "expanded", "tokens": 8062, "reason": "max_length"}
172
+ {"label": "mulginvinv", "expanded_label": "mulginvcom", "source": "expanded", "tokens": 10250, "reason": "max_length"}
173
+ {"label": "2zrng0", "expanded_label": "cncrng", "source": "expanded", "tokens": 8315, "reason": "max_length"}
174
+ {"label": "wlkv", "expanded_label": "wksfval", "source": "expanded", "tokens": 7814, "reason": "max_length"}
175
+ {"label": "isncvsngpd", "expanded_label": "isncvsngp", "source": "expanded", "tokens": 7861, "reason": "max_length"}
176
+ {"label": "rlimcn2", "expanded_label": "rlimcn3", "source": "expanded", "tokens": 11998, "reason": "max_length"}
177
+ {"label": "ghmco", "expanded_label": "mhmco", "source": "expanded", "tokens": 8559, "reason": "max_length"}
178
+ {"label": "qdensere2", "expanded_label": "tgioo", "source": "expanded", "tokens": 20132, "reason": "max_length"}
179
+ {"label": "vdgn1frgrv3", "expanded_label": "vdgn1frgrv2", "source": "expanded", "tokens": 8714, "reason": "max_length"}
180
+ {"label": "0le1", "expanded_label": "1re", "source": "expanded", "tokens": 6633, "reason": "max_length"}
181
+ {"label": "wrdlen2", "expanded_label": "wrdlen2i", "source": "expanded", "tokens": 7494, "reason": "max_length"}
182
+ {"label": "wlkiswwlkupgr", "expanded_label": "wlkiswwlks1", "source": "expanded", "tokens": 9327, "reason": "max_length"}
183
+ {"label": "nzrpropd", "expanded_label": "ringpropd", "source": "expanded", "tokens": 19645, "reason": "max_length"}
184
+ {"label": "unitlinv", "expanded_label": "unitgrp", "source": "expanded", "tokens": 13837, "reason": "max_length"}
185
+ {"label": "fmfil", "expanded_label": "fbasrn", "source": "expanded", "tokens": 14221, "reason": "max_length"}
186
+ {"label": "odhash2", "expanded_label": "odf1o2", "source": "expanded", "tokens": 12218, "reason": "max_length"}
187
+ {"label": "xmulmnf2", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8195, "reason": "max_length"}
188
+ {"label": "2arymaptf1o", "expanded_label": "2arymaptfo", "source": "expanded", "tokens": 8887, "reason": "max_length"}
189
+ {"label": "ply1plusgpropd", "expanded_label": "psrplusgpropd", "source": "expanded", "tokens": 9265, "reason": "max_length"}
190
+ {"label": "1le2", "expanded_label": "1re", "source": "expanded", "tokens": 6748, "reason": "max_length"}
191
+ {"label": "uspgrsprf1o", "expanded_label": "uspgrsprf1", "source": "expanded", "tokens": 8507, "reason": "max_length"}
192
+ {"label": "numclwwlk1lem2f1o", "expanded_label": "numclwwlk1lem2fo", "source": "expanded", "tokens": 15086, "reason": "max_length"}
193
+ {"label": "unben", "expanded_label": "unbenlem", "source": "expanded", "tokens": 13691, "reason": "max_length"}
194
+ {"label": "frlmelbas", "expanded_label": "frlmbas", "source": "expanded", "tokens": 9104, "reason": "max_length"}
195
+ {"label": "1arith2", "expanded_label": "1arith", "source": "expanded", "tokens": 27416, "reason": "max_length"}
196
+ {"label": "1arymaptf1o", "expanded_label": "1arymaptf1", "source": "expanded", "tokens": 7210, "reason": "max_length"}
197
+ {"label": "sgrpssmgm", "expanded_label": "mgmnsgrpex", "source": "expanded", "tokens": 7533, "reason": "max_length"}
198
+ {"label": "pj1eq", "expanded_label": "pj1id", "source": "expanded", "tokens": 10282, "reason": "max_length"}
199
+ {"label": "isghmd", "expanded_label": "isghm", "source": "expanded", "tokens": 7692, "reason": "max_length"}
200
+ {"label": "ghmima", "expanded_label": "resghm", "source": "expanded", "tokens": 7045, "reason": "max_length"}
201
+ {"label": "znle", "expanded_label": "znval", "source": "expanded", "tokens": 8606, "reason": "max_length"}
202
+ {"label": "clnbfiusgrfi", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7030, "reason": "max_length"}
203
+ {"label": "assalmod", "expanded_label": "isassa", "source": "expanded", "tokens": 6614, "reason": "max_length"}
204
+ {"label": "xrs1cmn", "expanded_label": "xaddcom", "source": "expanded", "tokens": 6572, "reason": "max_length"}
205
+ {"label": "pcelnn", "expanded_label": "pcdvdsb", "source": "expanded", "tokens": 10977, "reason": "max_length"}
206
+ {"label": "indistpsALT", "expanded_label": "indistopon", "source": "expanded", "tokens": 7112, "reason": "max_length"}
207
+ {"label": "infssuzle", "expanded_label": "uzwo", "source": "expanded", "tokens": 9520, "reason": "max_length"}
208
+ {"label": "cpmatsrgpmat", "expanded_label": "1elcpmat", "source": "expanded", "tokens": 8164, "reason": "max_length"}
209
+ {"label": "scmatsrng", "expanded_label": "scmatmulcl", "source": "expanded", "tokens": 9402, "reason": "max_length"}
210
+ {"label": "oppgmndb", "expanded_label": "oppgmnd", "source": "expanded", "tokens": 7363, "reason": "max_length"}
211
+ {"label": "2zrngaabl", "expanded_label": "2zrngagrp", "source": "expanded", "tokens": 6148, "reason": "max_length"}
212
+ {"label": "lmodvaddsub4", "expanded_label": "abladdsub4", "source": "expanded", "tokens": 6739, "reason": "max_length"}
213
+ {"label": "m1modnnsub1", "expanded_label": "1re", "source": "expanded", "tokens": 7710, "reason": "max_length"}
214
+ {"label": "haushmph", "expanded_label": "cnhaus", "source": "expanded", "tokens": 13415, "reason": "max_length"}
215
+ {"label": "cncms", "expanded_label": "cncmet", "source": "expanded", "tokens": 9751, "reason": "max_length"}
216
+ {"label": "fprodxp", "expanded_label": "fprod2d", "source": "expanded", "tokens": 7010, "reason": "max_length"}
217
+ {"label": "fsummulc1", "expanded_label": "fsummulc2", "source": "expanded", "tokens": 10995, "reason": "max_length"}
218
+ {"label": "istop2g", "expanded_label": "fiint", "source": "expanded", "tokens": 17441, "reason": "max_length"}
219
+ {"label": "opsrlmod", "expanded_label": "psrlmod", "source": "expanded", "tokens": 21192, "reason": "max_length"}
220
+ {"label": "psradd", "expanded_label": "psrplusg", "source": "expanded", "tokens": 7929, "reason": "max_length"}
221
+ {"label": "phtpyhtpy", "expanded_label": "isphtpy", "source": "expanded", "tokens": 6188, "reason": "max_length"}
222
+ {"label": "refld", "expanded_label": "cncrng", "source": "expanded", "tokens": 8397, "reason": "max_length"}
223
+ {"label": "kgenuni", "expanded_label": "kgentopon", "source": "expanded", "tokens": 9648, "reason": "max_length"}
224
+ {"label": "clsdif", "expanded_label": "clsval2", "source": "expanded", "tokens": 9494, "reason": "max_length"}
225
+ {"label": "pm2mpgrpiso", "expanded_label": "pm2mpghm", "source": "expanded", "tokens": 23647, "reason": "max_length"}
226
+ {"label": "psrasclcl", "expanded_label": "psrlmod", "source": "expanded", "tokens": 18726, "reason": "max_length"}
227
+ {"label": "hashge2el2difb", "expanded_label": "hashge2el2dif", "source": "expanded", "tokens": 8135, "reason": "max_length"}
228
+ {"label": "relin01", "expanded_label": "1re", "source": "expanded", "tokens": 7726, "reason": "max_length"}
229
+ {"label": "nqerid", "expanded_label": "nqerf", "source": "expanded", "tokens": 7228, "reason": "max_length"}
230
+ {"label": "dsmmlmod", "expanded_label": "dsmmlss", "source": "expanded", "tokens": 11428, "reason": "max_length"}
231
+ {"label": "rhmisrnghm", "expanded_label": "ringrng", "source": "expanded", "tokens": 6625, "reason": "max_length"}
232
+ {"label": "orbstaval", "expanded_label": "gastacl", "source": "expanded", "tokens": 9593, "reason": "max_length"}
233
+ {"label": "mat0dim0", "expanded_label": "matring", "source": "expanded", "tokens": 16862, "reason": "max_length"}
234
+ {"label": "4pos", "expanded_label": "1re", "source": "expanded", "tokens": 6841, "reason": "max_length"}
235
+ {"label": "uspgredgleord", "expanded_label": "uspgredg2v", "source": "expanded", "tokens": 6377, "reason": "max_length"}
236
+ {"label": "restt1", "expanded_label": "cnt1", "source": "expanded", "tokens": 6228, "reason": "max_length"}
237
+ {"label": "metres2", "expanded_label": "xmetres2", "source": "expanded", "tokens": 6309, "reason": "max_length"}
238
+ {"label": "2pos", "expanded_label": "1re", "source": "expanded", "tokens": 8970, "reason": "max_length"}
239
+ {"label": "uzfbas", "expanded_label": "uzrest", "source": "expanded", "tokens": 8616, "reason": "max_length"}
240
+ {"label": "grpid", "expanded_label": "grprcan", "source": "expanded", "tokens": 10928, "reason": "max_length"}
241
+ {"label": "wwlksnextbij0", "expanded_label": "wwlksnextinj", "source": "expanded", "tokens": 17544, "reason": "max_length"}
242
+ {"label": "0le2", "expanded_label": "1re", "source": "expanded", "tokens": 9037, "reason": "max_length"}
243
+ {"label": "pmatring", "expanded_label": "matring", "source": "expanded", "tokens": 15581, "reason": "max_length"}
244
+ {"label": "elicc01", "expanded_label": "1re", "source": "expanded", "tokens": 6735, "reason": "max_length"}
245
+ {"label": "gpgprismgr4cycl0", "expanded_label": "gpgprismgr4cycllem11", "source": "expanded", "tokens": 6170, "reason": "max_length"}
246
+ {"label": "wlkiswwlks", "expanded_label": "wlkiswwlks1", "source": "expanded", "tokens": 9332, "reason": "max_length"}
247
+ {"label": "orbstaval", "expanded_label": "eqger", "source": "expanded", "tokens": 16573, "reason": "max_length"}
248
+ {"label": "nlmngp", "expanded_label": "isnlm", "source": "expanded", "tokens": 6526, "reason": "max_length"}
249
+ {"label": "cncdrg", "expanded_label": "cnsubrg", "source": "expanded", "tokens": 11658, "reason": "max_length"}
250
+ {"label": "txbasex", "expanded_label": "txuni2", "source": "expanded", "tokens": 6234, "reason": "max_length"}
251
+ {"label": "neggcd", "expanded_label": "gcdneg", "source": "expanded", "tokens": 6270, "reason": "max_length"}
252
+ {"label": "0cnALT2", "expanded_label": "cnegex", "source": "expanded", "tokens": 8302, "reason": "max_length"}
253
+ {"label": "xrrest", "expanded_label": "xrtgioo", "source": "expanded", "tokens": 12124, "reason": "max_length"}
254
+ {"label": "evls1scafv", "expanded_label": "evls1sca", "source": "expanded", "tokens": 9522, "reason": "max_length"}
255
+ {"label": "assaring", "expanded_label": "isassa", "source": "expanded", "tokens": 6337, "reason": "max_length"}
256
+ {"label": "ramtub", "expanded_label": "ramcl2lem", "source": "expanded", "tokens": 6234, "reason": "max_length"}
257
+ {"label": "gaset", "expanded_label": "isga", "source": "expanded", "tokens": 8960, "reason": "max_length"}
258
+ {"label": "ringmgp", "expanded_label": "isring", "source": "expanded", "tokens": 8047, "reason": "max_length"}
259
+ {"label": "fusgrvtxdgonume", "expanded_label": "vtxdgoddnumeven", "source": "expanded", "tokens": 8717, "reason": "max_length"}
260
+ {"label": "1lt6", "expanded_label": "1re", "source": "expanded", "tokens": 6929, "reason": "max_length"}
261
+ {"label": "rngqiprngho", "expanded_label": "rngqiprnglin", "source": "expanded", "tokens": 9300, "reason": "max_length"}
262
+ {"label": "addnqf", "expanded_label": "nqerf", "source": "expanded", "tokens": 7503, "reason": "max_length"}
263
+ {"label": "1lt10", "expanded_label": "1re", "source": "expanded", "tokens": 6987, "reason": "max_length"}
264
+ {"label": "qtopconn", "expanded_label": "cnconn", "source": "expanded", "tokens": 9265, "reason": "max_length"}
265
+ {"label": "uvtx2vtx1edgb", "expanded_label": "nbuhgr2vtx1edgb", "source": "expanded", "tokens": 9405, "reason": "max_length"}
266
+ {"label": "rngqiprng", "expanded_label": "ringrng", "source": "expanded", "tokens": 6666, "reason": "max_length"}
267
+ {"label": "psgnfitr", "expanded_label": "symggrp", "source": "expanded", "tokens": 6239, "reason": "max_length"}
268
+ {"label": "resttop", "expanded_label": "tgrest", "source": "expanded", "tokens": 9629, "reason": "max_length"}
269
+ {"label": "frgpgrp", "expanded_label": "frgp0", "source": "expanded", "tokens": 17274, "reason": "max_length"}
270
+ {"label": "sqrt2irr0", "expanded_label": "sqrt2irr", "source": "expanded", "tokens": 11416, "reason": "max_length"}
271
+ {"label": "frgr2wsp1", "expanded_label": "wpthswwlks2on", "source": "expanded", "tokens": 10945, "reason": "max_length"}
272
+ {"label": "metelcls", "expanded_label": "met1stc", "source": "expanded", "tokens": 13311, "reason": "max_length"}
273
+ {"label": "crhmsubcALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 12910, "reason": "max_length"}
274
+ {"label": "sgrp2nmnd", "expanded_label": "sgrp2nmndlem4", "source": "expanded", "tokens": 17737, "reason": "max_length"}
275
+ {"label": "phlsrng", "expanded_label": "isphl", "source": "expanded", "tokens": 10453, "reason": "max_length"}
276
+ {"label": "subcn", "expanded_label": "subcn2", "source": "expanded", "tokens": 7032, "reason": "max_length"}
277
+ {"label": "utoptopon", "expanded_label": "utoptop", "source": "expanded", "tokens": 10837, "reason": "max_length"}
278
+ {"label": "metcnp4", "expanded_label": "met1stc", "source": "expanded", "tokens": 13317, "reason": "max_length"}
279
+ {"label": "rhmpsrlem1", "expanded_label": "psrbaglefi", "source": "expanded", "tokens": 8707, "reason": "max_length"}
280
+ {"label": "fcfelbas", "expanded_label": "fcfval", "source": "expanded", "tokens": 6815, "reason": "max_length"}
281
+ {"label": "opprneg", "expanded_label": "grpinvfval", "source": "expanded", "tokens": 7126, "reason": "max_length"}
282
+ {"label": "isgrpde", "expanded_label": "ismndd", "source": "expanded", "tokens": 6733, "reason": "max_length"}
283
+ {"label": "1lt2", "expanded_label": "1re", "source": "expanded", "tokens": 6790, "reason": "max_length"}
284
+ {"label": "hausnlly", "expanded_label": "restnlly", "source": "expanded", "tokens": 7658, "reason": "max_length"}
285
+ {"label": "dvdsunit", "expanded_label": "dvdsrtr", "source": "expanded", "tokens": 6678, "reason": "max_length"}
286
+ {"label": "nbfiusgrfi", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7133, "reason": "max_length"}
287
+ {"label": "opsrring", "expanded_label": "psrring", "source": "expanded", "tokens": 8414, "reason": "max_length"}
288
+ {"label": "dprd0", "expanded_label": "dprdz", "source": "expanded", "tokens": 14331, "reason": "max_length"}
289
+ {"label": "recmet", "expanded_label": "cncmet", "source": "expanded", "tokens": 9197, "reason": "max_length"}
290
+ {"label": "dvdseq", "expanded_label": "dvdsabseq", "source": "expanded", "tokens": 6996, "reason": "max_length"}
291
+ {"label": "telgsumfz0s", "expanded_label": "telgsumfzs", "source": "expanded", "tokens": 12061, "reason": "max_length"}
292
+ {"label": "evls1fvcl", "expanded_label": "ressply1evl", "source": "expanded", "tokens": 7459, "reason": "max_length"}
293
+ {"label": "t0hmph", "expanded_label": "cnt0", "source": "expanded", "tokens": 8061, "reason": "max_length"}
294
+ {"label": "1le3", "expanded_label": "1re", "source": "expanded", "tokens": 6650, "reason": "max_length"}
295
+ {"label": "zring0", "expanded_label": "cncrng", "source": "expanded", "tokens": 7883, "reason": "max_length"}
296
+ {"label": "ackval42a", "expanded_label": "ackval42", "source": "expanded", "tokens": 6435, "reason": "max_length"}
297
+ {"label": "mplsca", "expanded_label": "psrsca", "source": "expanded", "tokens": 6278, "reason": "max_length"}
298
+ {"label": "subid1", "expanded_label": "addrid", "source": "expanded", "tokens": 10958, "reason": "max_length"}
299
+ {"label": "idmatidpmat", "expanded_label": "mat2pmat1", "source": "expanded", "tokens": 7186, "reason": "max_length"}
300
+ {"label": "vtxduhgrun", "expanded_label": "vtxdun", "source": "expanded", "tokens": 13019, "reason": "max_length"}
301
+ {"label": "nnwo", "expanded_label": "uzwo", "source": "expanded", "tokens": 8873, "reason": "max_length"}
302
+ {"label": "peano2rem", "expanded_label": "1re", "source": "expanded", "tokens": 6713, "reason": "max_length"}
303
+ {"label": "usgrlimprop", "expanded_label": "uspgrlim", "source": "expanded", "tokens": 16778, "reason": "max_length"}
304
+ {"label": "vdgfrgrgt2", "expanded_label": "vdgn1frgrv2", "source": "expanded", "tokens": 9428, "reason": "max_length"}
305
+ {"label": "evls1varsrng", "expanded_label": "evls1var", "source": "expanded", "tokens": 8899, "reason": "max_length"}
306
+ {"label": "indf", "expanded_label": "1re", "source": "expanded", "tokens": 7405, "reason": "max_length"}
307
+ {"label": "usgrn2cycl", "expanded_label": "uspgrn2crct", "source": "expanded", "tokens": 12349, "reason": "max_length"}
308
+ {"label": "addgtge0", "expanded_label": "00id", "source": "expanded", "tokens": 8117, "reason": "max_length"}
309
+ {"label": "zringcrng", "expanded_label": "cncrng", "source": "expanded", "tokens": 7480, "reason": "max_length"}
310
+ {"label": "restt0", "expanded_label": "cnt0", "source": "expanded", "tokens": 10949, "reason": "max_length"}
311
+ {"label": "ring1ne0", "expanded_label": "hashgt12el", "source": "expanded", "tokens": 6637, "reason": "max_length"}
312
+ {"label": "cycsubggenodd", "expanded_label": "dfod2", "source": "expanded", "tokens": 22577, "reason": "max_length"}
313
+ {"label": "0mat2pmat", "expanded_label": "mat2pmatghm", "source": "expanded", "tokens": 15693, "reason": "max_length"}
314
+ {"label": "omndmnd", "expanded_label": "isomnd", "source": "expanded", "tokens": 8458, "reason": "max_length"}
315
+ {"label": "nneop", "expanded_label": "nneo", "source": "expanded", "tokens": 6454, "reason": "max_length"}
316
+ {"label": "eqg0subgecsn", "expanded_label": "eqg0subg", "source": "expanded", "tokens": 6571, "reason": "max_length"}
317
+ {"label": "xltmul2", "expanded_label": "xmulcom", "source": "expanded", "tokens": 14328, "reason": "max_length"}
318
+ {"label": "subrgnrg", "expanded_label": "subgngp", "source": "expanded", "tokens": 7882, "reason": "max_length"}
319
+ {"label": "pi1buni", "expanded_label": "pi1blem", "source": "expanded", "tokens": 7231, "reason": "max_length"}
320
+ {"label": "isoddgcd1", "expanded_label": "coprm", "source": "expanded", "tokens": 7107, "reason": "max_length"}
321
+ {"label": "1lt5", "expanded_label": "1re", "source": "expanded", "tokens": 6962, "reason": "max_length"}
322
+ {"label": "rellycmp", "expanded_label": "cnllycmp", "source": "expanded", "tokens": 22702, "reason": "max_length"}
323
+ {"label": "pm2mprhm", "expanded_label": "matring", "source": "expanded", "tokens": 16674, "reason": "max_length"}
324
+ {"label": "icccld", "expanded_label": "difreicc", "source": "expanded", "tokens": 10085, "reason": "max_length"}
325
+ {"label": "m2cpmrhm", "expanded_label": "matring", "source": "expanded", "tokens": 17131, "reason": "max_length"}
326
+ {"label": "coe1tmfv2", "expanded_label": "coe1tm", "source": "expanded", "tokens": 12134, "reason": "max_length"}
327
+ {"label": "tsmscl", "expanded_label": "eltsms", "source": "expanded", "tokens": 8460, "reason": "max_length"}
328
+ {"label": "eluzadd", "expanded_label": "zaddcl", "source": "expanded", "tokens": 6242, "reason": "max_length"}
329
+ {"label": "sst0", "expanded_label": "cnt0", "source": "expanded", "tokens": 8434, "reason": "max_length"}
330
+ {"label": "clwlkclwwlkf1o", "expanded_label": "clwlkclwwlkf1", "source": "expanded", "tokens": 11644, "reason": "max_length"}
331
+ {"label": "indistop", "expanded_label": "indistopon", "source": "expanded", "tokens": 7204, "reason": "max_length"}
332
+ {"label": "cphnmfval", "expanded_label": "iscph", "source": "expanded", "tokens": 8389, "reason": "max_length"}
333
+ {"label": "fclssscls", "expanded_label": "isfcls", "source": "expanded", "tokens": 7916, "reason": "max_length"}
334
+ {"label": "2zrng", "expanded_label": "2zlidl", "source": "expanded", "tokens": 9569, "reason": "max_length"}
335
+ {"label": "sringcat", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12711, "reason": "max_length"}
336
+ {"label": "xkotopon", "expanded_label": "xkouni", "source": "expanded", "tokens": 7374, "reason": "max_length"}
337
+ {"label": "abs2dif2", "expanded_label": "abstri", "source": "expanded", "tokens": 7522, "reason": "max_length"}
338
+ {"label": "nrmreg", "expanded_label": "nrmr0reg", "source": "expanded", "tokens": 6220, "reason": "max_length"}
339
+ {"label": "evlsrhm", "expanded_label": "evlsval2", "source": "expanded", "tokens": 8301, "reason": "max_length"}
340
+ {"label": "clwwlkf1o", "expanded_label": "clwwlkf1", "source": "expanded", "tokens": 16313, "reason": "max_length"}
341
+ {"label": "mat2pmatrhm", "expanded_label": "matring", "source": "expanded", "tokens": 26359, "reason": "max_length"}
342
+ {"label": "sqrt1", "expanded_label": "1re", "source": "expanded", "tokens": 7104, "reason": "max_length"}
343
+ {"label": "sumhash", "expanded_label": "ssfi", "source": "expanded", "tokens": 12417, "reason": "max_length"}
344
+ {"label": "pmatcollpw3", "expanded_label": "pmatcollpw", "source": "expanded", "tokens": 12192, "reason": "max_length"}
345
+ {"label": "pcprecl", "expanded_label": "pclem", "source": "expanded", "tokens": 9618, "reason": "max_length"}
346
+ {"label": "1ne2", "expanded_label": "1re", "source": "expanded", "tokens": 6611, "reason": "max_length"}
347
+ {"label": "wlknwwlksnen", "expanded_label": "wlknwwlksnbij", "source": "expanded", "tokens": 7322, "reason": "max_length"}
348
+ {"label": "pm2mpf", "expanded_label": "pm2mpcl", "source": "expanded", "tokens": 7149, "reason": "max_length"}
349
+ {"label": "m2cpmf1", "expanded_label": "mat2pmatf1", "source": "expanded", "tokens": 7912, "reason": "max_length"}
350
+ {"label": "lincsumscmcl", "expanded_label": "lincscmcl", "source": "expanded", "tokens": 12114, "reason": "max_length"}
351
+ {"label": "obsrcl", "expanded_label": "isobs", "source": "expanded", "tokens": 7261, "reason": "max_length"}
352
+ {"label": "evls1pw", "expanded_label": "evls1rhm", "source": "expanded", "tokens": 6659, "reason": "max_length"}
353
+ {"label": "lcmfunsn", "expanded_label": "lcmfunsnlem", "source": "expanded", "tokens": 10953, "reason": "max_length"}
354
+ {"label": "zrzeroorngc", "expanded_label": "zrinitorngc", "source": "expanded", "tokens": 9895, "reason": "max_length"}
355
+ {"label": "rrxmetfi", "expanded_label": "rrxmet", "source": "expanded", "tokens": 47140, "reason": "max_length"}
356
+ {"label": "iicmp", "expanded_label": "1re", "source": "expanded", "tokens": 7086, "reason": "max_length"}
357
+ {"label": "rngabl", "expanded_label": "isrng", "source": "expanded", "tokens": 7974, "reason": "max_length"}
358
+ {"label": "2re", "expanded_label": "1re", "source": "expanded", "tokens": 8834, "reason": "max_length"}
359
+ {"label": "fprodshft", "expanded_label": "mptfzshft", "source": "expanded", "tokens": 7323, "reason": "max_length"}
360
+ {"label": "lmhmlem", "expanded_label": "islmhm", "source": "expanded", "tokens": 7650, "reason": "max_length"}
361
+ {"label": "cncongr", "expanded_label": "cncongr2", "source": "expanded", "tokens": 14989, "reason": "max_length"}
362
+ {"label": "nnssre", "expanded_label": "1re", "source": "expanded", "tokens": 6912, "reason": "max_length"}
363
+ {"label": "1elunit", "expanded_label": "1re", "source": "expanded", "tokens": 6868, "reason": "max_length"}
364
+ {"label": "cnfldhaus", "expanded_label": "methaus", "source": "expanded", "tokens": 10873, "reason": "max_length"}
365
+ {"label": "cnpf", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9405, "reason": "max_length"}
366
+ {"label": "sum2id", "expanded_label": "sumeq2ii", "source": "expanded", "tokens": 10306, "reason": "max_length"}
367
+ {"label": "dsmmlmod", "expanded_label": "prdslmodd", "source": "expanded", "tokens": 27808, "reason": "max_length"}
368
+ {"label": "unitinvcl", "expanded_label": "unitgrp", "source": "expanded", "tokens": 12866, "reason": "max_length"}
369
+ {"label": "bcpascm1", "expanded_label": "bcpasc", "source": "expanded", "tokens": 24545, "reason": "max_length"}
370
+ {"label": "2arymaptf1o", "expanded_label": "2arymaptf1", "source": "expanded", "tokens": 9368, "reason": "max_length"}
371
+ {"label": "xmetf", "expanded_label": "isxmet", "source": "expanded", "tokens": 6591, "reason": "max_length"}
372
+ {"label": "coe1tmfv1", "expanded_label": "coe1tm", "source": "expanded", "tokens": 11689, "reason": "max_length"}
373
+ {"label": "rlimdmo1", "expanded_label": "rlimo1", "source": "expanded", "tokens": 7587, "reason": "max_length"}
374
+ {"label": "nlmlmod", "expanded_label": "isnlm", "source": "expanded", "tokens": 6171, "reason": "max_length"}
375
+ {"label": "7re", "expanded_label": "1re", "source": "expanded", "tokens": 6750, "reason": "max_length"}
376
+ {"label": "omndtos", "expanded_label": "isomnd", "source": "expanded", "tokens": 8315, "reason": "max_length"}
377
+ {"label": "scmatf1o", "expanded_label": "scmatf1", "source": "expanded", "tokens": 10246, "reason": "max_length"}
378
+ {"label": "resthaus", "expanded_label": "cnhaus", "source": "expanded", "tokens": 17432, "reason": "max_length"}
379
+ {"label": "grplactf1o", "expanded_label": "grplactcnv", "source": "expanded", "tokens": 6922, "reason": "max_length"}
380
+ {"label": "bitsinv", "expanded_label": "bitsf1ocnv", "source": "expanded", "tokens": 6737, "reason": "max_length"}
381
+ {"label": "cnmptkc", "expanded_label": "xkoccn", "source": "expanded", "tokens": 14863, "reason": "max_length"}
382
+ {"label": "sringcatALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 12861, "reason": "max_length"}
383
+ {"label": "lcmgcdnn", "expanded_label": "lcmgcd", "source": "expanded", "tokens": 7128, "reason": "max_length"}
384
+ {"label": "cncongr", "expanded_label": "cncongr1", "source": "expanded", "tokens": 25677, "reason": "max_length"}
385
+ {"label": "re0g", "expanded_label": "cncrng", "source": "expanded", "tokens": 7900, "reason": "max_length"}
386
+ {"label": "lmodring", "expanded_label": "islmod", "source": "expanded", "tokens": 22570, "reason": "max_length"}
387
+ {"label": "phimul", "expanded_label": "phimullem", "source": "expanded", "tokens": 59881, "reason": "max_length"}
388
+ {"label": "cycsubgcld", "expanded_label": "cycsubgcl", "source": "expanded", "tokens": 9718, "reason": "max_length"}
389
+ {"label": "zndvds0", "expanded_label": "zndvds", "source": "expanded", "tokens": 8442, "reason": "max_length"}
390
+ {"label": "ghmabl", "expanded_label": "ghmgrp", "source": "expanded", "tokens": 6213, "reason": "max_length"}
391
+ {"label": "pm2mpghmlem1", "expanded_label": "matring", "source": "expanded", "tokens": 16380, "reason": "max_length"}
392
+ {"label": "frgrncvvdeqlem10", "expanded_label": "frgrncvvdeqlem8", "source": "expanded", "tokens": 9143, "reason": "max_length"}
393
+ {"label": "pi1xfrgim", "expanded_label": "pi1xfr", "source": "expanded", "tokens": 56581, "reason": "max_length"}
394
+ {"label": "grlicer", "expanded_label": "grlicsym", "source": "expanded", "tokens": 9352, "reason": "max_length"}
395
+ {"label": "o1const", "expanded_label": "rlimo1", "source": "expanded", "tokens": 8521, "reason": "max_length"}
396
+ {"label": "cphssphl", "expanded_label": "cphsscph", "source": "expanded", "tokens": 10434, "reason": "max_length"}
397
+ {"label": "islmhmd", "expanded_label": "islmhm", "source": "expanded", "tokens": 7502, "reason": "max_length"}
398
+ {"label": "fsumsub", "expanded_label": "fsumadd", "source": "expanded", "tokens": 13138, "reason": "max_length"}
399
+ {"label": "kgenftop", "expanded_label": "kgentopon", "source": "expanded", "tokens": 10064, "reason": "max_length"}
400
+ {"label": "clwwlkf1o", "expanded_label": "clwwlkfo", "source": "expanded", "tokens": 7535, "reason": "max_length"}
401
+ {"label": "mdet0fv0", "expanded_label": "mdet0pr", "source": "expanded", "tokens": 9903, "reason": "max_length"}
402
+ {"label": "usgrexmpl", "expanded_label": "usgrexmplef", "source": "expanded", "tokens": 9478, "reason": "max_length"}
403
+ {"label": "numclwwlk1lem2f1o", "expanded_label": "numclwwlk1lem2f1", "source": "expanded", "tokens": 19478, "reason": "max_length"}
404
+ {"label": "odval2", "expanded_label": "odeq", "source": "expanded", "tokens": 6696, "reason": "max_length"}
405
+ {"label": "phiprm", "expanded_label": "phiprmpw", "source": "expanded", "tokens": 17595, "reason": "max_length"}
406
+ {"label": "3re", "expanded_label": "1re", "source": "expanded", "tokens": 6728, "reason": "max_length"}
407
+ {"label": "ghmf", "expanded_label": "isghm", "source": "expanded", "tokens": 7750, "reason": "max_length"}
408
+ {"label": "5re", "expanded_label": "1re", "source": "expanded", "tokens": 6884, "reason": "max_length"}
409
+ {"label": "cnmgpabl", "expanded_label": "cncrng", "source": "expanded", "tokens": 7869, "reason": "max_length"}
410
+ {"label": "ordtrestixx", "expanded_label": "letsr", "source": "expanded", "tokens": 6760, "reason": "max_length"}
411
+ {"label": "filfinnfr", "expanded_label": "fbfinnfr", "source": "expanded", "tokens": 6699, "reason": "max_length"}
412
+ {"label": "vtxdfiun", "expanded_label": "vtxdun", "source": "expanded", "tokens": 13496, "reason": "max_length"}
413
+ {"label": "cnprcl", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9418, "reason": "max_length"}
414
+ {"label": "1arymaptf1o", "expanded_label": "1arymaptfo", "source": "expanded", "tokens": 6423, "reason": "max_length"}
415
+ {"label": "pc1", "expanded_label": "pczpre", "source": "expanded", "tokens": 10054, "reason": "max_length"}
416
+ {"label": "sshaus", "expanded_label": "cnhaus", "source": "expanded", "tokens": 13569, "reason": "max_length"}
417
+ {"label": "8re", "expanded_label": "1re", "source": "expanded", "tokens": 6721, "reason": "max_length"}
418
+ {"label": "nmoleub2a", "expanded_label": "nmoleub2lem2", "source": "expanded", "tokens": 13300, "reason": "max_length"}
419
+ {"label": "nnesq", "expanded_label": "zesq", "source": "expanded", "tokens": 7574, "reason": "max_length"}
420
+ {"label": "vtxdgfusgrf", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7818, "reason": "max_length"}
421
+ {"label": "cphsca", "expanded_label": "iscph", "source": "expanded", "tokens": 7872, "reason": "max_length"}
422
+ {"label": "pm2mpf1o", "expanded_label": "pm2mpf1", "source": "expanded", "tokens": 20808, "reason": "max_length"}
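Each record in skipped-tokenization.jsonl carries a `label`, an `expanded_label`, a `source`, a `tokens` count, and a `reason`. A minimal sketch of how such records could be summarized follows. The helper name `summarize_skips` is hypothetical, and the three sample lines are copied from the diff above rather than read from disk.

```python
import json
from collections import Counter

# Three records copied verbatim from skipped-tokenization.jsonl above.
# In practice the lines would come from open("skipped-tokenization.jsonl").
SAMPLE_LINES = [
    '{"label": "3re", "expanded_label": "1re", "source": "expanded", "tokens": 6728, "reason": "max_length"}',
    '{"label": "8re", "expanded_label": "1re", "source": "expanded", "tokens": 6721, "reason": "max_length"}',
    '{"label": "pm2mpf1o", "expanded_label": "pm2mpf1", "source": "expanded", "tokens": 20808, "reason": "max_length"}',
]


def summarize_skips(lines):
    """Hypothetical helper: aggregate skip records by reason and expanded label."""
    records = [json.loads(line) for line in lines if line.strip()]
    token_counts = [r["tokens"] for r in records]
    return {
        "count": len(records),
        "by_reason": dict(Counter(r["reason"] for r in records)),
        # Expanded labels that account for the most skipped examples.
        "top_expanded": Counter(r["expanded_label"] for r in records).most_common(3),
        "min_tokens": min(token_counts, default=None),
        "max_tokens": max(token_counts, default=None),
    }


summary = summarize_skips(SAMPLE_LINES)
print(summary)
```

Grouping by `expanded_label` shows which expansions most often push an example past the length limit (for instance, `1re` recurs many times in the records above).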
speed-estimate.md ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Qwen3.5-2B-metamath 8192 speed estimate
2
+
3
+ - Data: metamath-output/setmm-train-qwen35-4b-mixed-12000, 4000 original + 12000 expanded.
4
+ - max_length=8192 keeps about 14492 of 16000 examples, according to the tokenizer length scan.
5
+ - Training config: Qwen3.5-2B base, FLA fast path available, LoRA rank 32/alpha 64/dropout 0.05, bf16, gradient checkpointing, lr=5e-4, 1 epoch.
6
+ - Batch: per-device train batch size 2, gradient accumulation 8 on one GPU, for an effective batch size of 16.
7
+ - Smoke run: 59 train examples, 4 optimizer steps, runtime 120.9s. First step was about 94s due to compile/init; later steps were about 8-11s/step on the short smoke sample.
8
+ - Full 1-epoch steps: about 888 optimizer steps (about 14200 training examples left after the 2% eval split, at an effective batch size of 16).
9
+ - Estimated full runtime: roughly 4-8 hours depending on length mix, checkpoint/eval cost, and whether the compiled kernels stay warm.
10
+ - Log: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.log
11
+ - PID file: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.pid
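The step count in the estimate above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the eval split is taken to round up, the final partial accumulation window is taken to count as one optimizer step, and the function names `optimizer_steps` and `runtime_hours` are illustrative, not part of the training script.

```python
import math


def optimizer_steps(kept_examples, eval_fraction=0.02,
                    per_device_batch=2, grad_accum=8, epochs=1):
    """Estimate optimizer steps for a single-GPU run (assumed rounding rules)."""
    # Assumption: the eval split rounds up to a whole number of examples.
    eval_examples = math.ceil(kept_examples * eval_fraction)
    train_examples = kept_examples - eval_examples
    # Micro-batches per epoch, then optimizer steps after gradient accumulation.
    micro_batches = math.ceil(train_examples / per_device_batch)
    return epochs * math.ceil(micro_batches / grad_accum)


def runtime_hours(steps, seconds_per_step):
    """Convert a step count and per-step time into wall-clock hours."""
    return steps * seconds_per_step / 3600


steps = optimizer_steps(14492)
print(steps)
for sec in (8, 11, 16, 32):
    print(sec, round(runtime_hours(steps, sec), 2))
```

At the 8-11 s/step seen in the smoke run, 888 steps would take about 2.0-2.7 hours. The 4-8 hour estimate therefore corresponds to roughly 16-32 s/step on the full length mix, which leaves headroom for longer sequences and checkpoint/eval cost.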
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
3
+ size 19989343
tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_prefix_space": false,
3
+ "audio_bos_token": "<|audio_start|>",
4
+ "audio_eos_token": "<|audio_end|>",
5
+ "audio_token": "<|audio_pad|>",
6
+ "backend": "tokenizers",
7
+ "bos_token": null,
8
+ "clean_up_tokenization_spaces": false,
9
+ "eos_token": "<|im_end|>",
10
+ "errors": "replace",
11
+ "image_token": "<|image_pad|>",
12
+ "is_local": true,
13
+ "model_max_length": 262144,
14
+ "model_specific_special_tokens": {
15
+ "audio_bos_token": "<|audio_start|>",
16
+ "audio_eos_token": "<|audio_end|>",
17
+ "audio_token": "<|audio_pad|>",
18
+ "image_token": "<|image_pad|>",
19
+ "video_token": "<|video_pad|>",
20
+ "vision_bos_token": "<|vision_start|>",
21
+ "vision_eos_token": "<|vision_end|>"
22
+ },
23
+ "pad_token": "<|endoftext|>",
24
+ "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
25
+ "split_special_tokens": false,
26
+ "tokenizer_class": "TokenizersBackend",
27
+ "unk_token": null,
28
+ "video_token": "<|video_pad|>",
29
+ "vision_bos_token": "<|vision_start|>",
30
+ "vision_eos_token": "<|vision_end|>"
31
+ }
train-6144-mb2x8-3ep-gpu1.log ADDED
The diff for this file is too large to render. See raw diff
 
train-6144-mb2x8-gpu1.log ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Starting Qwen3.5-2B Metamath training
2
+ Output: /data/pretrained_models/Qwen3.5-2B-metamath
3
+ Effective batch size: 16
4
+ /data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
5
+ import pynvml # type: ignore[import]
6
+ `torch_dtype` is deprecated! Use `dtype` instead!
7
+
8
+ warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
9
+ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
10
+ trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464
11
+
12
  0%| | 0/955 [00:00<?, ?it/s]
13
  0%| | 1/955 [00:08<2:08:48, 8.10s/it]
14
  0%| | 2/955 [00:13<1:42:03, 6.43s/it]
15
  0%| | 3/955 [00:18<1:33:39, 5.90s/it]
16
  0%| | 4/955 [00:24<1:33:10, 5.88s/it]
17
  1%| | 5/955 [00:29<1:26:47, 5.48s/it]
18
  1%| | 6/955 [00:33<1:22:03, 5.19s/it]
19
  1%| | 7/955 [00:39<1:22:53, 5.25s/it]
20
  1%| | 8/955 [00:44<1:22:33, 5.23s/it]
21
  1%| | 9/955 [00:49<1:22:53, 5.26s/it]
22
  1%| | 10/955 [00:55<1:23:03, 5.27s/it]
23
 
24
  1%| | 10/955 [00:55<1:23:03, 5.27s/it]
25
  1%| | 11/955 [01:00<1:21:48, 5.20s/it]
26
  1%|▏ | 12/955 [01:06<1:26:21, 5.49s/it]
27
  1%|▏ | 13/955 [01:11<1:26:44, 5.52s/it]
28
  1%|▏ | 14/955 [01:16<1:21:12, 5.18s/it]
29
  2%|▏ | 15/955 [01:23<1:32:44, 5.92s/it]
30
  2%|▏ | 16/955 [01:29<1:31:53, 5.87s/it]
31
  2%|▏ | 17/955 [01:34<1:27:09, 5.57s/it]
32
  2%|▏ | 18/955 [01:39<1:24:54, 5.44s/it]
33
  2%|▏ | 19/955 [01:47<1:35:33, 6.13s/it]
34
  2%|▏ | 20/955 [01:52<1:29:57, 5.77s/it]
35
 
36
  2%|▏ | 20/955 [01:52<1:29:57, 5.77s/it]
37
  2%|▏ | 21/955 [01:58<1:29:40, 5.76s/it]
38
  2%|▏ | 22/955 [02:02<1:25:43, 5.51s/it]
39
  2%|▏ | 23/955 [02:10<1:36:31, 6.21s/it]
40
  3%|▎ | 24/955 [02:16<1:31:56, 5.92s/it]
41
  3%|▎ | 25/955 [02:22<1:33:55, 6.06s/it]
42
  3%|▎ | 26/955 [02:27<1:27:44, 5.67s/it]
43
  3%|▎ | 27/955 [02:32<1:27:30, 5.66s/it]
44
  3%|▎ | 28/955 [02:40<1:34:33, 6.12s/it]
45
  3%|▎ | 29/955 [02:48<1:47:23, 6.96s/it]
46
  3%|▎ | 30/955 [02:57<1:54:11, 7.41s/it]
47
 
48
  3%|▎ | 30/955 [02:57<1:54:11, 7.41s/it]
49
  3%|▎ | 31/955 [03:04<1:51:03, 7.21s/it]
50
  3%|▎ | 32/955 [03:09<1:42:57, 6.69s/it]
51
  3%|▎ | 33/955 [03:19<1:56:30, 7.58s/it]
52
  4%|▎ | 34/955 [03:26<1:53:59, 7.43s/it]
53
  4%|▎ | 35/955 [03:31<1:44:08, 6.79s/it]
54
  4%|▍ | 36/955 [03:37<1:39:52, 6.52s/it]
55
  4%|▍ | 37/955 [03:43<1:36:02, 6.28s/it]
56
  4%|▍ | 38/955 [03:49<1:34:08, 6.16s/it]
57
  4%|▍ | 39/955 [03:58<1:47:32, 7.04s/it]
58
  4%|▍ | 40/955 [04:04<1:45:12, 6.90s/it]
59
 
60
  4%|▍ | 40/955 [04:04<1:45:12, 6.90s/it]
61
  4%|▍ | 41/955 [04:11<1:42:05, 6.70s/it]
62
  4%|▍ | 42/955 [04:18<1:44:20, 6.86s/it]
63
  5%|▍ | 43/955 [04:28<1:58:59, 7.83s/it]
64
  5%|▍ | 44/955 [04:35<1:56:05, 7.65s/it]
65
  5%|▍ | 45/955 [04:43<1:57:25, 7.74s/it]
66
  5%|▍ | 46/955 [04:49<1:51:07, 7.33s/it]
67
  5%|▍ | 47/955 [04:56<1:46:18, 7.02s/it]
68
  5%|▌ | 48/955 [05:01<1:37:38, 6.46s/it]
69
  5%|▌ | 49/955 [05:11<1:51:55, 7.41s/it]
70
  5%|▌ | 50/955 [05:17<1:47:04, 7.10s/it]
71
 
72
  5%|▌ | 50/955 [05:17<1:47:04, 7.10s/it]Terminated
train-8192-mb4x4-gpu1.log ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ Starting Qwen3.5-2B Metamath training
2
+ Output: /data/pretrained_models/Qwen3.5-2B-metamath
3
+ Effective batch size: 16
4
+ /data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
5
+ import pynvml # type: ignore[import]
6
+ `torch_dtype` is deprecated! Use `dtype` instead!
7
+
8
+ warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
9
+ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
10
+ trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464
11
+
12
  0%| | 0/966 [00:00<?, ?it/s]
13
  0%| | 1/966 [00:25<6:46:25, 25.27s/it]
14
  0%| | 2/966 [00:30<3:33:54, 13.31s/it]
15
  0%| | 3/966 [00:34<2:30:05, 9.35s/it]Traceback (most recent call last):
16
+ File "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/tools/train_qwen35_metamath.py", line 381, in <module>
17
+ main()
18
+ File "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/tools/train_qwen35_metamath.py", line 339, in main
19
+ trainer.train()
20
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1424, in train
21
+ return inner_training_loop(
22
+ ^^^^^^^^^^^^^^^^^^^^
23
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1506, in _inner_training_loop
24
+ self._run_epoch(
25
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1734, in _run_epoch
26
+ tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
27
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
28
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1934, in training_step
29
+ self.accelerator.backward(loss, **kwargs)
30
+ File "/home/lg/.local/lib/python3.12/site-packages/accelerate/accelerator.py", line 2329, in backward
31
+ loss.backward(**kwargs)
32
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/_tensor.py", line 625, in backward
33
+ torch.autograd.backward(
34
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/__init__.py", line 354, in backward
35
+ _engine_run_backward(
36
+ File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward
37
+ return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
38
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
39
+ torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.02 GiB. GPU 0 has a total capacity of 79.14 GiB of which 18.98 GiB is free. Including non-PyTorch memory, this process has 60.14 GiB memory in use. Of the allocated memory 56.74 GiB is allocated by PyTorch, and 2.36 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
40
+
41
  0%| | 3/966 [00:45<4:00:55, 15.01s/it]
42
+ Exception ignored in: <function ResourceTracker.__del__ at 0x7d8e4c558c20>
43
+ Traceback (most recent call last):
44
+ File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 80, in __del__
45
+ File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 89, in _stop
46
+ File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 102, in _stop_locked
47
+ AttributeError: '_thread.RLock' object has no attribute '_recursion_count'
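The OOM message above suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. The sketch below shows one way to apply that hint from Python and checks that two micro-batch layouts give the same effective batch size. It reads `mb4x4` and `mb2x8` in the log file names as micro-batch x gradient-accumulation, which is an assumption, and `effective_batch` is a hypothetical helper.

```python
import os

# The allocator setting is read when CUDA memory is first allocated, so it is
# safest to set it before importing torch. setdefault keeps any value that is
# already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")


def effective_batch(per_device_batch, grad_accum, num_gpus=1):
    """Hypothetical helper: examples consumed per optimizer step."""
    return per_device_batch * grad_accum * num_gpus


# Assumed reading of the log names: mb4x4 = batch 4 x accumulation 4,
# mb2x8 = batch 2 x accumulation 8. Both match "Effective batch size: 16".
layouts = {"mb4x4": (4, 4), "mb2x8": (2, 8)}
for name, (batch, accum) in layouts.items():
    print(name, effective_batch(batch, accum))
```

The allocator hint only targets fragmentation; it may not help when a single 25.02 GiB allocation is requested. Halving the per-device batch while doubling gradient accumulation lowers peak activation memory and leaves the effective batch size unchanged.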
train-8192.log ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
+ /data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
+ import pynvml # type: ignore[import]
+ `torch_dtype` is deprecated! Use `dtype` instead!
+
+ warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
+ The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
+ trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464
+
  0%| | 0/888 [00:00<?, ?it/s]
  0%| | 1/888 [00:11<2:53:39, 11.75s/it]
  0%| | 2/888 [00:17<1:59:09, 8.07s/it]
  0%| | 3/888 [00:28<2:21:20, 9.58s/it]
  0%| | 4/888 [00:37<2:18:40, 9.41s/it]
  1%| | 5/888 [00:47<2:19:10, 9.46s/it]
  1%| | 6/888 [00:57<2:23:55, 9.79s/it]
  1%| | 7/888 [01:04<2:09:23, 8.81s/it]
  1%| | 8/888 [01:15<2:20:01, 9.55s/it]
  1%| | 9/888 [01:23<2:10:23, 8.90s/it]
  1%| | 10/888 [01:31<2:09:24, 8.84s/it]
  1%| | 11/888 [01:42<2:15:42, 9.28s/it]
  1%|▏ | 12/888 [01:53<2:24:50, 9.92s/it]
  1%|▏ | 13/888 [02:03<2:24:07, 9.88s/it]
  2%|▏ | 14/888 [02:19<2:52:47, 11.86s/it]
  2%|▏ | 15/888 [02:36<3:11:57, 13.19s/it]
  2%|▏ | 16/888 [02:47<3:03:38, 12.64s/it]
  2%|▏ | 17/888 [02:59<3:00:47, 12.45s/it]
  2%|▏ | 18/888 [03:18<3:27:52, 14.34s/it]
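Two figures in this log can be cross-checked by hand: the trainable-parameter percentage and the remaining-time estimate on the last progress line. The following is a minimal sketch that uses only numbers copied from the log. The remaining-time calculation assumes the displayed rate stays constant, which the log itself shows is only approximate.

```python
# Figures copied from the "trainable params" line in the log above.
trainable_params = 21_823_488
all_params = 1_903_648_576

trainable_pct = 100 * trainable_params / all_params
print(f"trainable%: {trainable_pct:.4f}")  # matches the logged 1.1464

# Figures copied from the last progress line: step 18 of 888 at 14.34 s/it.
step, total_steps, seconds_per_step = 18, 888, 14.34

# Naive projection: remaining steps times the displayed seconds per step.
remaining_seconds = int((total_steps - step) * seconds_per_step)
hours, rest = divmod(remaining_seconds, 3600)
minutes, seconds = divmod(rest, 60)
print(f"projected remaining: {hours}:{minutes:02d}:{seconds:02d}")
```

The projection lands within a few seconds of the `3:27:52` shown by the progress bar. The small gap is expected, because the bar computes its estimate from an unrounded, smoothed rate rather than from the two-decimal value it prints.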
train-8192.pid ADDED
@@ -0,0 +1 @@
+ 2564523
train-manifest.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "base_model": "/data/pretrained_models/Qwen3.5-2B",
+   "original_units": "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000/setmm-proof-units.jsonl",
+   "expanded_units": "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000/setmm-expanded-units.jsonl",
+   "output_dir": "/data/pretrained_models/Qwen3.5-2B-metamath",
+   "merged_dir": "/data/pretrained_models/Qwen3.5-2B-metamath/merged",
+   "train_examples": 15267,
+   "eval_examples": 311,
+   "skipped_examples": 422,
+   "max_length": 6144,
+   "direct_ref_mode": "same-file-distractors",
+   "same_file_distractor_direct_refs": 4,
+   "shuffle_direct_refs": true,
+   "learning_rate": 0.0001,
+   "lora_rank": 32,
+   "lora_alpha": 64
+ }
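The numeric fields of this manifest lend themselves to a quick sanity check. The following is a minimal sketch with values copied from the manifest and the path fields omitted. The `lora_alpha / lora_rank` scaling factor is the conventional LoRA formula; whether this run used exactly that scaling is an assumption, since the manifest does not say.

```python
import json

# Numeric fields copied from train-manifest.json above; path fields omitted.
manifest = json.loads("""
{
  "train_examples": 15267,
  "eval_examples": 311,
  "skipped_examples": 422,
  "max_length": 6144,
  "learning_rate": 0.0001,
  "lora_rank": 32,
  "lora_alpha": 64
}
""")

# Sum of the three example counts recorded in the manifest.
total = (
    manifest["train_examples"]
    + manifest["eval_examples"]
    + manifest["skipped_examples"]
)

# Share of the retained (non-skipped) examples held out for evaluation.
retained = manifest["train_examples"] + manifest["eval_examples"]
eval_fraction = manifest["eval_examples"] / retained

# Conventional LoRA scaling factor (assumed, not stated in the manifest).
lora_scale = manifest["lora_alpha"] / manifest["lora_rank"]

print(f"total examples: {total}")
print(f"eval fraction: {eval_fraction:.3f}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
```

The `skipped_examples` count of 422 agrees with the 422 lines added in `skipped-tokenization.jsonl` in the same commit.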
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
+ size 5201