Instructions to use WhiteGiverPlus/Qwen3.5-2B-metamath with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use WhiteGiverPlus/Qwen3.5-2B-metamath with PEFT:
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-2B")
    model = PeftModel.from_pretrained(base_model, "WhiteGiverPlus/Qwen3.5-2B-metamath")
- Transformers
How to use WhiteGiverPlus/Qwen3.5-2B-metamath with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="WhiteGiverPlus/Qwen3.5-2B-metamath")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("WhiteGiverPlus/Qwen3.5-2B-metamath", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use WhiteGiverPlus/Qwen3.5-2B-metamath with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "WhiteGiverPlus/Qwen3.5-2B-metamath"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WhiteGiverPlus/Qwen3.5-2B-metamath",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/WhiteGiverPlus/Qwen3.5-2B-metamath
- SGLang
How to use WhiteGiverPlus/Qwen3.5-2B-metamath with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "WhiteGiverPlus/Qwen3.5-2B-metamath" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WhiteGiverPlus/Qwen3.5-2B-metamath",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "WhiteGiverPlus/Qwen3.5-2B-metamath" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "WhiteGiverPlus/Qwen3.5-2B-metamath",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use WhiteGiverPlus/Qwen3.5-2B-metamath with Docker Model Runner:
docker model run hf.co/WhiteGiverPlus/Qwen3.5-2B-metamath
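The curl calls above post a JSON body to an OpenAI-compatible `/v1/chat/completions` endpoint. A minimal Python sketch of the same request, using only the standard library, might look like the following. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of any library, and the default URL assumes the vLLM server from the snippet above is listening on port 8000 (the SGLang snippets use port 30000 instead).

```python
import json
import urllib.request

MODEL_ID = "WhiteGiverPlus/Qwen3.5-2B-metamath"


def build_chat_request(prompt, model=MODEL_ID):
    """Build the JSON body used by the curl examples (a single user message)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to an OpenAI-compatible server and return the parsed reply.

    Assumes a server started as in the snippets above is already running.
    """
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = build_chat_request("What is the capital of France?")
    print(json.dumps(body, indent=2))
    # Uncomment once a server is listening:
    # print(send_chat_request(body))
```

Keeping the payload construction separate from the network call makes it easy to inspect the request before any server is running.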
Add files using upload-large-folder tool
- .gitattributes +4 -0
- README.md +207 -0
- adapter_config.json +46 -0
- adapter_model.safetensors +3 -0
- chat_template.jinja +154 -0
- checkpoint-2750/README.md +207 -0
- checkpoint-2750/adapter_config.json +46 -0
- checkpoint-2750/adapter_model.safetensors +3 -0
- checkpoint-2750/chat_template.jinja +154 -0
- checkpoint-2750/optimizer.pt +3 -0
- checkpoint-2750/rng_state.pth +3 -0
- checkpoint-2750/scheduler.pt +3 -0
- checkpoint-2750/tokenizer.json +3 -0
- checkpoint-2750/tokenizer_config.json +31 -0
- checkpoint-2750/trainer_state.json +2047 -0
- checkpoint-2750/training_args.bin +3 -0
- checkpoint-2865/README.md +207 -0
- checkpoint-2865/adapter_config.json +46 -0
- checkpoint-2865/adapter_model.safetensors +3 -0
- checkpoint-2865/chat_template.jinja +154 -0
- checkpoint-2865/optimizer.pt +3 -0
- checkpoint-2865/rng_state.pth +3 -0
- checkpoint-2865/scheduler.pt +3 -0
- checkpoint-2865/tokenizer.json +3 -0
- checkpoint-2865/tokenizer_config.json +31 -0
- checkpoint-2865/trainer_state.json +2124 -0
- checkpoint-2865/training_args.bin +3 -0
- merged/chat_template.jinja +154 -0
- merged/config.json +75 -0
- merged/generation_config.json +10 -0
- merged/model.safetensors +3 -0
- merged/tokenizer.json +3 -0
- merged/tokenizer_config.json +31 -0
- run-config.txt +12 -0
- skipped-tokenization.jsonl +422 -0
- speed-estimate.md +11 -0
- tokenizer.json +3 -0
- tokenizer_config.json +31 -0
- train-6144-mb2x8-3ep-gpu1.log +0 -0
- train-6144-mb2x8-gpu1.log +11 -0
- train-8192-mb4x4-gpu1.log +42 -0
- train-8192.log +8 -0
- train-8192.pid +1 -0
- train-manifest.json +17 -0
- training_args.bin +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-2865/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+merged/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-2750/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,207 @@
+---
+base_model: Qwen/Qwen3.5-2B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3.5-2B
+- lora
+- transformers
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.18.0
adapter_config.json
ADDED
@@ -0,0 +1,46 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 64,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "k_proj",
+    "gate_proj",
+    "o_proj",
+    "down_proj",
+    "up_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
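The adapter configuration above sets `r` to 32 and `lora_alpha` to 64 with `use_rslora` disabled. As a rough illustration of what those numbers imply, the sketch below computes the LoRA scaling factor (`lora_alpha / r` for standard LoRA, `lora_alpha / sqrt(r)` for rank-stabilized LoRA) and the number of adapter weights added to one linear layer. The function names are illustrative, only a subset of the config is reproduced, and the 2048 x 2048 layer shape is a hypothetical placeholder, not a dimension taken from Qwen/Qwen3.5-2B.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
ADAPTER_CONFIG = json.loads("""
{
  "peft_type": "LORA",
  "r": 32,
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["v_proj", "k_proj", "gate_proj", "o_proj",
                     "down_proj", "up_proj", "q_proj"]
}
""")


def lora_scaling(config):
    """Return the factor applied to the low-rank update B @ A."""
    r = config["r"]
    alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / math.sqrt(r)
    return alpha / r


def lora_params_for_linear(in_features, out_features, r):
    """Count adapter weights for one linear layer: A is r x in, B is out x r."""
    return r * (in_features + out_features)


if __name__ == "__main__":
    print("scaling:", lora_scaling(ADAPTER_CONFIG))
    # Hypothetical layer shape, used only to illustrate the formula.
    print("params:", lora_params_for_linear(2048, 2048, ADAPTER_CONFIG["r"]))
```

With this configuration the update `B @ A` is multiplied by 2 before being added to the frozen weight.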
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
+size 87319256
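The three lines added for `adapter_model.safetensors` are a Git LFS pointer rather than the weights themselves: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. A small sketch of reading such a pointer follows; `parse_lfs_pointer` is an illustrative helper written for this note, not a function from Git LFS or `huggingface_hub`.

```python
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
size 87319256
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>", separated by one space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["hash_algorithm"], pointer["digest"][:12])
    print(round(pointer["size_bytes"] / 2**20, 1), "MiB")
```

The size field shows that the adapter weights occupy roughly 83 MiB.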
chat_template.jinja
ADDED
@@ -0,0 +1,154 @@
+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+{%- if content is string %}
+{{- content }}
+{%- elif content is iterable and content is not mapping %}
+{%- for item in content %}
+{%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+{%- if is_system_content %}
+{{- raise_exception('System message cannot contain images.') }}
+{%- endif %}
+{%- if do_vision_count %}
+{%- set image_count.value = image_count.value + 1 %}
+{%- endif %}
+{%- if add_vision_id %}
+{{- 'Picture ' ~ image_count.value ~ ': ' }}
+{%- endif %}
+{{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+{%- elif 'video' in item or item.type == 'video' %}
+{%- if is_system_content %}
+{{- raise_exception('System message cannot contain videos.') }}
+{%- endif %}
+{%- if do_vision_count %}
+{%- set video_count.value = video_count.value + 1 %}
+{%- endif %}
+{%- if add_vision_id %}
+{{- 'Video ' ~ video_count.value ~ ': ' }}
+{%- endif %}
+{{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+{%- elif 'text' in item %}
+{{- item.text }}
+{%- else %}
+{{- raise_exception('Unexpected item type in content.') }}
+{%- endif %}
+{%- endfor %}
+{%- elif content is none or content is undefined %}
+{{- '' }}
+{%- else %}
+{{- raise_exception('Unexpected content type.') }}
+{%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+{{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+{{- '<|im_start|>system\n' }}
+{{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+{%- for tool in tools %}
+{{- "\n" }}
+{{- tool | tojson }}
+{%- endfor %}
+{{- "\n</tools>" }}
+{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+{%- if messages[0].role == 'system' %}
+{%- set content = render_content(messages[0].content, false, true)|trim %}
+{%- if content %}
+{{- '\n\n' + content }}
+{%- endif %}
+{%- endif %}
+{{- '<|im_end|>\n' }}
+{%- else %}
+{%- if messages[0].role == 'system' %}
+{%- set content = render_content(messages[0].content, false, true)|trim %}
+{{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+{%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+{%- set index = (messages|length - 1) - loop.index0 %}
+{%- if ns.multi_step_tool and message.role == "user" %}
+{%- set content = render_content(message.content, false)|trim %}
+{%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+{%- set ns.multi_step_tool = false %}
+{%- set ns.last_query_index = index %}
+{%- endif %}
+{%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+{{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+{%- set content = render_content(message.content, true)|trim %}
+{%- if message.role == "system" %}
+{%- if not loop.first %}
+{{- raise_exception('System message must be at the beginning.') }}
+{%- endif %}
+{%- elif message.role == "user" %}
+{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+{%- elif message.role == "assistant" %}
+{%- set reasoning_content = '' %}
+{%- if message.reasoning_content is string %}
+{%- set reasoning_content = message.reasoning_content %}
+{%- else %}
+{%- if '</think>' in content %}
+{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+{%- set content = content.split('</think>')[-1].lstrip('\n') %}
+{%- endif %}
+{%- endif %}
+{%- set reasoning_content = reasoning_content|trim %}
+{%- if loop.index0 > ns.last_query_index %}
+{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+{%- else %}
+{{- '<|im_start|>' + message.role + '\n' + content }}
+{%- endif %}
+{%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+{%- for tool_call in message.tool_calls %}
+{%- if tool_call.function is defined %}
+{%- set tool_call = tool_call.function %}
+{%- endif %}
+{%- if loop.first %}
+{%- if content|trim %}
+{{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+{%- else %}
+{{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+{%- endif %}
+{%- else %}
+{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+{%- endif %}
+{%- if tool_call.arguments is defined %}
+{%- for args_name, args_value in tool_call.arguments|items %}
+{{- '<parameter=' + args_name + '>\n' }}
+{%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+{{- args_value }}
+{{- '\n</parameter>\n' }}
+{%- endfor %}
+{%- endif %}
+{{- '</function>\n</tool_call>' }}
+{%- endfor %}
+{%- endif %}
+{{- '<|im_end|>\n' }}
+{%- elif message.role == "tool" %}
+{%- if loop.previtem and loop.previtem.role != "tool" %}
+{{- '<|im_start|>user' }}
+{%- endif %}
+{{- '\n<tool_response>\n' }}
+{{- content }}
+{{- '\n</tool_response>' }}
+{%- if not loop.last and loop.nextitem.role != "tool" %}
+{{- '<|im_end|>\n' }}
+{%- elif loop.last %}
+{{- '<|im_end|>\n' }}
+{%- endif %}
+{%- else %}
+{{- raise_exception('Unexpected message role.') }}
+{%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+{{- '<|im_start|>assistant\n' }}
+{%- if enable_thinking is defined and enable_thinking is true %}
+{{- '<think>\n' }}
+{%- else %}
+{{- '<think>\n\n</think>\n\n' }}
+{%- endif %}
+{%- endif %}
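The last block of the template decides how the assistant turn is opened: with `enable_thinking` set to true the prompt ends in an open `<think>` tag, otherwise an empty `<think>` block is emitted so the model answers directly. The sketch below mirrors that behaviour by hand for the simplest case only (text-only messages, no tools, at most one system message, `add_generation_prompt` on). It is not the Jinja template itself, and `render_single_turn` is a hypothetical helper written for illustration.

```python
def render_single_turn(user_text, system_text=None, enable_thinking=False):
    """Hand-written mirror of the template for one text-only user message.

    Assumes no tools, no images or videos, and add_generation_prompt=True.
    """
    parts = []
    if system_text is not None:
        # The template trims message content before wrapping it.
        parts.append("<|im_start|>system\n" + system_text.strip() + "<|im_end|>\n")
    parts.append("<|im_start|>user\n" + user_text.strip() + "<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    if enable_thinking:
        # Leave the reasoning block open for the model to fill in.
        parts.append("<think>\n")
    else:
        # Pre-fill an empty reasoning block so the answer starts immediately.
        parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(repr(render_single_turn("What is 2 + 2?")))
    print(repr(render_single_turn("What is 2 + 2?", enable_thinking=True)))
```

In practice the tokenizer's own chat-template rendering should be used; the sketch only makes the two prompt endings easy to compare.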
checkpoint-2750/README.md
ADDED
|
@@ -0,0 +1,207 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
base_model: Qwen/Qwen3.5-2B
|
| 3 |
+
library_name: peft
|
| 4 |
+
pipeline_tag: text-generation
|
| 5 |
+
tags:
|
| 6 |
+
- base_model:adapter:Qwen/Qwen3.5-2B
|
| 7 |
+
- lora
|
| 8 |
+
- transformers
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
# Model Card for Model ID
|
| 12 |
+
|
| 13 |
+
<!-- Provide a quick summary of what the model is/does. -->
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
## Model Details
|
| 18 |
+
|
| 19 |
+
### Model Description
|
| 20 |
+
|
| 21 |
+
<!-- Provide a longer summary of what this model is. -->
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
- **Developed by:** [More Information Needed]
|
| 26 |
+
- **Funded by [optional]:** [More Information Needed]
|
| 27 |
+
- **Shared by [optional]:** [More Information Needed]
|
| 28 |
+
- **Model type:** [More Information Needed]
|
| 29 |
+
- **Language(s) (NLP):** [More Information Needed]
|
| 30 |
+
- **License:** [More Information Needed]
|
| 31 |
+
- **Finetuned from model [optional]:** [More Information Needed]
|
| 32 |
+
|
| 33 |
+
### Model Sources [optional]
|
| 34 |
+
|
| 35 |
+
<!-- Provide the basic links for the model. -->
|
| 36 |
+
|
| 37 |
+
- **Repository:** [More Information Needed]
|
| 38 |
+
- **Paper [optional]:** [More Information Needed]
|
| 39 |
+
- **Demo [optional]:** [More Information Needed]
|
| 40 |
+
|
| 41 |
+
## Uses
|
| 42 |
+
|
| 43 |
+
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
| 44 |
+
|
| 45 |
+
### Direct Use
|
| 46 |
+
|
| 47 |
+
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
| 48 |
+
|
| 49 |
+
[More Information Needed]
|
| 50 |
+
|
| 51 |
+
### Downstream Use [optional]
|
| 52 |
+
|
| 53 |
+
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
| 54 |
+
|
| 55 |
+
[More Information Needed]
|
| 56 |
+
|
| 57 |
+
### Out-of-Scope Use
|
| 58 |
+
|
| 59 |
+
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
| 60 |
+
|
| 61 |
+
[More Information Needed]
|
| 62 |
+
|
| 63 |
+
## Bias, Risks, and Limitations
|
| 64 |
+
|
| 65 |
+
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
| 66 |
+
|
| 67 |
+
[More Information Needed]
|
| 68 |
+
|
| 69 |
+
### Recommendations
|
| 70 |
+
|
| 71 |
+
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
| 72 |
+
|
| 73 |
+
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
| 74 |
+
|
| 75 |
+
## How to Get Started with the Model
|
| 76 |
+
|
| 77 |
+
Use the code below to get started with the model.
|
| 78 |
+
|
| 79 |
+
[More Information Needed]
|
| 80 |
+
|
| 81 |
+
## Training Details
|
| 82 |
+
|
| 83 |
+
### Training Data
|
| 84 |
+
|
| 85 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
| 86 |
+
|
| 87 |
+
[More Information Needed]
|
| 88 |
+
|
| 89 |
+
### Training Procedure
|
| 90 |
+
|
| 91 |
+
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
| 92 |
+
|
| 93 |
+
#### Preprocessing [optional]
|
| 94 |
+
|
| 95 |
+
[More Information Needed]
|
| 96 |
+
|
| 97 |
+
|
| 98 |
+
#### Training Hyperparameters
|
| 99 |
+
|
| 100 |
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary



## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
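
As a rough illustration of the kind of estimate such a calculator produces, the sketch below multiplies average hardware power draw, training hours, and grid carbon intensity. The function name and all three input values are hypothetical placeholders, not measurements from this training run, and the actual calculator may apply further factors.

```python
def estimate_co2eq_grams(power_watts: float, hours: float, kg_co2eq_per_kwh: float) -> float:
    """Energy used (kWh) times grid carbon intensity, returned in grams of CO2eq."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2eq_per_kwh * 1000.0


# Hypothetical inputs for illustration only; replace with measured values.
example = estimate_co2eq_grams(power_watts=300.0, hours=10.0, kg_co2eq_per_kwh=0.4)
print(f"{example:.0f} g CO2eq")
```

The fields in the list above (hardware type, hours used, compute region) are the inputs such an estimate needs.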

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.18.0
checkpoint-2750/adapter_config.json
ADDED
@@ -0,0 +1,46 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.0",
  "qalora_group_size": 16,
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "k_proj",
    "gate_proj",
    "o_proj",
    "down_proj",
    "up_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
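
To make the `lora_alpha`, `r`, and `use_rslora` fields in this config concrete: in LoRA the low-rank update is multiplied by a scaling factor of `lora_alpha / r`, or `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled. The sketch below computes that factor from a subset of the values above; `lora_scaling` is a hypothetical helper name, not a PEFT function.

```python
import json
import math

# Subset of the adapter_config.json fields shown above.
ADAPTER_CONFIG = json.loads(
    '{"lora_alpha": 64, "r": 32, "use_rslora": false, "lora_dropout": 0.05}'
)


def lora_scaling(cfg: dict) -> float:
    """Scaling applied to the low-rank update B @ A (hypothetical helper)."""
    if cfg.get("use_rslora", False):
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(ADAPTER_CONFIG)
print(scaling)
```

With `lora_alpha` at 64 and `r` at 32, the adapter's update is added to each targeted projection at twice its raw magnitude.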
checkpoint-2750/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6b49ebcad405adb694c065950e282615812e3981bdca4202f2a79e151d3c1ec2
size 87319256
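
The three lines above are a Git LFS pointer rather than the weights themselves: each line is a key followed by a space and a value. A minimal sketch of reading such a pointer, assuming exactly this three-key layout (`parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling):

```python
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6b49ebcad405adb694c065950e282615812e3981bdca4202f2a79e151d3c1ec2
size 87319256
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and decode the oid and size fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_mib = info["size"] / 2**20
print(info["hash_algo"], f"{size_mib:.1f} MiB")
```

The `size` field is in bytes, so the adapter weights here come to roughly 83 MiB.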
checkpoint-2750/chat_template.jinja
ADDED
@@ -0,0 +1,154 @@
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
    {%- if content is string %}
        {{- content }}
    {%- elif content is iterable and content is not mapping %}
        {%- for item in content %}
            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
                {%- if is_system_content %}
                    {{- raise_exception('System message cannot contain images.') }}
                {%- endif %}
                {%- if do_vision_count %}
                    {%- set image_count.value = image_count.value + 1 %}
                {%- endif %}
                {%- if add_vision_id %}
                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
                {%- endif %}
                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
            {%- elif 'video' in item or item.type == 'video' %}
                {%- if is_system_content %}
                    {{- raise_exception('System message cannot contain videos.') }}
                {%- endif %}
                {%- if do_vision_count %}
                    {%- set video_count.value = video_count.value + 1 %}
                {%- endif %}
                {%- if add_vision_id %}
                    {{- 'Video ' ~ video_count.value ~ ': ' }}
                {%- endif %}
                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
            {%- elif 'text' in item %}
                {{- item.text }}
            {%- else %}
                {{- raise_exception('Unexpected item type in content.') }}
            {%- endif %}
        {%- endfor %}
    {%- elif content is none or content is undefined %}
        {{- '' }}
    {%- else %}
        {{- raise_exception('Unexpected content type.') }}
    {%- endif %}
{%- endmacro %}
{%- if not messages %}
    {{- raise_exception('No messages provided.') }}
{%- endif %}
{%- if tools and tools is iterable and tools is not mapping %}
    {{- '<|im_start|>system\n' }}
    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>" }}
    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
    {%- if messages[0].role == 'system' %}
        {%- set content = render_content(messages[0].content, false, true)|trim %}
        {%- if content %}
            {{- '\n\n' + content }}
        {%- endif %}
    {%- endif %}
    {{- '<|im_end|>\n' }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {%- set content = render_content(messages[0].content, false, true)|trim %}
        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" %}
        {%- set content = render_content(message.content, false)|trim %}
        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
            {%- set ns.multi_step_tool = false %}
            {%- set ns.last_query_index = index %}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
    {{- raise_exception('No user query found in messages.') }}
{%- endif %}
{%- for message in messages %}
    {%- set content = render_content(message.content, true)|trim %}
    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}
    {%- elif message.role == "user" %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- set reasoning_content = reasoning_content|trim %}
        {%- if loop.index0 > ns.last_query_index %}
            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
            {%- for tool_call in message.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {%- if loop.first %}
                    {%- if content|trim %}
                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
                    {%- else %}
                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
                    {%- endif %}
                {%- else %}
                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
                {%- endif %}
                {%- if tool_call.arguments is defined %}
                    {%- for args_name, args_value in tool_call.arguments|items %}
                        {{- '<parameter=' + args_name + '>\n' }}
                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
                        {{- args_value }}
                        {{- '\n</parameter>\n' }}
                    {%- endfor %}
                {%- endif %}
                {{- '</function>\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.previtem and loop.previtem.role != "tool" %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if not loop.last and loop.nextitem.role != "tool" %}
            {{- '<|im_end|>\n' }}
        {%- elif loop.last %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- else %}
        {{- raise_exception('Unexpected message role.') }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- '<think>\n' }}
    {%- else %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
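
Two pieces of string logic in the template above are easy to misread: how an assistant message is split into reasoning and answer around the think tags, and which generation prompt is emitted depending on `enable_thinking`. The following Python sketch mirrors those expressions; `split_reasoning` and `generation_prompt` are hypothetical helper names used only for illustration and are not part of Transformers or of this repository.

```python
from typing import Tuple


def split_reasoning(content: str) -> Tuple[str, str]:
    """Mirror the template's split of an assistant message on '</think>'."""
    if "</think>" not in content:
        return "", content
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    answer = content.split("</think>")[-1].lstrip("\n")
    # The template applies the `trim` filter to the reasoning afterwards.
    return reasoning.strip(), answer


def generation_prompt(enable_thinking: bool = False) -> str:
    """Mirror the add_generation_prompt branch at the end of the template."""
    prefix = "<|im_start|>assistant\n"
    if enable_thinking:
        return prefix + "<think>\n"
    # Without thinking, an empty think block is pre-filled.
    return prefix + "<think>\n\n</think>\n\n"


reasoning, answer = split_reasoning("<think>\n2+2=4\n</think>\n\nThe answer is 4.")
print(repr(reasoning), repr(answer))
```

Note that `enable_thinking` must be explicitly set to true for the open think tag to be emitted; leaving it undefined falls through to the pre-filled empty block.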
checkpoint-2750/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b823bb8d08dc7f3e67974d7373180327c0c3b3f484279111011eeccb193952e
size 174750283
checkpoint-2750/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:35b19cca0c77fd3faf9cb577574ebff9d16240a5010f338c8fd848717050f145
size 14645
checkpoint-2750/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4246573c193d4338561dd7c638ea83197ef8cbd56a1d02b874104194a5175da
size 1465
checkpoint-2750/tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
size 19989343
checkpoint-2750/tokenizer_config.json
ADDED
@@ -0,0 +1,31 @@
{
  "add_prefix_space": false,
  "audio_bos_token": "<|audio_start|>",
  "audio_eos_token": "<|audio_end|>",
  "audio_token": "<|audio_pad|>",
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "image_token": "<|image_pad|>",
  "is_local": true,
  "model_max_length": 262144,
  "model_specific_special_tokens": {
    "audio_bos_token": "<|audio_start|>",
    "audio_eos_token": "<|audio_end|>",
    "audio_token": "<|audio_pad|>",
    "image_token": "<|image_pad|>",
    "video_token": "<|video_pad|>",
    "vision_bos_token": "<|vision_start|>",
    "vision_eos_token": "<|vision_end|>"
  },
  "pad_token": "<|endoftext|>",
  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
  "split_special_tokens": false,
  "tokenizer_class": "TokenizersBackend",
  "unk_token": null,
  "video_token": "<|video_pad|>",
  "vision_bos_token": "<|vision_start|>",
  "vision_eos_token": "<|vision_end|>"
}
checkpoint-2750/trainer_state.json
ADDED
@@ -0,0 +1,2047 @@
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 2.8802724652868745,
  "eval_steps": 250,
  "global_step": 2750,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.010479434110558029,
      "grad_norm": 0.19915591180324554,
      "learning_rate": 1.0465116279069768e-05,
      "loss": 1.1350045204162598,
      "step": 10
    },
    {
      "epoch": 0.020958868221116058,
      "grad_norm": 0.18158815801143646,
      "learning_rate": 2.2093023255813955e-05,
      "loss": 1.0580164909362793,
      "step": 20
    },
    {
      "epoch": 0.03143830233167409,
      "grad_norm": 0.16481591761112213,
      "learning_rate": 3.372093023255814e-05,
      "loss": 0.9252842903137207,
      "step": 30
    },
    {
      "epoch": 0.041917736442232116,
      "grad_norm": 0.15599584579467773,
      "learning_rate": 4.5348837209302326e-05,
      "loss": 0.8342072486877441,
      "step": 40
    },
    {
      "epoch": 0.05239717055279015,
      "grad_norm": 0.1804327368736267,
      "learning_rate": 5.697674418604652e-05,
      "loss": 0.7955524921417236,
      "step": 50
    },
    {
      "epoch": 0.06287660466334818,
      "grad_norm": 0.16934047639369965,
      "learning_rate": 6.86046511627907e-05,
      "loss": 0.7358035087585449,
      "step": 60
    },
    {
      "epoch": 0.07335603877390622,
      "grad_norm": 0.2234930843114853,
      "learning_rate": 8.023255813953489e-05,
      "loss": 0.6985861301422119,
      "step": 70
    },
    {
      "epoch": 0.08383547288446423,
      "grad_norm": 0.16290400922298431,
      "learning_rate": 9.186046511627907e-05,
      "loss": 0.599607515335083,
      "step": 80
    },
    {
      "epoch": 0.09431490699502226,
      "grad_norm": 0.1660464107990265,
      "learning_rate": 9.999971245570617e-05,
      "loss": 0.5886398315429687,
      "step": 90
    },
    {
      "epoch": 0.1047943411055803,
      "grad_norm": 0.16978025436401367,
      "learning_rate": 9.999460064915317e-05,
      "loss": 0.5450529098510742,
      "step": 100
    },
    {
      "epoch": 0.11527377521613832,
      "grad_norm": 0.21447990834712982,
      "learning_rate": 9.998309972134645e-05,
      "loss": 0.5072262287139893,
      "step": 110
    },
    {
      "epoch": 0.12575320932669637,
      "grad_norm": 0.17418669164180756,
      "learning_rate": 9.996521114206116e-05,
      "loss": 0.49445347785949706,
      "step": 120
    },
    {
      "epoch": 0.13623264343725439,
      "grad_norm": 0.22226351499557495,
      "learning_rate": 9.994093719739023e-05,
      "loss": 0.47142682075500486,
      "step": 130
    },
    {
      "epoch": 0.14671207754781243,
      "grad_norm": 0.1745530068874359,
      "learning_rate": 9.991028098945215e-05,
      "loss": 0.46663532257080076,
      "step": 140
    },
    {
      "epoch": 0.15719151165837045,
      "grad_norm": 0.17074695229530334,
      "learning_rate": 9.987324643599459e-05,
      "loss": 0.4508847236633301,
      "step": 150
    },
    {
      "epoch": 0.16767094576892846,
      "grad_norm": 0.13428406417369843,
      "learning_rate": 9.982983826989367e-05,
      "loss": 0.40740265846252444,
      "step": 160
    },
    {
      "epoch": 0.1781503798794865,
      "grad_norm": 0.17766578495502472,
      "learning_rate": 9.978006203854918e-05,
      "loss": 0.3998516321182251,
      "step": 170
    },
    {
      "epoch": 0.18862981399004453,
      "grad_norm": 0.1672629565000534,
      "learning_rate": 9.972392410317562e-05,
      "loss": 0.41658673286437986,
      "step": 180
    },
    {
      "epoch": 0.19910924810060257,
      "grad_norm": 0.1333673745393753,
      "learning_rate": 9.96614316379892e-05,
      "loss": 0.37024455070495604,
      "step": 190
    },
    {
      "epoch": 0.2095886822111606,
      "grad_norm": 0.18037110567092896,
      "learning_rate": 9.959259262929113e-05,
      "loss": 0.35086841583251954,
      "step": 200
    },
    {
      "epoch": 0.22006811632171863,
      "grad_norm": 0.14616410434246063,
      "learning_rate": 9.951741587444683e-05,
      "loss": 0.37918968200683595,
      "step": 210
    },
    {
      "epoch": 0.23054755043227665,
      "grad_norm": 0.14523574709892273,
      "learning_rate": 9.943591098076184e-05,
      "loss": 0.32804527282714846,
      "step": 220
    },
    {
      "epoch": 0.2410269845428347,
      "grad_norm": 0.14667049050331116,
      "learning_rate": 9.934808836425393e-05,
      "loss": 0.3480507850646973,
      "step": 230
    },
    {
      "epoch": 0.25150641865339274,
      "grad_norm": 0.18156558275222778,
      "learning_rate": 9.925395924832198e-05,
      "loss": 0.3300448179244995,
      "step": 240
    },
    {
      "epoch": 0.26198585276395076,
      "grad_norm": 0.13806430995464325,
      "learning_rate": 9.91535356623117e-05,
      "loss": 0.3127591609954834,
      "step": 250
    },
    {
      "epoch": 0.26198585276395076,
      "eval_loss": 0.3132782578468323,
      "eval_runtime": 94.8848,
      "eval_samples_per_second": 3.278,
|
| 192 |
+
"eval_steps_per_second": 3.278,
|
| 193 |
+
"step": 250
|
| 194 |
+
},
|
| 195 |
+
    {
      "epoch": 0.27246528687450877,
      "grad_norm": 0.17205959558486938,
      "learning_rate": 9.904683043997835e-05,
      "loss": 0.3306673288345337,
      "step": 260
    },
    {
      "epoch": 0.2829447209850668,
      "grad_norm": 0.12620031833648682,
      "learning_rate": 9.893385721784656e-05,
      "loss": 0.3011106729507446,
      "step": 270
    },
    {
      "epoch": 0.29342415509562486,
      "grad_norm": 0.11466006934642792,
      "learning_rate": 9.881463043346768e-05,
      "loss": 0.2951968669891357,
      "step": 280
    },
    {
      "epoch": 0.3039035892061829,
      "grad_norm": 0.1671207845211029,
      "learning_rate": 9.868916532357475e-05,
      "loss": 0.2910990953445435,
      "step": 290
    },
    {
      "epoch": 0.3143830233167409,
      "grad_norm": 0.1683349907398224,
      "learning_rate": 9.855747792213521e-05,
      "loss": 0.31409192085266113,
      "step": 300
    },
    {
      "epoch": 0.3248624574272989,
      "grad_norm": 0.12934699654579163,
      "learning_rate": 9.84195850583019e-05,
      "loss": 0.27755858898162844,
      "step": 310
    },
    {
      "epoch": 0.33534189153785693,
      "grad_norm": 0.13784605264663696,
      "learning_rate": 9.827550435426234e-05,
      "loss": 0.2809821605682373,
      "step": 320
    },
    {
      "epoch": 0.345821325648415,
      "grad_norm": 0.18590271472930908,
      "learning_rate": 9.812525422298664e-05,
      "loss": 0.28698866367340087,
      "step": 330
    },
    {
      "epoch": 0.356300759758973,
      "grad_norm": 0.1704522967338562,
      "learning_rate": 9.796885386587447e-05,
      "loss": 0.250814414024353,
      "step": 340
    },
    {
      "epoch": 0.36678019386953103,
      "grad_norm": 0.1316167265176773,
      "learning_rate": 9.780632327030112e-05,
      "loss": 0.25458922386169436,
      "step": 350
    },
    {
      "epoch": 0.37725962798008905,
      "grad_norm": 0.16226200759410858,
      "learning_rate": 9.763768320706319e-05,
      "loss": 0.26563262939453125,
      "step": 360
    },
    {
      "epoch": 0.3877390620906471,
      "grad_norm": 0.1297195851802826,
      "learning_rate": 9.746295522772424e-05,
      "loss": 0.2632328748703003,
      "step": 370
    },
    {
      "epoch": 0.39821849620120514,
      "grad_norm": 0.1286139190196991,
      "learning_rate": 9.728216166186049e-05,
      "loss": 0.2624588251113892,
      "step": 380
    },
    {
      "epoch": 0.40869793031176316,
      "grad_norm": 0.1587965339422226,
      "learning_rate": 9.709532561420725e-05,
      "loss": 0.24741590023040771,
      "step": 390
    },
    {
      "epoch": 0.4191773644223212,
      "grad_norm": 0.11963177472352982,
      "learning_rate": 9.690247096170615e-05,
      "loss": 0.22777397632598878,
      "step": 400
    },
    {
      "epoch": 0.42965679853287925,
      "grad_norm": 0.13638927042484283,
      "learning_rate": 9.670362235045387e-05,
      "loss": 0.23324952125549317,
      "step": 410
    },
    {
      "epoch": 0.44013623264343726,
      "grad_norm": 0.1514088362455368,
      "learning_rate": 9.649880519255232e-05,
      "loss": 0.2505915880203247,
      "step": 420
    },
    {
      "epoch": 0.4506156667539953,
      "grad_norm": 0.10994207113981247,
      "learning_rate": 9.62880456628612e-05,
      "loss": 0.2078850269317627,
      "step": 430
    },
    {
      "epoch": 0.4610951008645533,
      "grad_norm": 0.11983369290828705,
      "learning_rate": 9.607137069565288e-05,
      "loss": 0.21452484130859376,
      "step": 440
    },
    {
      "epoch": 0.47157453497511137,
      "grad_norm": 0.12684305012226105,
      "learning_rate": 9.58488079811703e-05,
      "loss": 0.22002685070037842,
      "step": 450
    },
    {
      "epoch": 0.4820539690856694,
      "grad_norm": 0.16841623187065125,
      "learning_rate": 9.562038596208828e-05,
      "loss": 0.21405396461486817,
      "step": 460
    },
    {
      "epoch": 0.4925334031962274,
      "grad_norm": 0.1498555839061737,
      "learning_rate": 9.538613382987865e-05,
      "loss": 0.20534911155700683,
      "step": 470
    },
    {
      "epoch": 0.5030128373067855,
      "grad_norm": 0.13913628458976746,
      "learning_rate": 9.514608152107974e-05,
      "loss": 0.22248730659484864,
      "step": 480
    },
    {
      "epoch": 0.5134922714173434,
      "grad_norm": 0.14408951997756958,
      "learning_rate": 9.490025971347047e-05,
      "loss": 0.214866042137146,
      "step": 490
    },
    {
      "epoch": 0.5239717055279015,
      "grad_norm": 0.1649770438671112,
      "learning_rate": 9.464869982215001e-05,
      "loss": 0.19965900182724,
      "step": 500
    },
    {
      "epoch": 0.5239717055279015,
      "eval_loss": 0.19267401099205017,
      "eval_runtime": 95.3374,
      "eval_samples_per_second": 3.262,
      "eval_steps_per_second": 3.262,
      "step": 500
    },
    {
      "epoch": 0.5344511396384595,
      "grad_norm": 0.1305568665266037,
      "learning_rate": 9.439143399552291e-05,
      "loss": 0.21112546920776368,
      "step": 510
    },
    {
      "epoch": 0.5449305737490175,
      "grad_norm": 0.11998175084590912,
      "learning_rate": 9.412849511119074e-05,
      "loss": 0.21422922611236572,
      "step": 520
    },
    {
      "epoch": 0.5554100078595756,
      "grad_norm": 0.15220341086387634,
      "learning_rate": 9.385991677175046e-05,
      "loss": 0.20999882221221924,
      "step": 530
    },
    {
      "epoch": 0.5658894419701336,
      "grad_norm": 0.13170023262500763,
      "learning_rate": 9.358573330050004e-05,
      "loss": 0.20208392143249512,
      "step": 540
    },
    {
      "epoch": 0.5763688760806917,
      "grad_norm": 0.10457764565944672,
      "learning_rate": 9.330597973705219e-05,
      "loss": 0.1908803701400757,
      "step": 550
    },
    {
      "epoch": 0.5868483101912497,
      "grad_norm": 0.12568537890911102,
      "learning_rate": 9.302069183285637e-05,
      "loss": 0.19316340684890748,
      "step": 560
    },
    {
      "epoch": 0.5973277443018077,
      "grad_norm": 0.14824528992176056,
      "learning_rate": 9.272990604662988e-05,
      "loss": 0.18987581729888917,
      "step": 570
    },
    {
      "epoch": 0.6078071784123658,
      "grad_norm": 0.14521734416484833,
      "learning_rate": 9.243365953969861e-05,
      "loss": 0.19232832193374633,
      "step": 580
    },
    {
      "epoch": 0.6182866125229237,
      "grad_norm": 0.1335408091545105,
      "learning_rate": 9.213199017124793e-05,
      "loss": 0.1758212924003601,
      "step": 590
    },
    {
      "epoch": 0.6287660466334818,
      "grad_norm": 0.11143071949481964,
      "learning_rate": 9.182493649348447e-05,
      "loss": 0.19117680788040162,
      "step": 600
    },
    {
      "epoch": 0.6392454807440399,
      "grad_norm": 0.14789296686649323,
      "learning_rate": 9.151253774670921e-05,
      "loss": 0.184559965133667,
      "step": 610
    },
    {
      "epoch": 0.6497249148545978,
      "grad_norm": 0.10541336238384247,
      "learning_rate": 9.119483385430283e-05,
      "loss": 0.1720304846763611,
      "step": 620
    },
    {
      "epoch": 0.6602043489651559,
      "grad_norm": 0.12105975300073624,
      "learning_rate": 9.087186541762358e-05,
      "loss": 0.17654836177825928,
      "step": 630
    },
    {
      "epoch": 0.6706837830757139,
      "grad_norm": 0.13114669919013977,
      "learning_rate": 9.054367371081858e-05,
      "loss": 0.1696592688560486,
      "step": 640
    },
    {
      "epoch": 0.6811632171862719,
      "grad_norm": 0.13745592534542084,
      "learning_rate": 9.021030067554919e-05,
      "loss": 0.15404462814331055,
      "step": 650
    },
    {
      "epoch": 0.69164265129683,
      "grad_norm": 0.15927442908287048,
      "learning_rate": 8.987178891563094e-05,
      "loss": 0.17024366855621337,
      "step": 660
    },
    {
      "epoch": 0.702122085407388,
      "grad_norm": 0.13737429678440094,
      "learning_rate": 8.952818169158903e-05,
      "loss": 0.1602048397064209,
      "step": 670
    },
    {
      "epoch": 0.712601519517946,
      "grad_norm": 0.13941751420497894,
      "learning_rate": 8.91795229151297e-05,
      "loss": 0.18057082891464232,
      "step": 680
    },
    {
      "epoch": 0.7230809536285041,
      "grad_norm": 0.14242954552173615,
      "learning_rate": 8.882585714352856e-05,
      "loss": 0.14863334894180297,
      "step": 690
    },
    {
      "epoch": 0.7335603877390621,
      "grad_norm": 0.15553542971611023,
      "learning_rate": 8.846722957393626e-05,
      "loss": 0.15701137781143187,
      "step": 700
    },
    {
      "epoch": 0.7440398218496201,
      "grad_norm": 0.12901411950588226,
      "learning_rate": 8.810368603760249e-05,
      "loss": 0.15571318864822387,
      "step": 710
    },
    {
      "epoch": 0.7545192559601781,
      "grad_norm": 0.13449430465698242,
      "learning_rate": 8.773527299401902e-05,
      "loss": 0.16418551206588744,
      "step": 720
    },
    {
      "epoch": 0.7649986900707362,
      "grad_norm": 0.10630270838737488,
      "learning_rate": 8.736203752498218e-05,
      "loss": 0.16800801753997802,
      "step": 730
    },
    {
      "epoch": 0.7754781241812942,
      "grad_norm": 0.11299935728311539,
      "learning_rate": 8.698402732857611e-05,
      "loss": 0.15700833797454833,
      "step": 740
    },
    {
      "epoch": 0.7859575582918522,
      "grad_norm": 0.11920930445194244,
      "learning_rate": 8.660129071307707e-05,
      "loss": 0.15091001987457275,
      "step": 750
    },
    {
      "epoch": 0.7859575582918522,
      "eval_loss": 0.1356429010629654,
      "eval_runtime": 94.0557,
      "eval_samples_per_second": 3.307,
      "eval_steps_per_second": 3.307,
      "step": 750
    },
    {
      "epoch": 0.7964369924024103,
      "grad_norm": 0.13870343565940857,
      "learning_rate": 8.621387659077986e-05,
      "loss": 0.1422027826309204,
      "step": 760
    },
    {
      "epoch": 0.8069164265129684,
      "grad_norm": 0.12753477692604065,
      "learning_rate": 8.582183447174697e-05,
      "loss": 0.142450213432312,
      "step": 770
    },
    {
      "epoch": 0.8173958606235263,
      "grad_norm": 0.11877496540546417,
      "learning_rate": 8.542521445748141e-05,
      "loss": 0.15361062288284302,
      "step": 780
    },
    {
      "epoch": 0.8278752947340844,
      "grad_norm": 0.1200249195098877,
      "learning_rate": 8.502406723452392e-05,
      "loss": 0.14647477865219116,
      "step": 790
    },
    {
      "epoch": 0.8383547288446423,
      "grad_norm": 0.12913794815540314,
      "learning_rate": 8.461844406797543e-05,
      "loss": 0.1591552734375,
      "step": 800
    },
    {
      "epoch": 0.8488341629552004,
      "grad_norm": 0.17270176112651825,
      "learning_rate": 8.420839679494558e-05,
      "loss": 0.1495436668395996,
      "step": 810
    },
    {
      "epoch": 0.8593135970657585,
      "grad_norm": 0.15545596182346344,
      "learning_rate": 8.379397781792808e-05,
      "loss": 0.15377395153045653,
      "step": 820
    },
    {
      "epoch": 0.8697930311763165,
      "grad_norm": 0.12941111624240875,
      "learning_rate": 8.337524009810395e-05,
      "loss": 0.14733861684799193,
      "step": 830
    },
    {
      "epoch": 0.8802724652868745,
      "grad_norm": 0.13152749836444855,
      "learning_rate": 8.295223714857319e-05,
      "loss": 0.13980752229690552,
      "step": 840
    },
    {
      "epoch": 0.8907518993974325,
      "grad_norm": 0.11208872497081757,
      "learning_rate": 8.252502302751612e-05,
      "loss": 0.12019969224929809,
      "step": 850
    },
    {
      "epoch": 0.9012313335079906,
      "grad_norm": 0.11118603497743607,
      "learning_rate": 8.209365233128482e-05,
      "loss": 0.13822466135025024,
      "step": 860
    },
    {
      "epoch": 0.9117107676185486,
      "grad_norm": 0.11705653369426727,
      "learning_rate": 8.165818018742605e-05,
      "loss": 0.1439664840698242,
      "step": 870
    },
    {
      "epoch": 0.9221902017291066,
      "grad_norm": 0.08817730098962784,
      "learning_rate": 8.121866224763606e-05,
      "loss": 0.13380355834960939,
      "step": 880
    },
    {
      "epoch": 0.9326696358396647,
      "grad_norm": 0.1092257872223854,
      "learning_rate": 8.077515468064851e-05,
      "loss": 0.12982802391052245,
      "step": 890
    },
    {
      "epoch": 0.9431490699502227,
      "grad_norm": 0.12680962681770325,
      "learning_rate": 8.032771416505647e-05,
      "loss": 0.1489071011543274,
      "step": 900
    },
    {
      "epoch": 0.9536285040607807,
      "grad_norm": 0.11953219771385193,
      "learning_rate": 7.987639788206888e-05,
      "loss": 0.14020267724990845,
      "step": 910
    },
    {
      "epoch": 0.9641079381713388,
      "grad_norm": 0.1041467934846878,
      "learning_rate": 7.942126350820318e-05,
      "loss": 0.1439213275909424,
      "step": 920
    },
    {
      "epoch": 0.9745873722818967,
      "grad_norm": 0.1277916431427002,
      "learning_rate": 7.896236920791442e-05,
      "loss": 0.1468779683113098,
      "step": 930
    },
    {
      "epoch": 0.9850668063924548,
      "grad_norm": 0.11245205253362656,
      "learning_rate": 7.849977362616201e-05,
      "loss": 0.12012372016906739,
      "step": 940
    },
    {
      "epoch": 0.9955462405030129,
      "grad_norm": 0.12230483442544937,
      "learning_rate": 7.803353588091522e-05,
      "loss": 0.1488939881324768,
      "step": 950
    },
    {
      "epoch": 1.005239717055279,
      "grad_norm": 0.14185865223407745,
      "learning_rate": 7.7563715555598e-05,
      "loss": 0.11488113403320313,
      "step": 960
    },
    {
      "epoch": 1.015719151165837,
      "grad_norm": 0.10545773804187775,
      "learning_rate": 7.709037269147459e-05,
      "loss": 0.10712549686431885,
      "step": 970
    },
    {
      "epoch": 1.026198585276395,
      "grad_norm": 0.10376274585723877,
      "learning_rate": 7.661356777997631e-05,
      "loss": 0.11428828239440918,
      "step": 980
    },
    {
      "epoch": 1.0366780193869531,
      "grad_norm": 0.09950564056634903,
      "learning_rate": 7.613336175497111e-05,
      "loss": 0.09823058247566223,
      "step": 990
    },
    {
      "epoch": 1.0471574534975112,
      "grad_norm": 0.10412753373384476,
      "learning_rate": 7.564981598497643e-05,
      "loss": 0.1106558084487915,
      "step": 1000
    },
    {
      "epoch": 1.0471574534975112,
      "eval_loss": 0.11185819655656815,
      "eval_runtime": 93.808,
      "eval_samples_per_second": 3.315,
      "eval_steps_per_second": 3.315,
      "step": 1000
    },
    {
      "epoch": 1.057636887608069,
      "grad_norm": 0.10430868715047836,
      "learning_rate": 7.516299226531645e-05,
      "loss": 0.11168640851974487,
      "step": 1010
    },
    {
      "epoch": 1.0681163217186271,
      "grad_norm": 0.09646806865930557,
      "learning_rate": 7.467295281022501e-05,
      "loss": 0.10711305141448975,
      "step": 1020
    },
    {
      "epoch": 1.0785957558291852,
      "grad_norm": 0.13060614466667175,
      "learning_rate": 7.417976024489474e-05,
      "loss": 0.10001810789108276,
      "step": 1030
    },
    {
      "epoch": 1.0890751899397433,
      "grad_norm": 0.10389085114002228,
      "learning_rate": 7.368347759747393e-05,
      "loss": 0.11893858909606933,
      "step": 1040
    },
    {
      "epoch": 1.0995546240503014,
      "grad_norm": 0.11291550099849701,
      "learning_rate": 7.318416829101164e-05,
      "loss": 0.1079628586769104,
      "step": 1050
    },
    {
      "epoch": 1.1100340581608594,
      "grad_norm": 0.10372598469257355,
      "learning_rate": 7.268189613535255e-05,
      "loss": 0.10332397222518921,
      "step": 1060
    },
    {
      "epoch": 1.1205134922714173,
      "grad_norm": 0.12971536815166473,
      "learning_rate": 7.217672531898225e-05,
      "loss": 0.10804877281188965,
      "step": 1070
    },
    {
      "epoch": 1.1309929263819753,
      "grad_norm": 0.10902425646781921,
      "learning_rate": 7.166872040082431e-05,
      "loss": 0.09947454929351807,
      "step": 1080
    },
    {
      "epoch": 1.1414723604925334,
      "grad_norm": 0.09305932372808456,
      "learning_rate": 7.11579463019897e-05,
      "loss": 0.09406971335411071,
      "step": 1090
    },
    {
      "epoch": 1.1519517946030915,
      "grad_norm": 0.11485275626182556,
      "learning_rate": 7.064446829748034e-05,
      "loss": 0.09943979978561401,
      "step": 1100
    },
    {
      "epoch": 1.1624312287136496,
      "grad_norm": 0.09556467831134796,
      "learning_rate": 7.0128352007847e-05,
      "loss": 0.10862170457839966,
      "step": 1110
    },
    {
      "epoch": 1.1729106628242074,
      "grad_norm": 0.11937833577394485,
      "learning_rate": 6.96096633908034e-05,
      "loss": 0.10385221242904663,
      "step": 1120
    },
    {
      "epoch": 1.1833900969347655,
      "grad_norm": 0.11560507863759995,
      "learning_rate": 6.908846873279691e-05,
      "loss": 0.09252402186393738,
      "step": 1130
    },
    {
      "epoch": 1.1938695310453236,
      "grad_norm": 0.11119654029607773,
      "learning_rate": 6.856483464053758e-05,
      "loss": 0.09637172818183899,
      "step": 1140
    },
    {
      "epoch": 1.2043489651558816,
      "grad_norm": 0.11722644418478012,
      "learning_rate": 6.803882803248585e-05,
      "loss": 0.09078751802444458,
      "step": 1150
    },
    {
      "epoch": 1.2148283992664397,
      "grad_norm": 0.10487739741802216,
      "learning_rate": 6.751051613030082e-05,
      "loss": 0.10334972143173218,
      "step": 1160
    },
    {
      "epoch": 1.2253078333769976,
      "grad_norm": 0.10202383995056152,
      "learning_rate": 6.697996645024937e-05,
      "loss": 0.08661433458328247,
      "step": 1170
    },
    {
      "epoch": 1.2357872674875556,
      "grad_norm": 0.11801143735647202,
      "learning_rate": 6.644724679457804e-05,
      "loss": 0.0997927188873291,
      "step": 1180
    },
    {
      "epoch": 1.2462667015981137,
      "grad_norm": 0.10949107259511948,
      "learning_rate": 6.591242524284802e-05,
      "loss": 0.0977592945098877,
      "step": 1190
    },
    {
      "epoch": 1.2567461357086718,
      "grad_norm": 0.10221222043037415,
      "learning_rate": 6.537557014323487e-05,
      "loss": 0.0970361053943634,
      "step": 1200
    },
    {
      "epoch": 1.2672255698192298,
      "grad_norm": 0.10554748773574829,
      "learning_rate": 6.483675010379393e-05,
      "loss": 0.09007551074028015,
      "step": 1210
    },
    {
      "epoch": 1.2777050039297877,
      "grad_norm": 0.11625627428293228,
      "learning_rate": 6.429603398369242e-05,
      "loss": 0.08734490275382996,
      "step": 1220
    },
    {
      "epoch": 1.2881844380403458,
      "grad_norm": 0.10624277591705322,
      "learning_rate": 6.37534908844095e-05,
      "loss": 0.09858485460281372,
      "step": 1230
    },
    {
      "epoch": 1.2986638721509038,
      "grad_norm": 0.10184557735919952,
      "learning_rate": 6.320919014090534e-05,
      "loss": 0.09335023164749146,
      "step": 1240
    },
    {
      "epoch": 1.309143306261462,
      "grad_norm": 0.10787283629179001,
      "learning_rate": 6.266320131276051e-05,
      "loss": 0.08665563464164734,
      "step": 1250
    },
    {
      "epoch": 1.309143306261462,
      "eval_loss": 0.08951585739850998,
      "eval_runtime": 94.0567,
      "eval_samples_per_second": 3.307,
      "eval_steps_per_second": 3.307,
      "step": 1250
    },
    {
      "epoch": 1.31962274037202,
      "grad_norm": 0.10836981981992722,
      "learning_rate": 6.211559417528631e-05,
      "loss": 0.0933380126953125,
      "step": 1260
    },
    {
      "epoch": 1.3301021744825778,
      "grad_norm": 0.1397171914577484,
      "learning_rate": 6.156643871060795e-05,
      "loss": 0.09835371971130372,
      "step": 1270
    },
    {
      "epoch": 1.340581608593136,
      "grad_norm": 0.11242218315601349,
      "learning_rate": 6.101580509872097e-05,
      "loss": 0.09398673176765442,
      "step": 1280
    },
    {
      "epoch": 1.351061042703694,
      "grad_norm": 0.10235017538070679,
      "learning_rate": 6.0463763708522536e-05,
      "loss": 0.10350929498672486,
      "step": 1290
    },
    {
      "epoch": 1.361540476814252,
      "grad_norm": 0.09327106177806854,
      "learning_rate": 5.99103850888186e-05,
      "loss": 0.09580238461494446,
      "step": 1300
    },
    {
      "epoch": 1.3720199109248101,
      "grad_norm": 0.12995658814907074,
      "learning_rate": 5.9355739959307976e-05,
      "loss": 0.08437412977218628,
      "step": 1310
    },
    {
      "epoch": 1.382499345035368,
      "grad_norm": 0.11962983757257462,
      "learning_rate": 5.879989920154466e-05,
      "loss": 0.08409937620162963,
      "step": 1320
    },
    {
      "epoch": 1.392978779145926,
      "grad_norm": 0.09431737661361694,
      "learning_rate": 5.824293384987941e-05,
      "loss": 0.09504773020744324,
      "step": 1330
    },
    {
      "epoch": 1.4034582132564841,
      "grad_norm": 0.13824374973773956,
      "learning_rate": 5.768491508238188e-05,
      "loss": 0.09193333983421326,
      "step": 1340
    },
    {
      "epoch": 1.4139376473670422,
      "grad_norm": 0.10595858097076416,
      "learning_rate": 5.712591421174422e-05,
      "loss": 0.08976472616195678,
      "step": 1350
    },
    {
      "epoch": 1.4244170814776003,
      "grad_norm": 0.09911809861660004,
      "learning_rate": 5.6566002676167725e-05,
      "loss": 0.07597061395645141,
      "step": 1360
    },
    {
      "epoch": 1.4348965155881581,
      "grad_norm": 0.09723466634750366,
      "learning_rate": 5.60052520302332e-05,
      "loss": 0.10513757467269898,
      "step": 1370
    },
    {
      "epoch": 1.4453759496987162,
      "grad_norm": 0.11331687867641449,
      "learning_rate": 5.5443733935756615e-05,
      "loss": 0.09019948840141297,
      "step": 1380
    },
    {
      "epoch": 1.4558553838092743,
      "grad_norm": 0.13363589346408844,
      "learning_rate": 5.4881520152630886e-05,
      "loss": 0.08314153552055359,
      "step": 1390
    },
    {
      "epoch": 1.4663348179198323,
      "grad_norm": 0.14111892879009247,
      "learning_rate": 5.4318682529655404e-05,
      "loss": 0.07892010807991028,
      "step": 1400
    },
    {
      "epoch": 1.4768142520303904,
      "grad_norm": 0.13948485255241394,
      "learning_rate": 5.3755292995353913e-05,
      "loss": 0.0840128481388092,
      "step": 1410
    },
    {
      "epoch": 1.4872936861409483,
      "grad_norm": 0.12535949051380157,
      "learning_rate": 5.31914235487823e-05,
      "loss": 0.07869629859924317,
      "step": 1420
    },
    {
      "epoch": 1.4977731202515066,
      "grad_norm": 0.10041694343090057,
      "learning_rate": 5.2627146250327484e-05,
      "loss": 0.08074848055839538,
      "step": 1430
    },
    {
      "epoch": 1.5082525543620644,
      "grad_norm": 0.10112891346216202,
      "learning_rate": 5.2062533212498275e-05,
      "loss": 0.0860810935497284,
      "step": 1440
    },
    {
      "epoch": 1.5187319884726225,
      "grad_norm": 0.11297477036714554,
      "learning_rate": 5.149765659070973e-05,
      "loss": 0.08794642686843872,
      "step": 1450
    },
    {
      "epoch": 1.5292114225831805,
      "grad_norm": 0.10511091351509094,
      "learning_rate": 5.0932588574061945e-05,
      "loss": 0.07854819297790527,
      "step": 1460
    },
    {
      "epoch": 1.5396908566937384,
      "grad_norm": 0.09333530068397522,
      "learning_rate": 5.036740137611453e-05,
      "loss": 0.08821435570716858,
      "step": 1470
    },
    {
      "epoch": 1.5501702908042967,
      "grad_norm": 0.11480343341827393,
      "learning_rate": 4.980216722565804e-05,
      "loss": 0.08062278628349304,
      "step": 1480
    },
    {
      "epoch": 1.5606497249148545,
      "grad_norm": 0.08406255394220352,
      "learning_rate": 4.923695835748338e-05,
      "loss": 0.0940588355064392,
      "step": 1490
    },
    {
      "epoch": 1.5711291590254126,
      "grad_norm": 0.12927693128585815,
      "learning_rate": 4.8671847003150447e-05,
      "loss": 0.0775177538394928,
      "step": 1500
    },
    {
      "epoch": 1.5711291590254126,
      "eval_loss": 0.07877222448587418,
      "eval_runtime": 34.4389,
      "eval_samples_per_second": 9.03,
      "eval_steps_per_second": 9.03,
      "step": 1500
    },
{
|
| 1111 |
+
"epoch": 1.5816085931359707,
|
| 1112 |
+
"grad_norm": 0.1255076378583908,
|
| 1113 |
+
"learning_rate": 4.810690538175728e-05,
|
| 1114 |
+
"loss": 0.09362970590591431,
|
| 1115 |
+
"step": 1510
|
| 1116 |
+
},
|
| 1117 |
+
{
|
| 1118 |
+
"epoch": 1.5920880272465285,
|
| 1119 |
+
"grad_norm": 0.1326853185892105,
|
| 1120 |
+
"learning_rate": 4.754220569071068e-05,
|
| 1121 |
+
"loss": 0.08364834189414978,
|
| 1122 |
+
"step": 1520
|
| 1123 |
+
},
|
| 1124 |
+
{
|
| 1125 |
+
"epoch": 1.6025674613570868,
|
| 1126 |
+
"grad_norm": 0.10229979455471039,
|
| 1127 |
+
"learning_rate": 4.697782009649962e-05,
|
| 1128 |
+
"loss": 0.0725843846797943,
|
| 1129 |
+
"step": 1530
|
| 1130 |
+
},
|
| 1131 |
+
{
|
| 1132 |
+
"epoch": 1.6130468954676447,
|
| 1133 |
+
"grad_norm": 0.11407258361577988,
|
| 1134 |
+
"learning_rate": 4.641382072547272e-05,
|
| 1135 |
+
"loss": 0.07566151022911072,
|
| 1136 |
+
"step": 1540
|
| 1137 |
+
},
|
| 1138 |
+
{
|
| 1139 |
+
"epoch": 1.6235263295782028,
|
| 1140 |
+
"grad_norm": 0.09398165345191956,
|
| 1141 |
+
"learning_rate": 4.585027965462075e-05,
|
| 1142 |
+
"loss": 0.087736576795578,
|
| 1143 |
+
"step": 1550
|
| 1144 |
+
},
|
| 1145 |
+
{
|
| 1146 |
+
"epoch": 1.6340057636887608,
|
| 1147 |
+
"grad_norm": 0.11289424449205399,
|
| 1148 |
+
"learning_rate": 4.528726890236544e-05,
|
| 1149 |
+
"loss": 0.08366051316261292,
|
| 1150 |
+
"step": 1560
|
| 1151 |
+
},
|
| 1152 |
+
{
|
| 1153 |
+
"epoch": 1.6444851977993187,
|
| 1154 |
+
"grad_norm": 0.09478718787431717,
|
| 1155 |
+
"learning_rate": 4.4724860419355746e-05,
|
| 1156 |
+
"loss": 0.0885531723499298,
|
| 1157 |
+
"step": 1570
|
| 1158 |
+
},
|
| 1159 |
+
{
|
| 1160 |
+
"epoch": 1.654964631909877,
|
| 1161 |
+
"grad_norm": 0.09163404256105423,
|
| 1162 |
+
"learning_rate": 4.416312607927295e-05,
|
| 1163 |
+
"loss": 0.08392030596733094,
|
| 1164 |
+
"step": 1580
|
| 1165 |
+
},
|
| 1166 |
+
{
|
| 1167 |
+
"epoch": 1.6654440660204348,
|
| 1168 |
+
"grad_norm": 0.11422222852706909,
|
| 1169 |
+
"learning_rate": 4.360213766964542e-05,
|
| 1170 |
+
"loss": 0.08059985041618348,
|
| 1171 |
+
"step": 1590
|
| 1172 |
+
},
|
| 1173 |
+
{
|
| 1174 |
+
"epoch": 1.675923500130993,
|
| 1175 |
+
"grad_norm": 0.08131479471921921,
|
| 1176 |
+
"learning_rate": 4.304196688267438e-05,
|
| 1177 |
+
"loss": 0.07613803148269653,
|
| 1178 |
+
"step": 1600
|
| 1179 |
+
},
|
| 1180 |
+
{
|
| 1181 |
+
"epoch": 1.686402934241551,
|
| 1182 |
+
"grad_norm": 0.09615079313516617,
|
| 1183 |
+
"learning_rate": 4.248268530607199e-05,
|
| 1184 |
+
"loss": 0.07764078378677368,
|
| 1185 |
+
"step": 1610
|
| 1186 |
+
},
|
| 1187 |
+
{
|
| 1188 |
+
"epoch": 1.696882368352109,
|
| 1189 |
+
"grad_norm": 0.09730526059865952,
|
| 1190 |
+
"learning_rate": 4.192436441391271e-05,
|
| 1191 |
+
"loss": 0.07644452452659607,
|
| 1192 |
+
"step": 1620
|
| 1193 |
+
},
|
| 1194 |
+
{
|
| 1195 |
+
"epoch": 1.707361802462667,
|
| 1196 |
+
"grad_norm": 0.09649327397346497,
|
| 1197 |
+
"learning_rate": 4.136707555749907e-05,
|
| 1198 |
+
"loss": 0.07866159081459045,
|
| 1199 |
+
"step": 1630
|
| 1200 |
+
},
|
| 1201 |
+
{
|
| 1202 |
+
"epoch": 1.717841236573225,
|
| 1203 |
+
"grad_norm": 0.11804413050413132,
|
| 1204 |
+
"learning_rate": 4.0810889956243415e-05,
|
| 1205 |
+
"loss": 0.06996130347251892,
|
| 1206 |
+
"step": 1640
|
| 1207 |
+
},
|
| 1208 |
+
{
|
| 1209 |
+
"epoch": 1.728320670683783,
|
| 1210 |
+
"grad_norm": 0.09874672442674637,
|
| 1211 |
+
"learning_rate": 4.025587868856622e-05,
|
| 1212 |
+
"loss": 0.07877404093742371,
|
| 1213 |
+
"step": 1650
|
| 1214 |
+
},
|
| 1215 |
+
{
|
| 1216 |
+
"epoch": 1.738800104794341,
|
| 1217 |
+
"grad_norm": 0.11149467527866364,
|
| 1218 |
+
"learning_rate": 3.9702112682812544e-05,
|
| 1219 |
+
"loss": 0.07241421341896057,
|
| 1220 |
+
"step": 1660
|
| 1221 |
+
},
|
| 1222 |
+
{
|
| 1223 |
+
"epoch": 1.7492795389048992,
|
| 1224 |
+
"grad_norm": 0.08748896420001984,
|
| 1225 |
+
"learning_rate": 3.914966270818766e-05,
|
| 1226 |
+
"loss": 0.07336459755897522,
|
| 1227 |
+
"step": 1670
|
| 1228 |
+
},
|
| 1229 |
+
{
|
| 1230 |
+
"epoch": 1.7597589730154573,
|
| 1231 |
+
"grad_norm": 0.1172696202993393,
|
| 1232 |
+
"learning_rate": 3.859859936571307e-05,
|
| 1233 |
+
"loss": 0.07742337584495544,
|
| 1234 |
+
"step": 1680
|
| 1235 |
+
},
|
| 1236 |
+
{
|
| 1237 |
+
"epoch": 1.770238407126015,
|
| 1238 |
+
"grad_norm": 0.0719197615981102,
|
| 1239 |
+
"learning_rate": 3.8048993079203925e-05,
|
| 1240 |
+
"loss": 0.06242966651916504,
|
| 1241 |
+
"step": 1690
|
| 1242 |
+
},
|
| 1243 |
+
{
|
| 1244 |
+
"epoch": 1.7807178412365732,
|
| 1245 |
+
"grad_norm": 0.12380168586969376,
|
| 1246 |
+
"learning_rate": 3.750091408626907e-05,
|
| 1247 |
+
"loss": 0.07270430326461792,
|
| 1248 |
+
"step": 1700
|
| 1249 |
+
},
|
| 1250 |
+
{
|
| 1251 |
+
"epoch": 1.7911972753471312,
|
| 1252 |
+
"grad_norm": 0.1587221622467041,
|
| 1253 |
+
"learning_rate": 3.6954432429335015e-05,
|
| 1254 |
+
"loss": 0.06409866213798524,
|
| 1255 |
+
"step": 1710
|
| 1256 |
+
},
|
| 1257 |
+
{
|
| 1258 |
+
"epoch": 1.8016767094576893,
|
| 1259 |
+
"grad_norm": 0.10983912646770477,
|
| 1260 |
+
"learning_rate": 3.640961794669482e-05,
|
| 1261 |
+
"loss": 0.06610031127929687,
|
| 1262 |
+
"step": 1720
|
| 1263 |
+
},
|
| 1264 |
+
{
|
| 1265 |
+
"epoch": 1.8121561435682474,
|
| 1266 |
+
"grad_norm": 0.11023026704788208,
|
| 1267 |
+
"learning_rate": 3.586654026358287e-05,
|
| 1268 |
+
"loss": 0.06866579055786133,
|
| 1269 |
+
"step": 1730
|
| 1270 |
+
},
|
| 1271 |
+
{
|
| 1272 |
+
"epoch": 1.8226355776788052,
|
| 1273 |
+
"grad_norm": 0.11857719719409943,
|
| 1274 |
+
"learning_rate": 3.532526878327719e-05,
|
| 1275 |
+
"loss": 0.06734356880187989,
|
| 1276 |
+
"step": 1740
|
| 1277 |
+
},
|
| 1278 |
+
{
|
| 1279 |
+
"epoch": 1.8331150117893635,
|
| 1280 |
+
"grad_norm": 0.09280339628458023,
|
| 1281 |
+
"learning_rate": 3.478587267822987e-05,
|
| 1282 |
+
"loss": 0.06897796392440796,
|
| 1283 |
+
"step": 1750
|
| 1284 |
+
},
|
| 1285 |
+
{
|
| 1286 |
+
"epoch": 1.8331150117893635,
|
| 1287 |
+
"eval_loss": 0.06596127897500992,
|
| 1288 |
+
"eval_runtime": 35.5001,
|
| 1289 |
+
"eval_samples_per_second": 8.761,
|
| 1290 |
+
"eval_steps_per_second": 8.761,
|
| 1291 |
+
"step": 1750
|
| 1292 |
+
},
|
| 1293 |
+
{
|
| 1294 |
+
"epoch": 1.8435944458999214,
|
| 1295 |
+
"grad_norm": 0.1175367683172226,
|
| 1296 |
+
"learning_rate": 3.424842088122716e-05,
|
| 1297 |
+
"loss": 0.08288194537162781,
|
| 1298 |
+
"step": 1760
|
| 1299 |
+
},
|
| 1300 |
+
{
|
| 1301 |
+
"epoch": 1.8540738800104795,
|
| 1302 |
+
"grad_norm": 0.10271462798118591,
|
| 1303 |
+
"learning_rate": 3.371298207658003e-05,
|
| 1304 |
+
"loss": 0.05643013119697571,
|
| 1305 |
+
"step": 1770
|
| 1306 |
+
},
|
| 1307 |
+
{
|
| 1308 |
+
"epoch": 1.8645533141210375,
|
| 1309 |
+
"grad_norm": 0.11965195834636688,
|
| 1310 |
+
"learning_rate": 3.3179624691346654e-05,
|
| 1311 |
+
"loss": 0.07403092980384826,
|
| 1312 |
+
"step": 1780
|
| 1313 |
+
},
|
| 1314 |
+
{
|
| 1315 |
+
"epoch": 1.8750327482315954,
|
| 1316 |
+
"grad_norm": 0.09981680661439896,
|
| 1317 |
+
"learning_rate": 3.2648416886587686e-05,
|
| 1318 |
+
"loss": 0.07118859887123108,
|
| 1319 |
+
"step": 1790
|
| 1320 |
+
},
|
| 1321 |
+
{
|
| 1322 |
+
"epoch": 1.8855121823421537,
|
| 1323 |
+
"grad_norm": 0.07787375897169113,
|
| 1324 |
+
"learning_rate": 3.2119426548655435e-05,
|
| 1325 |
+
"loss": 0.07219682335853576,
|
| 1326 |
+
"step": 1800
|
| 1327 |
+
},
|
| 1328 |
+
{
|
| 1329 |
+
"epoch": 1.8959916164527115,
|
| 1330 |
+
"grad_norm": 0.1303507387638092,
|
| 1331 |
+
"learning_rate": 3.1592721280518404e-05,
|
| 1332 |
+
"loss": 0.07636030912399291,
|
| 1333 |
+
"step": 1810
|
| 1334 |
+
},
|
| 1335 |
+
{
|
| 1336 |
+
"epoch": 1.9064710505632696,
|
| 1337 |
+
"grad_norm": 0.09162267297506332,
|
| 1338 |
+
"learning_rate": 3.106836839312175e-05,
|
| 1339 |
+
"loss": 0.06230143308639526,
|
| 1340 |
+
"step": 1820
|
| 1341 |
+
},
|
| 1342 |
+
{
|
| 1343 |
+
"epoch": 1.9169504846738277,
|
| 1344 |
+
"grad_norm": 0.11375878751277924,
|
| 1345 |
+
"learning_rate": 3.054643489678526e-05,
|
| 1346 |
+
"loss": 0.060506826639175414,
|
| 1347 |
+
"step": 1830
|
| 1348 |
+
},
|
| 1349 |
+
{
|
| 1350 |
+
"epoch": 1.9274299187843855,
|
| 1351 |
+
"grad_norm": 0.1377716213464737,
|
| 1352 |
+
"learning_rate": 3.0026987492639668e-05,
|
| 1353 |
+
"loss": 0.08148540854454041,
|
| 1354 |
+
"step": 1840
|
| 1355 |
+
},
|
| 1356 |
+
{
|
| 1357 |
+
"epoch": 1.9379093528949438,
|
| 1358 |
+
"grad_norm": 0.10483554750680923,
|
| 1359 |
+
"learning_rate": 2.951009256410255e-05,
|
| 1360 |
+
"loss": 0.07040726542472839,
|
| 1361 |
+
"step": 1850
|
| 1362 |
+
},
|
| 1363 |
+
{
|
| 1364 |
+
"epoch": 1.9483887870055017,
|
| 1365 |
+
"grad_norm": 0.08736151456832886,
|
| 1366 |
+
"learning_rate": 2.8995816168394702e-05,
|
| 1367 |
+
"loss": 0.04931557774543762,
|
| 1368 |
+
"step": 1860
|
| 1369 |
+
},
|
| 1370 |
+
{
|
| 1371 |
+
"epoch": 1.9588682211160597,
|
| 1372 |
+
"grad_norm": 0.11461569368839264,
|
| 1373 |
+
"learning_rate": 2.848422402809828e-05,
|
| 1374 |
+
"loss": 0.057559752464294435,
|
| 1375 |
+
"step": 1870
|
| 1376 |
+
},
|
| 1377 |
+
{
|
| 1378 |
+
"epoch": 1.9693476552266178,
|
| 1379 |
+
"grad_norm": 0.09060918539762497,
|
| 1380 |
+
"learning_rate": 2.7975381522757803e-05,
|
| 1381 |
+
"loss": 0.06379705667495728,
|
| 1382 |
+
"step": 1880
|
| 1383 |
+
},
|
| 1384 |
+
{
|
| 1385 |
+
"epoch": 1.9798270893371757,
|
| 1386 |
+
"grad_norm": 0.07104971259832382,
|
| 1387 |
+
"learning_rate": 2.746935368052477e-05,
|
| 1388 |
+
"loss": 0.05813115239143372,
|
| 1389 |
+
"step": 1890
|
| 1390 |
+
},
|
| 1391 |
+
{
|
| 1392 |
+
"epoch": 1.990306523447734,
|
| 1393 |
+
"grad_norm": 0.10802938044071198,
|
| 1394 |
+
"learning_rate": 2.696620516984733e-05,
|
| 1395 |
+
"loss": 0.07732833027839661,
|
| 1396 |
+
"step": 1900
|
| 1397 |
+
},
|
| 1398 |
+
{
|
| 1399 |
+
"epoch": 2.0,
|
| 1400 |
+
"grad_norm": 0.16884952783584595,
|
| 1401 |
+
"learning_rate": 2.6466000291206004e-05,
|
| 1402 |
+
"loss": 0.06166202425956726,
|
| 1403 |
+
"step": 1910
|
| 1404 |
+
},
|
| 1405 |
+
{
|
| 1406 |
+
"epoch": 2.010479434110558,
|
| 1407 |
+
"grad_norm": 0.08582179993391037,
|
| 1408 |
+
"learning_rate": 2.5968802968896228e-05,
|
| 1409 |
+
"loss": 0.04766199886798859,
|
| 1410 |
+
"step": 1920
|
| 1411 |
+
},
|
| 1412 |
+
{
|
| 1413 |
+
"epoch": 2.020958868221116,
|
| 1414 |
+
"grad_norm": 0.1457364708185196,
|
| 1415 |
+
"learning_rate": 2.5474676742859048e-05,
|
| 1416 |
+
"loss": 0.03826354146003723,
|
| 1417 |
+
"step": 1930
|
| 1418 |
+
},
|
| 1419 |
+
{
|
| 1420 |
+
"epoch": 2.031438302331674,
|
| 1421 |
+
"grad_norm": 0.09275342524051666,
|
| 1422 |
+
"learning_rate": 2.4983684760561023e-05,
|
| 1423 |
+
"loss": 0.045059433579444884,
|
| 1424 |
+
"step": 1940
|
| 1425 |
+
},
|
| 1426 |
+
{
|
| 1427 |
+
"epoch": 2.0419177364422323,
|
| 1428 |
+
"grad_norm": 0.09085927903652191,
|
| 1429 |
+
"learning_rate": 2.44958897689242e-05,
|
| 1430 |
+
"loss": 0.04904903173446655,
|
| 1431 |
+
"step": 1950
|
| 1432 |
+
},
|
| 1433 |
+
{
|
| 1434 |
+
"epoch": 2.05239717055279,
|
| 1435 |
+
"grad_norm": 0.11733179539442062,
|
| 1436 |
+
"learning_rate": 2.401135410630731e-05,
|
| 1437 |
+
"loss": 0.05008396506309509,
|
| 1438 |
+
"step": 1960
|
| 1439 |
+
},
|
| 1440 |
+
{
|
| 1441 |
+
"epoch": 2.062876604663348,
|
| 1442 |
+
"grad_norm": 0.0894237607717514,
|
| 1443 |
+
"learning_rate": 2.3530139694539095e-05,
|
| 1444 |
+
"loss": 0.04057626128196716,
|
| 1445 |
+
"step": 1970
|
| 1446 |
+
},
|
| 1447 |
+
{
|
| 1448 |
+
"epoch": 2.0733560387739063,
|
| 1449 |
+
"grad_norm": 0.08560927212238312,
|
| 1450 |
+
"learning_rate": 2.305230803100496e-05,
|
| 1451 |
+
"loss": 0.04843136668205261,
|
| 1452 |
+
"step": 1980
|
| 1453 |
+
},
|
| 1454 |
+
{
|
| 1455 |
+
"epoch": 2.083835472884464,
|
| 1456 |
+
"grad_norm": 0.07991836220026016,
|
| 1457 |
+
"learning_rate": 2.257792018078793e-05,
|
| 1458 |
+
"loss": 0.0544127106666565,
|
| 1459 |
+
"step": 1990
|
| 1460 |
+
},
|
| 1461 |
+
{
|
| 1462 |
+
"epoch": 2.0943149069950224,
|
| 1463 |
+
"grad_norm": 0.08846250921487808,
|
| 1464 |
+
"learning_rate": 2.210703676886461e-05,
|
| 1465 |
+
"loss": 0.0459000825881958,
|
| 1466 |
+
"step": 2000
|
| 1467 |
+
},
|
| 1468 |
+
{
|
| 1469 |
+
"epoch": 2.0943149069950224,
|
| 1470 |
+
"eval_loss": 0.060011014342308044,
|
| 1471 |
+
"eval_runtime": 36.3755,
|
| 1472 |
+
"eval_samples_per_second": 8.55,
|
| 1473 |
+
"eval_steps_per_second": 8.55,
|
| 1474 |
+
"step": 2000
|
| 1475 |
+
},
|
| 1476 |
+
{
|
| 1477 |
+
"epoch": 2.1047943411055803,
|
| 1478 |
+
"grad_norm": 0.10082945972681046,
|
| 1479 |
+
"learning_rate": 2.1639717972357678e-05,
|
| 1480 |
+
"loss": 0.038090622425079344,
|
| 1481 |
+
"step": 2010
|
| 1482 |
+
},
|
| 1483 |
+
{
|
| 1484 |
+
"epoch": 2.115273775216138,
|
| 1485 |
+
"grad_norm": 0.05712248757481575,
|
| 1486 |
+
"learning_rate": 2.1176023512845376e-05,
|
| 1487 |
+
"loss": 0.04598597884178161,
|
| 1488 |
+
"step": 2020
|
| 1489 |
+
},
|
| 1490 |
+
{
|
| 1491 |
+
"epoch": 2.1257532093266964,
|
| 1492 |
+
"grad_norm": 0.11628362536430359,
|
| 1493 |
+
"learning_rate": 2.0716012648729353e-05,
|
| 1494 |
+
"loss": 0.04984880685806274,
|
| 1495 |
+
"step": 2030
|
| 1496 |
+
},
|
| 1497 |
+
{
|
| 1498 |
+
"epoch": 2.1362326434372543,
|
| 1499 |
+
"grad_norm": 0.10635484755039215,
|
| 1500 |
+
"learning_rate": 2.025974416766171e-05,
|
| 1501 |
+
"loss": 0.04293925166130066,
|
| 1502 |
+
"step": 2040
|
| 1503 |
+
},
|
| 1504 |
+
{
|
| 1505 |
+
"epoch": 2.1467120775478126,
|
| 1506 |
+
"grad_norm": 0.1017381027340889,
|
| 1507 |
+
"learning_rate": 1.9807276379032113e-05,
|
| 1508 |
+
"loss": 0.04305694401264191,
|
| 1509 |
+
"step": 2050
|
| 1510 |
+
},
|
| 1511 |
+
{
|
| 1512 |
+
"epoch": 2.1571915116583704,
|
| 1513 |
+
"grad_norm": 0.13550882041454315,
|
| 1514 |
+
"learning_rate": 1.9358667106516055e-05,
|
| 1515 |
+
"loss": 0.04478869140148163,
|
| 1516 |
+
"step": 2060
|
| 1517 |
+
},
|
| 1518 |
+
{
|
| 1519 |
+
"epoch": 2.1676709457689283,
|
| 1520 |
+
"grad_norm": 0.08526366949081421,
|
| 1521 |
+
"learning_rate": 1.8913973680685226e-05,
|
| 1522 |
+
"loss": 0.036646312475204466,
|
| 1523 |
+
"step": 2070
|
| 1524 |
+
},
|
| 1525 |
+
{
|
| 1526 |
+
"epoch": 2.1781503798794866,
|
| 1527 |
+
"grad_norm": 0.10932011157274246,
|
| 1528 |
+
"learning_rate": 1.8473252931680928e-05,
|
| 1529 |
+
"loss": 0.042200219631195066,
|
| 1530 |
+
"step": 2080
|
| 1531 |
+
},
|
| 1532 |
+
{
|
| 1533 |
+
"epoch": 2.1886298139900444,
|
| 1534 |
+
"grad_norm": 0.08768360316753387,
|
| 1535 |
+
"learning_rate": 1.803656118195136e-05,
|
| 1536 |
+
"loss": 0.0437488317489624,
|
| 1537 |
+
"step": 2090
|
| 1538 |
+
},
|
| 1539 |
+
{
|
| 1540 |
+
"epoch": 2.1991092481006027,
|
| 1541 |
+
"grad_norm": 0.08362651616334915,
|
| 1542 |
+
"learning_rate": 1.760395423905379e-05,
|
| 1543 |
+
"loss": 0.04669668078422547,
|
| 1544 |
+
"step": 2100
|
| 1545 |
+
},
|
| 1546 |
+
{
|
| 1547 |
+
"epoch": 2.2095886822111606,
|
| 1548 |
+
"grad_norm": 0.08554034680128098,
|
| 1549 |
+
"learning_rate": 1.7175487388522588e-05,
|
| 1550 |
+
"loss": 0.034989356994628906,
|
| 1551 |
+
"step": 2110
|
| 1552 |
+
},
|
| 1553 |
+
{
|
| 1554 |
+
"epoch": 2.220068116321719,
|
| 1555 |
+
"grad_norm": 0.08215561509132385,
|
| 1556 |
+
"learning_rate": 1.6751215386803986e-05,
|
| 1557 |
+
"loss": 0.040298929810523985,
|
| 1558 |
+
"step": 2120
|
| 1559 |
+
},
|
| 1560 |
+
{
|
| 1561 |
+
"epoch": 2.2305475504322767,
|
| 1562 |
+
"grad_norm": 0.0840689167380333,
|
| 1563 |
+
"learning_rate": 1.6331192454258337e-05,
|
| 1564 |
+
"loss": 0.041704925894737246,
|
| 1565 |
+
"step": 2130
|
| 1566 |
+
},
|
| 1567 |
+
{
|
| 1568 |
+
"epoch": 2.2410269845428346,
|
| 1569 |
+
"grad_norm": 0.06530614197254181,
|
| 1570 |
+
"learning_rate": 1.5915472268231018e-05,
|
| 1571 |
+
"loss": 0.03651900887489319,
|
| 1572 |
+
"step": 2140
|
| 1573 |
+
},
|
| 1574 |
+
{
|
| 1575 |
+
"epoch": 2.251506418653393,
|
| 1576 |
+
"grad_norm": 0.12431822717189789,
|
| 1577 |
+
"learning_rate": 1.550410795619261e-05,
|
| 1578 |
+
"loss": 0.04806804955005646,
|
| 1579 |
+
"step": 2150
|
| 1580 |
+
},
|
| 1581 |
+
{
|
| 1582 |
+
"epoch": 2.2619858527639507,
|
| 1583 |
+
"grad_norm": 0.09592410176992416,
|
| 1584 |
+
"learning_rate": 1.509715208894949e-05,
|
| 1585 |
+
"loss": 0.0454313725233078,
|
| 1586 |
+
"step": 2160
|
| 1587 |
+
},
|
| 1588 |
+
{
|
| 1589 |
+
"epoch": 2.2724652868745085,
|
| 1590 |
+
"grad_norm": 0.07589780539274216,
|
| 1591 |
+
"learning_rate": 1.469465667392536e-05,
|
| 1592 |
+
"loss": 0.03574602603912354,
|
| 1593 |
+
"step": 2170
|
| 1594 |
+
},
|
| 1595 |
+
{
|
| 1596 |
+
"epoch": 2.282944720985067,
|
| 1597 |
+
"grad_norm": 0.09734483063220978,
|
| 1598 |
+
"learning_rate": 1.4296673148515038e-05,
|
| 1599 |
+
"loss": 0.04358702301979065,
|
| 1600 |
+
"step": 2180
|
| 1601 |
+
},
|
| 1602 |
+
{
|
| 1603 |
+
"epoch": 2.2934241550956247,
|
| 1604 |
+
"grad_norm": 0.0974339172244072,
|
| 1605 |
+
"learning_rate": 1.3903252373510838e-05,
|
| 1606 |
+
"loss": 0.04603351950645447,
|
| 1607 |
+
"step": 2190
|
| 1608 |
+
},
|
| 1609 |
+
{
|
| 1610 |
+
"epoch": 2.303903589206183,
|
| 1611 |
+
"grad_norm": 0.09025271981954575,
|
| 1612 |
+
"learning_rate": 1.3514444626602773e-05,
|
| 1613 |
+
"loss": 0.040065237879753114,
|
| 1614 |
+
"step": 2200
|
| 1615 |
+
},
|
| 1616 |
+
{
|
| 1617 |
+
"epoch": 2.314383023316741,
|
| 1618 |
+
"grad_norm": 0.07625086605548859,
|
| 1619 |
+
"learning_rate": 1.3130299595953338e-05,
|
| 1620 |
+
"loss": 0.044061675667762756,
|
| 1621 |
+
"step": 2210
|
| 1622 |
+
},
|
| 1623 |
+
{
|
| 1624 |
+
"epoch": 2.324862457427299,
|
| 1625 |
+
"grad_norm": 0.07306221127510071,
|
| 1626 |
+
"learning_rate": 1.2750866373847465e-05,
|
| 1627 |
+
"loss": 0.03366467654705048,
|
| 1628 |
+
"step": 2220
|
| 1629 |
+
},
|
| 1630 |
+
{
|
| 1631 |
+
"epoch": 2.335341891537857,
|
| 1632 |
+
"grad_norm": 0.08357638120651245,
|
| 1633 |
+
"learning_rate": 1.2376193450418715e-05,
|
| 1634 |
+
"loss": 0.041424044966697694,
|
| 1635 |
+
"step": 2230
|
| 1636 |
+
},
|
| 1637 |
+
{
|
| 1638 |
+
"epoch": 2.345821325648415,
|
| 1639 |
+
"grad_norm": 0.09153921157121658,
|
| 1640 |
+
"learning_rate": 1.2006328707452459e-05,
|
| 1641 |
+
"loss": 0.03938372135162353,
|
| 1642 |
+
"step": 2240
|
| 1643 |
+
},
|
| 1644 |
+
{
|
| 1645 |
+
"epoch": 2.356300759758973,
|
| 1646 |
+
"grad_norm": 0.09109660983085632,
|
| 1647 |
+
"learning_rate": 1.1641319412266765e-05,
|
| 1648 |
+
"loss": 0.04015985131263733,
|
| 1649 |
+
"step": 2250
|
| 1650 |
+
},
|
| 1651 |
+
{
|
| 1652 |
+
"epoch": 2.356300759758973,
|
| 1653 |
+
"eval_loss": 0.05486458167433739,
|
| 1654 |
+
"eval_runtime": 36.8119,
|
| 1655 |
+
"eval_samples_per_second": 8.448,
|
| 1656 |
+
"eval_steps_per_second": 8.448,
|
| 1657 |
+
"step": 2250
|
| 1658 |
+
},
|
| 1659 |
+
{
|
| 1660 |
+
"epoch": 2.366780193869531,
|
| 1661 |
+
"grad_norm": 0.052502721548080444,
|
| 1662 |
+
"learning_rate": 1.1281212211671822e-05,
|
| 1663 |
+
"loss": 0.0270554780960083,
|
| 1664 |
+
"step": 2260
|
| 1665 |
+
},
|
| 1666 |
+
{
|
| 1667 |
+
"epoch": 2.377259627980089,
|
| 1668 |
+
"grad_norm": 0.07931812107563019,
|
| 1669 |
+
"learning_rate": 1.0926053126008584e-05,
|
| 1670 |
+
"loss": 0.0417300134897232,
|
| 1671 |
+
"step": 2270
|
| 1672 |
+
},
|
| 1673 |
+
{
|
| 1674 |
+
"epoch": 2.387739062090647,
|
| 1675 |
+
"grad_norm": 0.08996254205703735,
|
| 1676 |
+
"learning_rate": 1.0575887543267609e-05,
|
| 1677 |
+
"loss": 0.037659955024719236,
|
| 1678 |
+
"step": 2280
|
| 1679 |
+
},
|
| 1680 |
+
{
|
| 1681 |
+
"epoch": 2.398218496201205,
|
| 1682 |
+
"grad_norm": 0.08800788223743439,
|
| 1683 |
+
"learning_rate": 1.023076021328867e-05,
|
| 1684 |
+
"loss": 0.048437944054603575,
|
| 1685 |
+
"step": 2290
|
| 1686 |
+
},
|
| 1687 |
+
{
|
| 1688 |
+
"epoch": 2.4086979303117633,
|
| 1689 |
+
"grad_norm": 0.10572271049022675,
|
| 1690 |
+
"learning_rate": 9.890715242041787e-06,
|
| 1691 |
+
"loss": 0.04166909456253052,
|
| 1692 |
+
"step": 2300
|
| 1693 |
+
},
|
| 1694 |
+
{
|
| 1695 |
+
"epoch": 2.419177364422321,
|
| 1696 |
+
"grad_norm": 0.10573071986436844,
|
| 1697 |
+
"learning_rate": 9.555796085990781e-06,
|
| 1698 |
+
"loss": 0.03919607996940613,
|
| 1699 |
+
"step": 2310
|
| 1700 |
+
},
|
| 1701 |
+
{
|
| 1702 |
+
"epoch": 2.4296567985328794,
|
| 1703 |
+
"grad_norm": 0.09714583307504654,
|
| 1704 |
+
"learning_rate": 9.226045546539608e-06,
|
| 1705 |
+
"loss": 0.03530588150024414,
|
| 1706 |
+
"step": 2320
|
| 1707 |
+
},
|
| 1708 |
+
{
|
| 1709 |
+
"epoch": 2.4401362326434373,
|
| 1710 |
+
"grad_norm": 0.09436199069023132,
|
| 1711 |
+
"learning_rate": 8.901505764562518e-06,
|
| 1712 |
+
"loss": 0.05111382007598877,
|
| 1713 |
+
"step": 2330
|
| 1714 |
+
},
|
| 1715 |
+
{
|
| 1716 |
+
"epoch": 2.450615666753995,
|
| 1717 |
+
"grad_norm": 0.06353961676359177,
|
| 1718 |
+
"learning_rate": 8.582218215018656e-06,
|
| 1719 |
+
"loss": 0.03805697858333588,
|
| 1720 |
+
"step": 2340
|
| 1721 |
+
},
|
| 1722 |
+
{
|
| 1723 |
+
"epoch": 2.4610951008645534,
|
| 1724 |
+
"grad_norm": 0.08853815495967865,
|
| 1725 |
+
"learning_rate": 8.268223701651684e-06,
|
| 1726 |
+
"loss": 0.04815975427627563,
|
| 1727 |
+
"step": 2350
|
| 1728 |
+
},
|
| 1729 |
+
{
|
| 1730 |
+
"epoch": 2.4715745349751113,
|
| 1731 |
+
"grad_norm": 0.07472016662359238,
|
| 1732 |
+
"learning_rate": 7.959562351775196e-06,
|
| 1733 |
+
"loss": 0.042247459292411804,
|
| 1734 |
+
"step": 2360
|
| 1735 |
+
},
|
| 1736 |
+
{
|
| 1737 |
+
"epoch": 2.4820539690856696,
|
| 1738 |
+
"grad_norm": 0.12121549248695374,
|
| 1739 |
+
"learning_rate": 7.656273611144632e-06,
|
| 1740 |
+
"loss": 0.040102115273475646,
|
| 1741 |
+
"step": 2370
|
| 1742 |
+
},
|
| 1743 |
+
{
|
| 1744 |
+
"epoch": 2.4925334031962274,
|
| 1745 |
+
"grad_norm": 0.08667747676372528,
|
| 1746 |
+
"learning_rate": 7.358396238916254e-06,
|
| 1747 |
+
"loss": 0.03656341433525086,
|
| 1748 |
+
"step": 2380
|
| 1749 |
+
},
|
| 1750 |
+
{
|
| 1751 |
+
"epoch": 2.5030128373067857,
|
| 1752 |
+
"grad_norm": 0.1162872165441513,
|
| 1753 |
+
"learning_rate": 7.065968302693882e-06,
|
| 1754 |
+
"loss": 0.04052766263484955,
|
| 1755 |
+
"step": 2390
|
| 1756 |
+
},
|
| 1757 |
+
{
|
| 1758 |
+
"epoch": 2.5134922714173435,
|
| 1759 |
+
"grad_norm": 0.07924140989780426,
|
| 1760 |
+
"learning_rate": 6.7790271736639595e-06,
|
| 1761 |
+
"loss": 0.03394221067428589,
|
| 1762 |
+
"step": 2400
|
| 1763 |
+
},
|
| 1764 |
+
{
|
| 1765 |
+
"epoch": 2.5239717055279014,
|
| 1766 |
+
"grad_norm": 0.09523408859968185,
|
| 1767 |
+
"learning_rate": 6.497609521819681e-06,
|
| 1768 |
+
"loss": 0.04119439423084259,
|
| 1769 |
+
"step": 2410
|
| 1770 |
+
},
|
| 1771 |
+
{
|
| 1772 |
+
"epoch": 2.5344511396384597,
|
| 1773 |
+
"grad_norm": 0.12182598561048508,
|
| 1774 |
+
"learning_rate": 6.221751311274731e-06,
|
| 1775 |
+
"loss": 0.05154783725738525,
|
| 1776 |
+
"step": 2420
|
| 1777 |
+
},
|
| 1778 |
+
{
|
| 1779 |
+
"epoch": 2.5449305737490175,
|
| 1780 |
+
"grad_norm": 0.09359873831272125,
|
| 1781 |
+
"learning_rate": 5.951487795667149e-06,
|
| 1782 |
+
"loss": 0.035483264923095705,
|
| 1783 |
+
"step": 2430
|
| 1784 |
+
},
|
| 1785 |
+
{
|
| 1786 |
+
"epoch": 2.5554100078595754,
|
| 1787 |
+
"grad_norm": 0.08514095097780228,
|
| 1788 |
+
"learning_rate": 5.686853513654117e-06,
|
| 1789 |
+
"loss": 0.03830339312553406,
|
| 1790 |
+
"step": 2440
|
| 1791 |
+
},
|
| 1792 |
+
{
|
| 1793 |
+
"epoch": 2.5658894419701337,
|
| 1794 |
+
"grad_norm": 0.10625084489583969,
|
| 1795 |
+
"learning_rate": 5.4278822844979705e-06,
|
| 1796 |
+
"loss": 0.034111028909683226,
|
| 1797 |
+
"step": 2450
|
| 1798 |
+
},
|
| 1799 |
+
{
|
| 1800 |
+
"epoch": 2.5763688760806915,
|
| 1801 |
+
"grad_norm": 0.1004003956913948,
|
| 1802 |
+
"learning_rate": 5.174607203744286e-06,
|
| 1803 |
+
"loss": 0.04465605318546295,
|
| 1804 |
+
"step": 2460
|
| 1805 |
+
},
|
| 1806 |
+
{
|
| 1807 |
+
"epoch": 2.58684831019125,
|
| 1808 |
+
"grad_norm": 0.0962519720196724,
|
| 1809 |
+
"learning_rate": 4.927060638992382e-06,
|
| 1810 |
+
"loss": 0.041056016087532045,
|
| 1811 |
+
"step": 2470
|
| 1812 |
+
},
|
| 1813 |
+
{
|
| 1814 |
+
"epoch": 2.5973277443018077,
|
| 1815 |
+
"grad_norm": 0.06380607187747955,
|
| 1816 |
+
"learning_rate": 4.685274225758846e-06,
|
| 1817 |
+
"loss": 0.03880062401294708,
|
| 1818 |
+
"step": 2480
|
| 1819 |
+
},
|
| 1820 |
+
{
|
| 1821 |
+
"epoch": 2.607807178412366,
|
| 1822 |
+
"grad_norm": 0.07326535880565643,
|
| 1823 |
+
"learning_rate": 4.449278863434647e-06,
|
| 1824 |
+
"loss": 0.03194461762905121,
|
| 1825 |
+
"step": 2490
|
| 1826 |
+
},
|
| 1827 |
+
{
|
| 1828 |
+
"epoch": 2.618286612522924,
|
| 1829 |
+
"grad_norm": 0.12218596786260605,
|
| 1830 |
+
"learning_rate": 4.2191047113362854e-06,
|
| 1831 |
+
"loss": 0.04258840978145599,
|
| 1832 |
+
"step": 2500
|
| 1833 |
+
},
|
| 1834 |
+
{
|
| 1835 |
+
"epoch": 2.618286612522924,
|
| 1836 |
+
"eval_loss": 0.05223666876554489,
|
| 1837 |
+
"eval_runtime": 37.7234,
|
| 1838 |
+
"eval_samples_per_second": 8.244,
|
| 1839 |
+
"eval_steps_per_second": 8.244,
|
| 1840 |
+
"step": 2500
|
| 1841 |
+
},
|
| 1842 |
+
{
|
| 1843 |
+
"epoch": 2.6287660466334817,
|
| 1844 |
+
"grad_norm": 0.08594664931297302,
|
| 1845 |
+
"learning_rate": 3.994781184851598e-06,
|
| 1846 |
+
"loss": 0.04302787780761719,
|
| 1847 |
+
"step": 2510
|
| 1848 |
+
},
|
| 1849 |
+
{
|
| 1850 |
+
"epoch": 2.63924548074404,
|
| 1851 |
+
"grad_norm": 0.08187596499919891,
|
| 1852 |
+
"learning_rate": 3.776336951680548e-06,
|
| 1853 |
+
"loss": 0.0341387003660202,
|
| 1854 |
+
"step": 2520
|
| 1855 |
+
},
|
| 1856 |
+
{
|
| 1857 |
+
"epoch": 2.649724914854598,
|
| 1858 |
+
"grad_norm": 0.10216796398162842,
|
| 1859 |
+
"learning_rate": 3.563799928171596e-06,
|
| 1860 |
+
"loss": 0.04289879500865936,
|
| 1861 |
+
"step": 2530
|
| 1862 |
+
},
|
| 1863 |
+
{
|
| 1864 |
+
"epoch": 2.6602043489651557,
|
| 1865 |
+
"grad_norm": 0.11215174198150635,
|
| 1866 |
+
"learning_rate": 3.3571972757540814e-06,
|
| 1867 |
+
"loss": 0.04055049121379852,
|
| 1868 |
+
"step": 2540
|
| 1869 |
+
},
|
| 1870 |
+
{
|
| 1871 |
+
"epoch": 2.670683783075714,
|
| 1872 |
+
"grad_norm": 0.07941269129514694,
|
| 1873 |
+
"learning_rate": 3.156555397467176e-06,
|
| 1874 |
+
"loss": 0.04118689000606537,
|
| 1875 |
+
"step": 2550
|
| 1876 |
+
},
|
| 1877 |
+
{
|
| 1878 |
+
"epoch": 2.681163217186272,
|
| 1879 |
+
"grad_norm": 0.09404437988996506,
|
| 1880 |
+
"learning_rate": 2.9618999345855547e-06,
|
| 1881 |
+
"loss": 0.03079705536365509,
|
| 1882 |
+
"step": 2560
|
| 1883 |
+
},
|
| 1884 |
+
{
|
| 1885 |
+
"epoch": 2.69164265129683,
|
| 1886 |
+
"grad_norm": 0.1109817698597908,
|
| 1887 |
+
"learning_rate": 2.773255763342647e-06,
|
| 1888 |
+
"loss": 0.038885954022407535,
|
| 1889 |
+
"step": 2570
|
| 1890 |
+
},
|
| 1891 |
+
{
|
| 1892 |
+
"epoch": 2.702122085407388,
|
| 1893 |
+
"grad_norm": 0.09431962668895721,
|
| 1894 |
+
"learning_rate": 2.590646991751472e-06,
|
| 1895 |
+
"loss": 0.043543145060539246,
|
| 1896 |
+
"step": 2580
|
| 1897 |
+
},
|
| 1898 |
+
{
|
| 1899 |
+
"epoch": 2.7126015195179463,
|
| 1900 |
+
"grad_norm": 0.08184763044118881,
|
| 1901 |
+
"learning_rate": 2.414096956523776e-06,
|
| 1902 |
+
"loss": 0.03256987631320953,
|
| 1903 |
+
"step": 2590
|
| 1904 |
+
},
|
| 1905 |
+
{
|
| 1906 |
+
"epoch": 2.723080953628504,
|
| 1907 |
+
"grad_norm": 0.08390141278505325,
|
| 1908 |
+
"learning_rate": 2.2436282200876458e-06,
|
| 1909 |
+
"loss": 0.03908055424690247,
|
| 1910 |
+
"step": 2600
|
| 1911 |
+
},
|
| 1912 |
+
{
|
| 1913 |
+
"epoch": 2.733560387739062,
|
| 1914 |
+
"grad_norm": 0.0762532502412796,
|
| 1915 |
+
"learning_rate": 2.07926256770416e-06,
|
| 1916 |
+
"loss": 0.04899201393127441,
|
| 1917 |
+
"step": 2610
|
| 1918 |
+
},
|
| 1919 |
+
{
|
| 1920 |
+
"epoch": 2.7440398218496203,
|
| 1921 |
+
"grad_norm": 0.08239631354808807,
|
| 1922 |
+
"learning_rate": 1.9210210046832768e-06,
|
| 1923 |
+
"loss": 0.048707082867622375,
|
| 1924 |
+
"step": 2620
|
| 1925 |
+
},
|
| 1926 |
+
{
|
| 1927 |
+
"epoch": 2.754519255960178,
|
| 1928 |
+
"grad_norm": 0.09619107842445374,
|
| 1929 |
+
"learning_rate": 1.7689237536994364e-06,
|
| 1930 |
+
"loss": 0.0372231125831604,
|
| 1931 |
+
"step": 2630
|
| 1932 |
+
},
|
| 1933 |
+
{
|
| 1934 |
+
"epoch": 2.764998690070736,
|
| 1935 |
+
"grad_norm": 0.07099667191505432,
|
| 1936 |
+
"learning_rate": 1.6229902522072293e-06,
|
| 1937 |
+
"loss": 0.03421170711517334,
|
| 1938 |
+
"step": 2640
|
| 1939 |
+
},
|
| 1940 |
+
{
|
| 1941 |
+
"epoch": 2.7754781241812942,
|
| 1942 |
+
"grad_norm": 0.10154753923416138,
|
| 1943 |
+
"learning_rate": 1.4832391499572996e-06,
|
| 1944 |
+
"loss": 0.03656705319881439,
|
| 1945 |
+
"step": 2650
|
| 1946 |
+
},
|
| 1947 |
+
{
|
| 1948 |
+
"epoch": 2.785957558291852,
|
| 1949 |
+
"grad_norm": 0.09349387139081955,
|
| 1950 |
+
"learning_rate": 1.3496883066130173e-06,
|
| 1951 |
+
"loss": 0.03710306882858276,
|
| 1952 |
+
"step": 2660
|
| 1953 |
+
},
|
| 1954 |
+
{
|
| 1955 |
+
"epoch": 2.7964369924024104,
|
| 1956 |
+
"grad_norm": 0.061091430485248566,
|
| 1957 |
+
"learning_rate": 1.2223547894680443e-06,
|
| 1958 |
+
"loss": 0.0308389812707901,
|
| 1959 |
+
"step": 2670
|
| 1960 |
+
},
|
| 1961 |
+
{
|
| 1962 |
+
"epoch": 2.8069164265129682,
|
| 1963 |
+
"grad_norm": 0.09838075935840607,
|
| 1964 |
+
"learning_rate": 1.101254871265256e-06,
      "loss": 0.03703555166721344,
      "step": 2680
    },
    {
      "epoch": 2.8173958606235265,
      "grad_norm": 0.10046928375959396,
      "learning_rate": 9.864040281170938e-07,
      "loss": 0.04500553905963898,
      "step": 2690
    },
    {
      "epoch": 2.8278752947340844,
      "grad_norm": 0.06770773977041245,
      "learning_rate": 8.778169375277978e-07,
      "loss": 0.03823737502098083,
      "step": 2700
    },
    {
      "epoch": 2.8383547288446422,
      "grad_norm": 0.08373535424470901,
      "learning_rate": 7.755074765176618e-07,
      "loss": 0.03961678743362427,
      "step": 2710
    },
    {
      "epoch": 2.8488341629552005,
      "grad_norm": 0.07590050995349884,
      "learning_rate": 6.794887198496413e-07,
      "loss": 0.03221273124217987,
      "step": 2720
    },
    {
      "epoch": 2.8593135970657584,
      "grad_norm": 0.08507678657770157,
      "learning_rate": 5.897729383583906e-07,
      "loss": 0.04571912884712219,
      "step": 2730
    },
    {
      "epoch": 2.8697930311763162,
      "grad_norm": 0.06584763526916504,
      "learning_rate": 5.063715973821659e-07,
      "loss": 0.03794914484024048,
      "step": 2740
    },
    {
      "epoch": 2.8802724652868745,
      "grad_norm": 0.07312892377376556,
      "learning_rate": 4.292953552975154e-07,
      "loss": 0.036365586519241336,
      "step": 2750
    },
    {
      "epoch": 2.8802724652868745,
      "eval_loss": 0.05090421438217163,
      "eval_runtime": 85.293,
      "eval_samples_per_second": 3.646,
      "eval_steps_per_second": 3.646,
      "step": 2750
    }
  ],
  "logging_steps": 10,
  "max_steps": 2865,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 250,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": false
      },
      "attributes": {}
    }
  },
  "total_flos": 8.668152022199163e+17,
  "train_batch_size": 2,
  "trial_name": null,
  "trial_params": null
}
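A quick way to read the tail of this `trainer_state.json` is to compare the last logged training losses with the evaluation entry at step 2750. The sketch below copies the values from the entries shown above; the variable names are illustrative, and the sample count is only an estimate derived from `eval_runtime * eval_samples_per_second`.

```python
# Values copied from the checkpoint-2750 trainer_state.json entries shown above.
train_log = [
    (2680, 0.03703555166721344),
    (2690, 0.04500553905963898),
    (2700, 0.03823737502098083),
    (2710, 0.03961678743362427),
    (2720, 0.03221273124217987),
    (2730, 0.04571912884712219),
    (2740, 0.03794914484024048),
    (2750, 0.036365586519241336),
]
eval_entry = {
    "eval_loss": 0.05090421438217163,
    "eval_runtime": 85.293,
    "eval_samples_per_second": 3.646,
    "step": 2750,
}
max_steps = 2865

# Average training loss over the eight logged points (steps 2680 to 2750).
mean_train_loss = sum(loss for _, loss in train_log) / len(train_log)

# Positive gap means the eval loss sits above the recent training loss.
eval_gap = eval_entry["eval_loss"] - mean_train_loss

# Fraction of the planned optimizer steps completed at this checkpoint.
progress = eval_entry["step"] / max_steps

# Estimated eval set size (runtime x throughput, rounded to an integer).
eval_samples = round(eval_entry["eval_runtime"] * eval_entry["eval_samples_per_second"])

print(f"mean train loss: {mean_train_loss:.4f}")
print(f"eval loss gap:   {eval_gap:.4f}")
print(f"progress:        {progress:.1%}")
print(f"eval samples:    ~{eval_samples}")
```

Eight logged points are a small window, so the gap between training and evaluation loss should be read as a rough indicator rather than evidence of overfitting.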
checkpoint-2750/training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
size 5201
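The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows, using the `training_args.bin` pointer above as input; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    # Each non-empty line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
size 5201
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The same helper applies to the other pointer files in the commit, for example to confirm that a downloaded file's byte size matches the `size` field.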
checkpoint-2865/README.md
ADDED
@@ -0,0 +1,207 @@
---
base_model: Qwen/Qwen3.5-2B
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen3.5-2B
- lora
- transformers
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
### Framework versions

- PEFT 0.18.0
checkpoint-2865/adapter_config.json
ADDED
@@ -0,0 +1,46 @@
{
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": null,
  "base_model_name_or_path": "Qwen/Qwen3.5-2B",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 64,
  "lora_bias": false,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.0",
  "qalora_group_size": 16,
  "r": 32,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "k_proj",
    "gate_proj",
    "o_proj",
    "down_proj",
    "up_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
}
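The fields that most directly shape the adapter are `r`, `lora_alpha`, `use_rslora`, and `target_modules`. In standard LoRA the low-rank update is scaled by `lora_alpha / r`, and rank-stabilized LoRA uses `lora_alpha / sqrt(r)` instead. The sketch below computes that factor from a subset of the config above; `lora_scaling` is an illustrative helper, not a PEFT function.

```python
import json
import math

# Subset of the adapter_config.json shown above (values copied verbatim).
ADAPTER_CONFIG = json.loads("""
{
  "r": 32,
  "lora_alpha": 64,
  "lora_dropout": 0.05,
  "use_rslora": false,
  "target_modules": ["v_proj", "k_proj", "gate_proj", "o_proj",
                     "down_proj", "up_proj", "q_proj"]
}
""")


def lora_scaling(cfg):
    """Return the factor applied to the low-rank update B @ A."""
    if cfg.get("use_rslora"):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


scaling = lora_scaling(ADAPTER_CONFIG)
print(f"rank={ADAPTER_CONFIG['r']}, scaling={scaling}")
print(f"adapted projections: {sorted(ADAPTER_CONFIG['target_modules'])}")
```

With `r = 32` and `lora_alpha = 64`, the update is scaled by 2.0, and the adapter covers all four attention projections plus the three MLP projections.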
checkpoint-2865/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e54460326b97c66aced3c8ec3a50427b59111b42282d8638b4bbbe132d510518
size 87319256
checkpoint-2865/chat_template.jinja
ADDED
@@ -0,0 +1,154 @@
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
    {%- if content is string %}
        {{- content }}
    {%- elif content is iterable and content is not mapping %}
        {%- for item in content %}
            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
                {%- if is_system_content %}
                    {{- raise_exception('System message cannot contain images.') }}
                {%- endif %}
                {%- if do_vision_count %}
                    {%- set image_count.value = image_count.value + 1 %}
                {%- endif %}
                {%- if add_vision_id %}
                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
                {%- endif %}
                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
            {%- elif 'video' in item or item.type == 'video' %}
                {%- if is_system_content %}
                    {{- raise_exception('System message cannot contain videos.') }}
                {%- endif %}
                {%- if do_vision_count %}
                    {%- set video_count.value = video_count.value + 1 %}
                {%- endif %}
                {%- if add_vision_id %}
                    {{- 'Video ' ~ video_count.value ~ ': ' }}
                {%- endif %}
                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
            {%- elif 'text' in item %}
                {{- item.text }}
            {%- else %}
                {{- raise_exception('Unexpected item type in content.') }}
            {%- endif %}
        {%- endfor %}
    {%- elif content is none or content is undefined %}
        {{- '' }}
    {%- else %}
        {{- raise_exception('Unexpected content type.') }}
    {%- endif %}
{%- endmacro %}
{%- if not messages %}
    {{- raise_exception('No messages provided.') }}
{%- endif %}
{%- if tools and tools is iterable and tools is not mapping %}
    {{- '<|im_start|>system\n' }}
    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>" }}
    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
    {%- if messages[0].role == 'system' %}
        {%- set content = render_content(messages[0].content, false, true)|trim %}
        {%- if content %}
            {{- '\n\n' + content }}
        {%- endif %}
    {%- endif %}
    {{- '<|im_end|>\n' }}
{%- else %}
    {%- if messages[0].role == 'system' %}
        {%- set content = render_content(messages[0].content, false, true)|trim %}
        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" %}
        {%- set content = render_content(message.content, false)|trim %}
        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
            {%- set ns.multi_step_tool = false %}
            {%- set ns.last_query_index = index %}
        {%- endif %}
    {%- endif %}
{%- endfor %}
{%- if ns.multi_step_tool %}
    {{- raise_exception('No user query found in messages.') }}
{%- endif %}
{%- for message in messages %}
    {%- set content = render_content(message.content, true)|trim %}
    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}
    {%- elif message.role == "user" %}
        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {%- set reasoning_content = reasoning_content|trim %}
        {%- if loop.index0 > ns.last_query_index %}
            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
        {%- else %}
            {{- '<|im_start|>' + message.role + '\n' + content }}
        {%- endif %}
        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
            {%- for tool_call in message.tool_calls %}
                {%- if tool_call.function is defined %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {%- if loop.first %}
                    {%- if content|trim %}
                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
                    {%- else %}
                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
                    {%- endif %}
                {%- else %}
                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
                {%- endif %}
                {%- if tool_call.arguments is defined %}
                    {%- for args_name, args_value in tool_call.arguments|items %}
                        {{- '<parameter=' + args_name + '>\n' }}
                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
                        {{- args_value }}
                        {{- '\n</parameter>\n' }}
                    {%- endfor %}
                {%- endif %}
                {{- '</function>\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.previtem and loop.previtem.role != "tool" %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if not loop.last and loop.nextitem.role != "tool" %}
            {{- '<|im_end|>\n' }}
        {%- elif loop.last %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- else %}
        {{- raise_exception('Unexpected message role.') }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
    {%- if enable_thinking is defined and enable_thinking is true %}
        {{- '<think>\n' }}
    {%- else %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- endif %}
{%- endif %}
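Two branches of this template are easy to misread: how an assistant message is split into reasoning and answer when `reasoning_content` is not supplied, and what the generation prompt looks like with and without `enable_thinking`. The sketch below is a hand-written Python mirror of just those two branches, not the template itself and not a Transformers API; the function names are illustrative.

```python
def split_reasoning(content):
    """Mirror of the fallback used when message.reasoning_content is not a string."""
    reasoning = ""
    if "</think>" in content:
        # Text between the last <think> before the first </think> is the reasoning.
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        # Everything after the last </think> is the visible answer.
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning.strip(), content


def generation_prompt(enable_thinking=None):
    """Mirror of the add_generation_prompt branch at the end of the template."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is True:
        # Open think block: the model is expected to write its reasoning next.
        return prompt + "<think>\n"
    # Undefined or false: an empty think block is pre-filled.
    return prompt + "<think>\n\n</think>\n\n"


reasoning, answer = split_reasoning("<think>\n2+2=4\n</think>\n\nThe answer is 4.")
print(repr(reasoning), repr(answer))
print(repr(generation_prompt()))
print(repr(generation_prompt(enable_thinking=True)))
```

As written in the template, an open `<think>` block is emitted only when `enable_thinking` is explicitly true; leaving it unset produces the empty, pre-closed block.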
checkpoint-2865/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a4e94fd092ed2523d6e2e9f17a72149ce8dc0997b192119b210e5713146e635f
size 174750283
checkpoint-2865/rng_state.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cad28a71806e0eabf48ed08b2dec44bd87e88427c15cd75dc56fa5f7a84126dd
size 14645
checkpoint-2865/scheduler.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8d5623d9d5ab3d6dfaf03bedc8ed63928ffd6ae34c2b75efbaf2c90b81268293
size 1465
checkpoint-2865/tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
size 19989343
checkpoint-2865/tokenizer_config.json
ADDED
@@ -0,0 +1,31 @@
{
  "add_prefix_space": false,
  "audio_bos_token": "<|audio_start|>",
  "audio_eos_token": "<|audio_end|>",
  "audio_token": "<|audio_pad|>",
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "image_token": "<|image_pad|>",
  "is_local": true,
  "model_max_length": 262144,
  "model_specific_special_tokens": {
    "audio_bos_token": "<|audio_start|>",
    "audio_eos_token": "<|audio_end|>",
    "audio_token": "<|audio_pad|>",
    "image_token": "<|image_pad|>",
    "video_token": "<|video_pad|>",
    "vision_bos_token": "<|vision_start|>",
    "vision_eos_token": "<|vision_end|>"
  },
  "pad_token": "<|endoftext|>",
  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
  "split_special_tokens": false,
  "tokenizer_class": "TokenizersBackend",
  "unk_token": null,
  "video_token": "<|video_pad|>",
  "vision_bos_token": "<|vision_start|>",
  "vision_eos_token": "<|vision_end|>"
}
checkpoint-2865/trainer_state.json
ADDED
@@ -0,0 +1,2124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
{
  "best_global_step": null,
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 3.0,
  "eval_steps": 250,
  "global_step": 2865,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.010479434110558029,
      "grad_norm": 0.19915591180324554,
      "learning_rate": 1.0465116279069768e-05,
      "loss": 1.1350045204162598,
      "step": 10
    },
    {
      "epoch": 0.020958868221116058,
      "grad_norm": 0.18158815801143646,
      "learning_rate": 2.2093023255813955e-05,
      "loss": 1.0580164909362793,
      "step": 20
    },
    {
      "epoch": 0.03143830233167409,
      "grad_norm": 0.16481591761112213,
      "learning_rate": 3.372093023255814e-05,
      "loss": 0.9252842903137207,
      "step": 30
    },
    {
      "epoch": 0.041917736442232116,
      "grad_norm": 0.15599584579467773,
      "learning_rate": 4.5348837209302326e-05,
      "loss": 0.8342072486877441,
      "step": 40
    },
    {
      "epoch": 0.05239717055279015,
      "grad_norm": 0.1804327368736267,
      "learning_rate": 5.697674418604652e-05,
      "loss": 0.7955524921417236,
      "step": 50
    },
    {
      "epoch": 0.06287660466334818,
      "grad_norm": 0.16934047639369965,
      "learning_rate": 6.86046511627907e-05,
      "loss": 0.7358035087585449,
      "step": 60
    },
    {
      "epoch": 0.07335603877390622,
      "grad_norm": 0.2234930843114853,
      "learning_rate": 8.023255813953489e-05,
      "loss": 0.6985861301422119,
      "step": 70
    },
    {
      "epoch": 0.08383547288446423,
      "grad_norm": 0.16290400922298431,
      "learning_rate": 9.186046511627907e-05,
      "loss": 0.599607515335083,
      "step": 80
    },
    {
      "epoch": 0.09431490699502226,
      "grad_norm": 0.1660464107990265,
      "learning_rate": 9.999971245570617e-05,
      "loss": 0.5886398315429687,
      "step": 90
    },
    {
      "epoch": 0.1047943411055803,
      "grad_norm": 0.16978025436401367,
      "learning_rate": 9.999460064915317e-05,
      "loss": 0.5450529098510742,
      "step": 100
    },
    {
      "epoch": 0.11527377521613832,
      "grad_norm": 0.21447990834712982,
      "learning_rate": 9.998309972134645e-05,
      "loss": 0.5072262287139893,
      "step": 110
    },
    {
      "epoch": 0.12575320932669637,
      "grad_norm": 0.17418669164180756,
      "learning_rate": 9.996521114206116e-05,
      "loss": 0.49445347785949706,
      "step": 120
    },
    {
      "epoch": 0.13623264343725439,
      "grad_norm": 0.22226351499557495,
      "learning_rate": 9.994093719739023e-05,
      "loss": 0.47142682075500486,
      "step": 130
    },
    {
      "epoch": 0.14671207754781243,
      "grad_norm": 0.1745530068874359,
      "learning_rate": 9.991028098945215e-05,
      "loss": 0.46663532257080076,
      "step": 140
    },
    {
      "epoch": 0.15719151165837045,
      "grad_norm": 0.17074695229530334,
      "learning_rate": 9.987324643599459e-05,
      "loss": 0.4508847236633301,
      "step": 150
    },
    {
      "epoch": 0.16767094576892846,
      "grad_norm": 0.13428406417369843,
      "learning_rate": 9.982983826989367e-05,
      "loss": 0.40740265846252444,
      "step": 160
    },
    {
      "epoch": 0.1781503798794865,
      "grad_norm": 0.17766578495502472,
      "learning_rate": 9.978006203854918e-05,
      "loss": 0.3998516321182251,
      "step": 170
    },
    {
      "epoch": 0.18862981399004453,
      "grad_norm": 0.1672629565000534,
      "learning_rate": 9.972392410317562e-05,
      "loss": 0.41658673286437986,
      "step": 180
    },
    {
      "epoch": 0.19910924810060257,
      "grad_norm": 0.1333673745393753,
      "learning_rate": 9.96614316379892e-05,
      "loss": 0.37024455070495604,
      "step": 190
    },
    {
      "epoch": 0.2095886822111606,
      "grad_norm": 0.18037110567092896,
      "learning_rate": 9.959259262929113e-05,
      "loss": 0.35086841583251954,
      "step": 200
    },
    {
      "epoch": 0.22006811632171863,
      "grad_norm": 0.14616410434246063,
      "learning_rate": 9.951741587444683e-05,
      "loss": 0.37918968200683595,
      "step": 210
    },
    {
      "epoch": 0.23054755043227665,
      "grad_norm": 0.14523574709892273,
      "learning_rate": 9.943591098076184e-05,
      "loss": 0.32804527282714846,
      "step": 220
    },
    {
      "epoch": 0.2410269845428347,
      "grad_norm": 0.14667049050331116,
      "learning_rate": 9.934808836425393e-05,
      "loss": 0.3480507850646973,
      "step": 230
    },
    {
      "epoch": 0.25150641865339274,
      "grad_norm": 0.18156558275222778,
      "learning_rate": 9.925395924832198e-05,
      "loss": 0.3300448179244995,
      "step": 240
    },
    {
      "epoch": 0.26198585276395076,
      "grad_norm": 0.13806430995464325,
      "learning_rate": 9.91535356623117e-05,
      "loss": 0.3127591609954834,
      "step": 250
    },
    {
      "epoch": 0.26198585276395076,
      "eval_loss": 0.3132782578468323,
      "eval_runtime": 94.8848,
      "eval_samples_per_second": 3.278,
      "eval_steps_per_second": 3.278,
      "step": 250
    },
    {
      "epoch": 0.27246528687450877,
      "grad_norm": 0.17205959558486938,
      "learning_rate": 9.904683043997835e-05,
      "loss": 0.3306673288345337,
      "step": 260
    },
    {
      "epoch": 0.2829447209850668,
      "grad_norm": 0.12620031833648682,
      "learning_rate": 9.893385721784656e-05,
      "loss": 0.3011106729507446,
      "step": 270
    },
    {
      "epoch": 0.29342415509562486,
      "grad_norm": 0.11466006934642792,
      "learning_rate": 9.881463043346768e-05,
      "loss": 0.2951968669891357,
      "step": 280
    },
    {
      "epoch": 0.3039035892061829,
      "grad_norm": 0.1671207845211029,
      "learning_rate": 9.868916532357475e-05,
      "loss": 0.2910990953445435,
      "step": 290
    },
    {
      "epoch": 0.3143830233167409,
      "grad_norm": 0.1683349907398224,
      "learning_rate": 9.855747792213521e-05,
      "loss": 0.31409192085266113,
      "step": 300
    },
    {
      "epoch": 0.3248624574272989,
      "grad_norm": 0.12934699654579163,
      "learning_rate": 9.84195850583019e-05,
      "loss": 0.27755858898162844,
      "step": 310
    },
    {
      "epoch": 0.33534189153785693,
      "grad_norm": 0.13784605264663696,
      "learning_rate": 9.827550435426234e-05,
      "loss": 0.2809821605682373,
      "step": 320
    },
    {
      "epoch": 0.345821325648415,
      "grad_norm": 0.18590271472930908,
      "learning_rate": 9.812525422298664e-05,
      "loss": 0.28698866367340087,
      "step": 330
    },
    {
      "epoch": 0.356300759758973,
      "grad_norm": 0.1704522967338562,
      "learning_rate": 9.796885386587447e-05,
      "loss": 0.250814414024353,
      "step": 340
    },
    {
      "epoch": 0.36678019386953103,
      "grad_norm": 0.1316167265176773,
      "learning_rate": 9.780632327030112e-05,
      "loss": 0.25458922386169436,
      "step": 350
    },
    {
      "epoch": 0.37725962798008905,
      "grad_norm": 0.16226200759410858,
      "learning_rate": 9.763768320706319e-05,
      "loss": 0.26563262939453125,
      "step": 360
    },
    {
      "epoch": 0.3877390620906471,
      "grad_norm": 0.1297195851802826,
      "learning_rate": 9.746295522772424e-05,
      "loss": 0.2632328748703003,
      "step": 370
    },
    {
      "epoch": 0.39821849620120514,
      "grad_norm": 0.1286139190196991,
      "learning_rate": 9.728216166186049e-05,
      "loss": 0.2624588251113892,
      "step": 380
    },
    {
      "epoch": 0.40869793031176316,
      "grad_norm": 0.1587965339422226,
      "learning_rate": 9.709532561420725e-05,
      "loss": 0.24741590023040771,
      "step": 390
    },
    {
      "epoch": 0.4191773644223212,
      "grad_norm": 0.11963177472352982,
      "learning_rate": 9.690247096170615e-05,
      "loss": 0.22777397632598878,
      "step": 400
    },
    {
      "epoch": 0.42965679853287925,
      "grad_norm": 0.13638927042484283,
      "learning_rate": 9.670362235045387e-05,
      "loss": 0.23324952125549317,
      "step": 410
    },
    {
      "epoch": 0.44013623264343726,
      "grad_norm": 0.1514088362455368,
      "learning_rate": 9.649880519255232e-05,
      "loss": 0.2505915880203247,
      "step": 420
    },
    {
      "epoch": 0.4506156667539953,
      "grad_norm": 0.10994207113981247,
      "learning_rate": 9.62880456628612e-05,
      "loss": 0.2078850269317627,
      "step": 430
    },
    {
      "epoch": 0.4610951008645533,
      "grad_norm": 0.11983369290828705,
      "learning_rate": 9.607137069565288e-05,
      "loss": 0.21452484130859376,
      "step": 440
    },
    {
      "epoch": 0.47157453497511137,
      "grad_norm": 0.12684305012226105,
      "learning_rate": 9.58488079811703e-05,
      "loss": 0.22002685070037842,
      "step": 450
    },
    {
      "epoch": 0.4820539690856694,
      "grad_norm": 0.16841623187065125,
      "learning_rate": 9.562038596208828e-05,
      "loss": 0.21405396461486817,
      "step": 460
    },
    {
      "epoch": 0.4925334031962274,
      "grad_norm": 0.1498555839061737,
      "learning_rate": 9.538613382987865e-05,
      "loss": 0.20534911155700683,
      "step": 470
    },
    {
      "epoch": 0.5030128373067855,
      "grad_norm": 0.13913628458976746,
      "learning_rate": 9.514608152107974e-05,
      "loss": 0.22248730659484864,
      "step": 480
    },
    {
      "epoch": 0.5134922714173434,
      "grad_norm": 0.14408951997756958,
      "learning_rate": 9.490025971347047e-05,
      "loss": 0.214866042137146,
      "step": 490
    },
    {
      "epoch": 0.5239717055279015,
      "grad_norm": 0.1649770438671112,
      "learning_rate": 9.464869982215001e-05,
      "loss": 0.19965900182724,
      "step": 500
    },
    {
      "epoch": 0.5239717055279015,
      "eval_loss": 0.19267401099205017,
      "eval_runtime": 95.3374,
      "eval_samples_per_second": 3.262,
      "eval_steps_per_second": 3.262,
      "step": 500
    },
    {
      "epoch": 0.5344511396384595,
      "grad_norm": 0.1305568665266037,
      "learning_rate": 9.439143399552291e-05,
      "loss": 0.21112546920776368,
      "step": 510
    },
    {
      "epoch": 0.5449305737490175,
      "grad_norm": 0.11998175084590912,
      "learning_rate": 9.412849511119074e-05,
      "loss": 0.21422922611236572,
      "step": 520
    },
    {
      "epoch": 0.5554100078595756,
      "grad_norm": 0.15220341086387634,
      "learning_rate": 9.385991677175046e-05,
      "loss": 0.20999882221221924,
      "step": 530
    },
    {
      "epoch": 0.5658894419701336,
      "grad_norm": 0.13170023262500763,
      "learning_rate": 9.358573330050004e-05,
      "loss": 0.20208392143249512,
      "step": 540
    },
    {
      "epoch": 0.5763688760806917,
      "grad_norm": 0.10457764565944672,
      "learning_rate": 9.330597973705219e-05,
      "loss": 0.1908803701400757,
      "step": 550
    },
    {
      "epoch": 0.5868483101912497,
      "grad_norm": 0.12568537890911102,
      "learning_rate": 9.302069183285637e-05,
      "loss": 0.19316340684890748,
      "step": 560
    },
    {
      "epoch": 0.5973277443018077,
      "grad_norm": 0.14824528992176056,
      "learning_rate": 9.272990604662988e-05,
      "loss": 0.18987581729888917,
      "step": 570
    },
    {
      "epoch": 0.6078071784123658,
      "grad_norm": 0.14521734416484833,
      "learning_rate": 9.243365953969861e-05,
      "loss": 0.19232832193374633,
      "step": 580
    },
    {
      "epoch": 0.6182866125229237,
      "grad_norm": 0.1335408091545105,
      "learning_rate": 9.213199017124793e-05,
      "loss": 0.1758212924003601,
      "step": 590
    },
    {
      "epoch": 0.6287660466334818,
      "grad_norm": 0.11143071949481964,
      "learning_rate": 9.182493649348447e-05,
      "loss": 0.19117680788040162,
      "step": 600
    },
    {
      "epoch": 0.6392454807440399,
      "grad_norm": 0.14789296686649323,
      "learning_rate": 9.151253774670921e-05,
      "loss": 0.184559965133667,
      "step": 610
    },
    {
      "epoch": 0.6497249148545978,
      "grad_norm": 0.10541336238384247,
      "learning_rate": 9.119483385430283e-05,
      "loss": 0.1720304846763611,
      "step": 620
    },
    {
      "epoch": 0.6602043489651559,
      "grad_norm": 0.12105975300073624,
      "learning_rate": 9.087186541762358e-05,
      "loss": 0.17654836177825928,
      "step": 630
    },
    {
      "epoch": 0.6706837830757139,
      "grad_norm": 0.13114669919013977,
      "learning_rate": 9.054367371081858e-05,
      "loss": 0.1696592688560486,
      "step": 640
    },
    {
      "epoch": 0.6811632171862719,
      "grad_norm": 0.13745592534542084,
      "learning_rate": 9.021030067554919e-05,
      "loss": 0.15404462814331055,
      "step": 650
    },
    {
      "epoch": 0.69164265129683,
      "grad_norm": 0.15927442908287048,
      "learning_rate": 8.987178891563094e-05,
      "loss": 0.17024366855621337,
      "step": 660
    },
    {
      "epoch": 0.702122085407388,
      "grad_norm": 0.13737429678440094,
      "learning_rate": 8.952818169158903e-05,
      "loss": 0.1602048397064209,
      "step": 670
    },
    {
      "epoch": 0.712601519517946,
      "grad_norm": 0.13941751420497894,
      "learning_rate": 8.91795229151297e-05,
      "loss": 0.18057082891464232,
      "step": 680
    },
    {
      "epoch": 0.7230809536285041,
      "grad_norm": 0.14242954552173615,
      "learning_rate": 8.882585714352856e-05,
      "loss": 0.14863334894180297,
      "step": 690
    },
    {
      "epoch": 0.7335603877390621,
      "grad_norm": 0.15553542971611023,
      "learning_rate": 8.846722957393626e-05,
      "loss": 0.15701137781143187,
      "step": 700
    },
    {
      "epoch": 0.7440398218496201,
      "grad_norm": 0.12901411950588226,
      "learning_rate": 8.810368603760249e-05,
      "loss": 0.15571318864822387,
      "step": 710
    },
    {
      "epoch": 0.7545192559601781,
      "grad_norm": 0.13449430465698242,
      "learning_rate": 8.773527299401902e-05,
      "loss": 0.16418551206588744,
      "step": 720
    },
    {
      "epoch": 0.7649986900707362,
      "grad_norm": 0.10630270838737488,
      "learning_rate": 8.736203752498218e-05,
      "loss": 0.16800801753997802,
      "step": 730
    },
    {
      "epoch": 0.7754781241812942,
      "grad_norm": 0.11299935728311539,
      "learning_rate": 8.698402732857611e-05,
      "loss": 0.15700833797454833,
      "step": 740
    },
    {
      "epoch": 0.7859575582918522,
      "grad_norm": 0.11920930445194244,
      "learning_rate": 8.660129071307707e-05,
      "loss": 0.15091001987457275,
      "step": 750
    },
    {
      "epoch": 0.7859575582918522,
      "eval_loss": 0.1356429010629654,
      "eval_runtime": 94.0557,
      "eval_samples_per_second": 3.307,
      "eval_steps_per_second": 3.307,
      "step": 750
    },
    {
      "epoch": 0.7964369924024103,
      "grad_norm": 0.13870343565940857,
      "learning_rate": 8.621387659077986e-05,
      "loss": 0.1422027826309204,
      "step": 760
    },
    {
      "epoch": 0.8069164265129684,
      "grad_norm": 0.12753477692604065,
      "learning_rate": 8.582183447174697e-05,
      "loss": 0.142450213432312,
      "step": 770
    },
    {
      "epoch": 0.8173958606235263,
      "grad_norm": 0.11877496540546417,
      "learning_rate": 8.542521445748141e-05,
      "loss": 0.15361062288284302,
      "step": 780
    },
    {
      "epoch": 0.8278752947340844,
      "grad_norm": 0.1200249195098877,
      "learning_rate": 8.502406723452392e-05,
      "loss": 0.14647477865219116,
      "step": 790
    },
    {
      "epoch": 0.8383547288446423,
      "grad_norm": 0.12913794815540314,
      "learning_rate": 8.461844406797543e-05,
      "loss": 0.1591552734375,
      "step": 800
    },
    {
      "epoch": 0.8488341629552004,
      "grad_norm": 0.17270176112651825,
      "learning_rate": 8.420839679494558e-05,
      "loss": 0.1495436668395996,
      "step": 810
    },
    {
      "epoch": 0.8593135970657585,
      "grad_norm": 0.15545596182346344,
      "learning_rate": 8.379397781792808e-05,
      "loss": 0.15377395153045653,
      "step": 820
    },
    {
      "epoch": 0.8697930311763165,
      "grad_norm": 0.12941111624240875,
      "learning_rate": 8.337524009810395e-05,
      "loss": 0.14733861684799193,
      "step": 830
    },
    {
      "epoch": 0.8802724652868745,
      "grad_norm": 0.13152749836444855,
      "learning_rate": 8.295223714857319e-05,
      "loss": 0.13980752229690552,
      "step": 840
    },
    {
      "epoch": 0.8907518993974325,
      "grad_norm": 0.11208872497081757,
      "learning_rate": 8.252502302751612e-05,
      "loss": 0.12019969224929809,
      "step": 850
    },
    {
      "epoch": 0.9012313335079906,
      "grad_norm": 0.11118603497743607,
      "learning_rate": 8.209365233128482e-05,
      "loss": 0.13822466135025024,
      "step": 860
    },
    {
      "epoch": 0.9117107676185486,
      "grad_norm": 0.11705653369426727,
      "learning_rate": 8.165818018742605e-05,
      "loss": 0.1439664840698242,
      "step": 870
    },
    {
      "epoch": 0.9221902017291066,
      "grad_norm": 0.08817730098962784,
      "learning_rate": 8.121866224763606e-05,
      "loss": 0.13380355834960939,
      "step": 880
    },
    {
      "epoch": 0.9326696358396647,
      "grad_norm": 0.1092257872223854,
      "learning_rate": 8.077515468064851e-05,
      "loss": 0.12982802391052245,
      "step": 890
    },
    {
      "epoch": 0.9431490699502227,
      "grad_norm": 0.12680962681770325,
      "learning_rate": 8.032771416505647e-05,
      "loss": 0.1489071011543274,
      "step": 900
    },
    {
      "epoch": 0.9536285040607807,
      "grad_norm": 0.11953219771385193,
      "learning_rate": 7.987639788206888e-05,
      "loss": 0.14020267724990845,
      "step": 910
    },
    {
      "epoch": 0.9641079381713388,
      "grad_norm": 0.1041467934846878,
      "learning_rate": 7.942126350820318e-05,
      "loss": 0.1439213275909424,
      "step": 920
    },
    {
      "epoch": 0.9745873722818967,
      "grad_norm": 0.1277916431427002,
      "learning_rate": 7.896236920791442e-05,
      "loss": 0.1468779683113098,
      "step": 930
    },
    {
      "epoch": 0.9850668063924548,
      "grad_norm": 0.11245205253362656,
      "learning_rate": 7.849977362616201e-05,
      "loss": 0.12012372016906739,
      "step": 940
    },
    {
      "epoch": 0.9955462405030129,
      "grad_norm": 0.12230483442544937,
      "learning_rate": 7.803353588091522e-05,
      "loss": 0.1488939881324768,
      "step": 950
    },
    {
      "epoch": 1.005239717055279,
      "grad_norm": 0.14185865223407745,
      "learning_rate": 7.7563715555598e-05,
      "loss": 0.11488113403320313,
      "step": 960
    },
    {
      "epoch": 1.015719151165837,
      "grad_norm": 0.10545773804187775,
      "learning_rate": 7.709037269147459e-05,
      "loss": 0.10712549686431885,
      "step": 970
    },
    {
      "epoch": 1.026198585276395,
      "grad_norm": 0.10376274585723877,
      "learning_rate": 7.661356777997631e-05,
      "loss": 0.11428828239440918,
      "step": 980
    },
    {
      "epoch": 1.0366780193869531,
      "grad_norm": 0.09950564056634903,
      "learning_rate": 7.613336175497111e-05,
      "loss": 0.09823058247566223,
      "step": 990
    },
    {
      "epoch": 1.0471574534975112,
      "grad_norm": 0.10412753373384476,
      "learning_rate": 7.564981598497643e-05,
      "loss": 0.1106558084487915,
      "step": 1000
    },
    {
      "epoch": 1.0471574534975112,
      "eval_loss": 0.11185819655656815,
      "eval_runtime": 93.808,
      "eval_samples_per_second": 3.315,
      "eval_steps_per_second": 3.315,
      "step": 1000
    },
|
| 744 |
+
{
|
| 745 |
+
"epoch": 1.057636887608069,
|
| 746 |
+
"grad_norm": 0.10430868715047836,
|
| 747 |
+
"learning_rate": 7.516299226531645e-05,
|
| 748 |
+
"loss": 0.11168640851974487,
|
| 749 |
+
"step": 1010
|
| 750 |
+
},
|
| 751 |
+
{
|
| 752 |
+
"epoch": 1.0681163217186271,
|
| 753 |
+
"grad_norm": 0.09646806865930557,
|
| 754 |
+
"learning_rate": 7.467295281022501e-05,
|
| 755 |
+
"loss": 0.10711305141448975,
|
| 756 |
+
"step": 1020
|
| 757 |
+
},
|
| 758 |
+
{
|
| 759 |
+
"epoch": 1.0785957558291852,
|
| 760 |
+
"grad_norm": 0.13060614466667175,
|
| 761 |
+
"learning_rate": 7.417976024489474e-05,
|
| 762 |
+
"loss": 0.10001810789108276,
|
| 763 |
+
"step": 1030
|
| 764 |
+
},
|
| 765 |
+
{
|
| 766 |
+
"epoch": 1.0890751899397433,
|
| 767 |
+
"grad_norm": 0.10389085114002228,
|
| 768 |
+
"learning_rate": 7.368347759747393e-05,
|
| 769 |
+
"loss": 0.11893858909606933,
|
| 770 |
+
"step": 1040
|
| 771 |
+
},
|
| 772 |
+
{
|
| 773 |
+
"epoch": 1.0995546240503014,
|
| 774 |
+
"grad_norm": 0.11291550099849701,
|
| 775 |
+
"learning_rate": 7.318416829101164e-05,
|
| 776 |
+
"loss": 0.1079628586769104,
|
| 777 |
+
"step": 1050
|
| 778 |
+
},
|
| 779 |
+
{
|
| 780 |
+
"epoch": 1.1100340581608594,
"grad_norm": 0.10372598469257355,
"learning_rate": 7.268189613535255e-05,
"loss": 0.10332397222518921,
"step": 1060
},
{
"epoch": 1.1205134922714173,
"grad_norm": 0.12971536815166473,
"learning_rate": 7.217672531898225e-05,
"loss": 0.10804877281188965,
"step": 1070
},
{
"epoch": 1.1309929263819753,
"grad_norm": 0.10902425646781921,
"learning_rate": 7.166872040082431e-05,
"loss": 0.09947454929351807,
"step": 1080
},
{
"epoch": 1.1414723604925334,
"grad_norm": 0.09305932372808456,
"learning_rate": 7.11579463019897e-05,
"loss": 0.09406971335411071,
"step": 1090
},
{
"epoch": 1.1519517946030915,
"grad_norm": 0.11485275626182556,
"learning_rate": 7.064446829748034e-05,
"loss": 0.09943979978561401,
"step": 1100
},
{
"epoch": 1.1624312287136496,
"grad_norm": 0.09556467831134796,
"learning_rate": 7.0128352007847e-05,
"loss": 0.10862170457839966,
"step": 1110
},
{
"epoch": 1.1729106628242074,
"grad_norm": 0.11937833577394485,
"learning_rate": 6.96096633908034e-05,
"loss": 0.10385221242904663,
"step": 1120
},
{
"epoch": 1.1833900969347655,
"grad_norm": 0.11560507863759995,
"learning_rate": 6.908846873279691e-05,
"loss": 0.09252402186393738,
"step": 1130
},
{
"epoch": 1.1938695310453236,
"grad_norm": 0.11119654029607773,
"learning_rate": 6.856483464053758e-05,
"loss": 0.09637172818183899,
"step": 1140
},
{
"epoch": 1.2043489651558816,
"grad_norm": 0.11722644418478012,
"learning_rate": 6.803882803248585e-05,
"loss": 0.09078751802444458,
"step": 1150
},
{
"epoch": 1.2148283992664397,
"grad_norm": 0.10487739741802216,
"learning_rate": 6.751051613030082e-05,
"loss": 0.10334972143173218,
"step": 1160
},
{
"epoch": 1.2253078333769976,
"grad_norm": 0.10202383995056152,
"learning_rate": 6.697996645024937e-05,
"loss": 0.08661433458328247,
"step": 1170
},
{
"epoch": 1.2357872674875556,
"grad_norm": 0.11801143735647202,
"learning_rate": 6.644724679457804e-05,
"loss": 0.0997927188873291,
"step": 1180
},
{
"epoch": 1.2462667015981137,
"grad_norm": 0.10949107259511948,
"learning_rate": 6.591242524284802e-05,
"loss": 0.0977592945098877,
"step": 1190
},
{
"epoch": 1.2567461357086718,
"grad_norm": 0.10221222043037415,
"learning_rate": 6.537557014323487e-05,
"loss": 0.0970361053943634,
"step": 1200
},
{
"epoch": 1.2672255698192298,
"grad_norm": 0.10554748773574829,
"learning_rate": 6.483675010379393e-05,
"loss": 0.09007551074028015,
"step": 1210
},
{
"epoch": 1.2777050039297877,
"grad_norm": 0.11625627428293228,
"learning_rate": 6.429603398369242e-05,
"loss": 0.08734490275382996,
"step": 1220
},
{
"epoch": 1.2881844380403458,
"grad_norm": 0.10624277591705322,
"learning_rate": 6.37534908844095e-05,
"loss": 0.09858485460281372,
"step": 1230
},
{
"epoch": 1.2986638721509038,
"grad_norm": 0.10184557735919952,
"learning_rate": 6.320919014090534e-05,
"loss": 0.09335023164749146,
"step": 1240
},
{
"epoch": 1.309143306261462,
"grad_norm": 0.10787283629179001,
"learning_rate": 6.266320131276051e-05,
"loss": 0.08665563464164734,
"step": 1250
},
{
"epoch": 1.309143306261462,
"eval_loss": 0.08951585739850998,
"eval_runtime": 94.0567,
"eval_samples_per_second": 3.307,
"eval_steps_per_second": 3.307,
"step": 1250
},
{
"epoch": 1.31962274037202,
"grad_norm": 0.10836981981992722,
"learning_rate": 6.211559417528631e-05,
"loss": 0.0933380126953125,
"step": 1260
},
{
"epoch": 1.3301021744825778,
"grad_norm": 0.1397171914577484,
"learning_rate": 6.156643871060795e-05,
"loss": 0.09835371971130372,
"step": 1270
},
{
"epoch": 1.340581608593136,
"grad_norm": 0.11242218315601349,
"learning_rate": 6.101580509872097e-05,
"loss": 0.09398673176765442,
"step": 1280
},
{
"epoch": 1.351061042703694,
"grad_norm": 0.10235017538070679,
"learning_rate": 6.0463763708522536e-05,
"loss": 0.10350929498672486,
"step": 1290
},
{
"epoch": 1.361540476814252,
"grad_norm": 0.09327106177806854,
"learning_rate": 5.99103850888186e-05,
"loss": 0.09580238461494446,
"step": 1300
},
{
"epoch": 1.3720199109248101,
"grad_norm": 0.12995658814907074,
"learning_rate": 5.9355739959307976e-05,
"loss": 0.08437412977218628,
"step": 1310
},
{
"epoch": 1.382499345035368,
"grad_norm": 0.11962983757257462,
"learning_rate": 5.879989920154466e-05,
"loss": 0.08409937620162963,
"step": 1320
},
{
"epoch": 1.392978779145926,
"grad_norm": 0.09431737661361694,
"learning_rate": 5.824293384987941e-05,
"loss": 0.09504773020744324,
"step": 1330
},
{
"epoch": 1.4034582132564841,
"grad_norm": 0.13824374973773956,
"learning_rate": 5.768491508238188e-05,
"loss": 0.09193333983421326,
"step": 1340
},
{
"epoch": 1.4139376473670422,
"grad_norm": 0.10595858097076416,
"learning_rate": 5.712591421174422e-05,
"loss": 0.08976472616195678,
"step": 1350
},
{
"epoch": 1.4244170814776003,
"grad_norm": 0.09911809861660004,
"learning_rate": 5.6566002676167725e-05,
"loss": 0.07597061395645141,
"step": 1360
},
{
"epoch": 1.4348965155881581,
"grad_norm": 0.09723466634750366,
"learning_rate": 5.60052520302332e-05,
"loss": 0.10513757467269898,
"step": 1370
},
{
"epoch": 1.4453759496987162,
"grad_norm": 0.11331687867641449,
"learning_rate": 5.5443733935756615e-05,
"loss": 0.09019948840141297,
"step": 1380
},
{
"epoch": 1.4558553838092743,
"grad_norm": 0.13363589346408844,
"learning_rate": 5.4881520152630886e-05,
"loss": 0.08314153552055359,
"step": 1390
},
{
"epoch": 1.4663348179198323,
"grad_norm": 0.14111892879009247,
"learning_rate": 5.4318682529655404e-05,
"loss": 0.07892010807991028,
"step": 1400
},
{
"epoch": 1.4768142520303904,
"grad_norm": 0.13948485255241394,
"learning_rate": 5.3755292995353913e-05,
"loss": 0.0840128481388092,
"step": 1410
},
{
"epoch": 1.4872936861409483,
"grad_norm": 0.12535949051380157,
"learning_rate": 5.31914235487823e-05,
"loss": 0.07869629859924317,
"step": 1420
},
{
"epoch": 1.4977731202515066,
"grad_norm": 0.10041694343090057,
"learning_rate": 5.2627146250327484e-05,
"loss": 0.08074848055839538,
"step": 1430
},
{
"epoch": 1.5082525543620644,
"grad_norm": 0.10112891346216202,
"learning_rate": 5.2062533212498275e-05,
"loss": 0.0860810935497284,
"step": 1440
},
{
"epoch": 1.5187319884726225,
"grad_norm": 0.11297477036714554,
"learning_rate": 5.149765659070973e-05,
"loss": 0.08794642686843872,
"step": 1450
},
{
"epoch": 1.5292114225831805,
"grad_norm": 0.10511091351509094,
"learning_rate": 5.0932588574061945e-05,
"loss": 0.07854819297790527,
"step": 1460
},
{
"epoch": 1.5396908566937384,
"grad_norm": 0.09333530068397522,
"learning_rate": 5.036740137611453e-05,
"loss": 0.08821435570716858,
"step": 1470
},
{
"epoch": 1.5501702908042967,
"grad_norm": 0.11480343341827393,
"learning_rate": 4.980216722565804e-05,
"loss": 0.08062278628349304,
"step": 1480
},
{
"epoch": 1.5606497249148545,
"grad_norm": 0.08406255394220352,
"learning_rate": 4.923695835748338e-05,
"loss": 0.0940588355064392,
"step": 1490
},
{
"epoch": 1.5711291590254126,
"grad_norm": 0.12927693128585815,
"learning_rate": 4.8671847003150447e-05,
"loss": 0.0775177538394928,
"step": 1500
},
{
"epoch": 1.5711291590254126,
"eval_loss": 0.07877222448587418,
"eval_runtime": 34.4389,
"eval_samples_per_second": 9.03,
"eval_steps_per_second": 9.03,
"step": 1500
},
{
"epoch": 1.5816085931359707,
"grad_norm": 0.1255076378583908,
"learning_rate": 4.810690538175728e-05,
"loss": 0.09362970590591431,
"step": 1510
},
{
"epoch": 1.5920880272465285,
"grad_norm": 0.1326853185892105,
"learning_rate": 4.754220569071068e-05,
"loss": 0.08364834189414978,
"step": 1520
},
{
"epoch": 1.6025674613570868,
"grad_norm": 0.10229979455471039,
"learning_rate": 4.697782009649962e-05,
"loss": 0.0725843846797943,
"step": 1530
},
{
"epoch": 1.6130468954676447,
"grad_norm": 0.11407258361577988,
"learning_rate": 4.641382072547272e-05,
"loss": 0.07566151022911072,
"step": 1540
},
{
"epoch": 1.6235263295782028,
"grad_norm": 0.09398165345191956,
"learning_rate": 4.585027965462075e-05,
"loss": 0.087736576795578,
"step": 1550
},
{
"epoch": 1.6340057636887608,
"grad_norm": 0.11289424449205399,
"learning_rate": 4.528726890236544e-05,
"loss": 0.08366051316261292,
"step": 1560
},
{
"epoch": 1.6444851977993187,
"grad_norm": 0.09478718787431717,
"learning_rate": 4.4724860419355746e-05,
"loss": 0.0885531723499298,
"step": 1570
},
{
"epoch": 1.654964631909877,
"grad_norm": 0.09163404256105423,
"learning_rate": 4.416312607927295e-05,
"loss": 0.08392030596733094,
"step": 1580
},
{
"epoch": 1.6654440660204348,
"grad_norm": 0.11422222852706909,
"learning_rate": 4.360213766964542e-05,
"loss": 0.08059985041618348,
"step": 1590
},
{
"epoch": 1.675923500130993,
"grad_norm": 0.08131479471921921,
"learning_rate": 4.304196688267438e-05,
"loss": 0.07613803148269653,
"step": 1600
},
{
"epoch": 1.686402934241551,
"grad_norm": 0.09615079313516617,
"learning_rate": 4.248268530607199e-05,
"loss": 0.07764078378677368,
"step": 1610
},
{
"epoch": 1.696882368352109,
"grad_norm": 0.09730526059865952,
"learning_rate": 4.192436441391271e-05,
"loss": 0.07644452452659607,
"step": 1620
},
{
"epoch": 1.707361802462667,
"grad_norm": 0.09649327397346497,
"learning_rate": 4.136707555749907e-05,
"loss": 0.07866159081459045,
"step": 1630
},
{
"epoch": 1.717841236573225,
"grad_norm": 0.11804413050413132,
"learning_rate": 4.0810889956243415e-05,
"loss": 0.06996130347251892,
"step": 1640
},
{
"epoch": 1.728320670683783,
"grad_norm": 0.09874672442674637,
"learning_rate": 4.025587868856622e-05,
"loss": 0.07877404093742371,
"step": 1650
},
{
"epoch": 1.738800104794341,
"grad_norm": 0.11149467527866364,
"learning_rate": 3.9702112682812544e-05,
"loss": 0.07241421341896057,
"step": 1660
},
{
"epoch": 1.7492795389048992,
"grad_norm": 0.08748896420001984,
"learning_rate": 3.914966270818766e-05,
"loss": 0.07336459755897522,
"step": 1670
},
{
"epoch": 1.7597589730154573,
"grad_norm": 0.1172696202993393,
"learning_rate": 3.859859936571307e-05,
"loss": 0.07742337584495544,
"step": 1680
},
{
"epoch": 1.770238407126015,
"grad_norm": 0.0719197615981102,
"learning_rate": 3.8048993079203925e-05,
"loss": 0.06242966651916504,
"step": 1690
},
{
"epoch": 1.7807178412365732,
"grad_norm": 0.12380168586969376,
"learning_rate": 3.750091408626907e-05,
"loss": 0.07270430326461792,
"step": 1700
},
{
"epoch": 1.7911972753471312,
"grad_norm": 0.1587221622467041,
"learning_rate": 3.6954432429335015e-05,
"loss": 0.06409866213798524,
"step": 1710
},
{
"epoch": 1.8016767094576893,
"grad_norm": 0.10983912646770477,
"learning_rate": 3.640961794669482e-05,
"loss": 0.06610031127929687,
"step": 1720
},
{
"epoch": 1.8121561435682474,
"grad_norm": 0.11023026704788208,
"learning_rate": 3.586654026358287e-05,
"loss": 0.06866579055786133,
"step": 1730
},
{
"epoch": 1.8226355776788052,
"grad_norm": 0.11857719719409943,
"learning_rate": 3.532526878327719e-05,
"loss": 0.06734356880187989,
"step": 1740
},
{
"epoch": 1.8331150117893635,
"grad_norm": 0.09280339628458023,
"learning_rate": 3.478587267822987e-05,
"loss": 0.06897796392440796,
"step": 1750
},
{
"epoch": 1.8331150117893635,
"eval_loss": 0.06596127897500992,
"eval_runtime": 35.5001,
"eval_samples_per_second": 8.761,
"eval_steps_per_second": 8.761,
"step": 1750
},
{
"epoch": 1.8435944458999214,
"grad_norm": 0.1175367683172226,
"learning_rate": 3.424842088122716e-05,
"loss": 0.08288194537162781,
"step": 1760
},
{
"epoch": 1.8540738800104795,
"grad_norm": 0.10271462798118591,
"learning_rate": 3.371298207658003e-05,
"loss": 0.05643013119697571,
"step": 1770
},
{
"epoch": 1.8645533141210375,
"grad_norm": 0.11965195834636688,
"learning_rate": 3.3179624691346654e-05,
"loss": 0.07403092980384826,
"step": 1780
},
{
"epoch": 1.8750327482315954,
"grad_norm": 0.09981680661439896,
"learning_rate": 3.2648416886587686e-05,
"loss": 0.07118859887123108,
"step": 1790
},
{
"epoch": 1.8855121823421537,
"grad_norm": 0.07787375897169113,
"learning_rate": 3.2119426548655435e-05,
"loss": 0.07219682335853576,
"step": 1800
},
{
"epoch": 1.8959916164527115,
"grad_norm": 0.1303507387638092,
"learning_rate": 3.1592721280518404e-05,
"loss": 0.07636030912399291,
"step": 1810
},
{
"epoch": 1.9064710505632696,
"grad_norm": 0.09162267297506332,
"learning_rate": 3.106836839312175e-05,
"loss": 0.06230143308639526,
"step": 1820
},
{
"epoch": 1.9169504846738277,
"grad_norm": 0.11375878751277924,
"learning_rate": 3.054643489678526e-05,
"loss": 0.060506826639175414,
"step": 1830
},
{
"epoch": 1.9274299187843855,
"grad_norm": 0.1377716213464737,
"learning_rate": 3.0026987492639668e-05,
"loss": 0.08148540854454041,
"step": 1840
},
{
"epoch": 1.9379093528949438,
"grad_norm": 0.10483554750680923,
"learning_rate": 2.951009256410255e-05,
"loss": 0.07040726542472839,
"step": 1850
},
{
"epoch": 1.9483887870055017,
"grad_norm": 0.08736151456832886,
"learning_rate": 2.8995816168394702e-05,
"loss": 0.04931557774543762,
"step": 1860
},
{
"epoch": 1.9588682211160597,
"grad_norm": 0.11461569368839264,
"learning_rate": 2.848422402809828e-05,
"loss": 0.057559752464294435,
"step": 1870
},
{
"epoch": 1.9693476552266178,
"grad_norm": 0.09060918539762497,
"learning_rate": 2.7975381522757803e-05,
"loss": 0.06379705667495728,
"step": 1880
},
{
"epoch": 1.9798270893371757,
"grad_norm": 0.07104971259832382,
"learning_rate": 2.746935368052477e-05,
"loss": 0.05813115239143372,
"step": 1890
},
{
"epoch": 1.990306523447734,
"grad_norm": 0.10802938044071198,
"learning_rate": 2.696620516984733e-05,
"loss": 0.07732833027839661,
"step": 1900
},
{
"epoch": 2.0,
"grad_norm": 0.16884952783584595,
"learning_rate": 2.6466000291206004e-05,
"loss": 0.06166202425956726,
"step": 1910
},
{
"epoch": 2.010479434110558,
"grad_norm": 0.08582179993391037,
"learning_rate": 2.5968802968896228e-05,
"loss": 0.04766199886798859,
"step": 1920
},
{
"epoch": 2.020958868221116,
"grad_norm": 0.1457364708185196,
"learning_rate": 2.5474676742859048e-05,
"loss": 0.03826354146003723,
"step": 1930
},
{
"epoch": 2.031438302331674,
"grad_norm": 0.09275342524051666,
"learning_rate": 2.4983684760561023e-05,
"loss": 0.045059433579444884,
"step": 1940
},
{
"epoch": 2.0419177364422323,
"grad_norm": 0.09085927903652191,
"learning_rate": 2.44958897689242e-05,
"loss": 0.04904903173446655,
"step": 1950
},
{
"epoch": 2.05239717055279,
"grad_norm": 0.11733179539442062,
"learning_rate": 2.401135410630731e-05,
"loss": 0.05008396506309509,
"step": 1960
},
{
"epoch": 2.062876604663348,
"grad_norm": 0.0894237607717514,
"learning_rate": 2.3530139694539095e-05,
"loss": 0.04057626128196716,
"step": 1970
},
{
"epoch": 2.0733560387739063,
"grad_norm": 0.08560927212238312,
"learning_rate": 2.305230803100496e-05,
"loss": 0.04843136668205261,
"step": 1980
},
{
"epoch": 2.083835472884464,
"grad_norm": 0.07991836220026016,
"learning_rate": 2.257792018078793e-05,
"loss": 0.0544127106666565,
"step": 1990
},
{
"epoch": 2.0943149069950224,
"grad_norm": 0.08846250921487808,
"learning_rate": 2.210703676886461e-05,
"loss": 0.0459000825881958,
"step": 2000
},
{
"epoch": 2.0943149069950224,
"eval_loss": 0.060011014342308044,
"eval_runtime": 36.3755,
"eval_samples_per_second": 8.55,
"eval_steps_per_second": 8.55,
"step": 2000
},
{
"epoch": 2.1047943411055803,
"grad_norm": 0.10082945972681046,
"learning_rate": 2.1639717972357678e-05,
"loss": 0.038090622425079344,
"step": 2010
},
{
"epoch": 2.115273775216138,
"grad_norm": 0.05712248757481575,
"learning_rate": 2.1176023512845376e-05,
"loss": 0.04598597884178161,
"step": 2020
},
{
"epoch": 2.1257532093266964,
"grad_norm": 0.11628362536430359,
"learning_rate": 2.0716012648729353e-05,
"loss": 0.04984880685806274,
"step": 2030
},
{
"epoch": 2.1362326434372543,
"grad_norm": 0.10635484755039215,
"learning_rate": 2.025974416766171e-05,
"loss": 0.04293925166130066,
"step": 2040
},
{
"epoch": 2.1467120775478126,
"grad_norm": 0.1017381027340889,
"learning_rate": 1.9807276379032113e-05,
"loss": 0.04305694401264191,
"step": 2050
},
{
"epoch": 2.1571915116583704,
"grad_norm": 0.13550882041454315,
"learning_rate": 1.9358667106516055e-05,
"loss": 0.04478869140148163,
"step": 2060
},
{
"epoch": 2.1676709457689283,
"grad_norm": 0.08526366949081421,
"learning_rate": 1.8913973680685226e-05,
"loss": 0.036646312475204466,
"step": 2070
},
{
"epoch": 2.1781503798794866,
"grad_norm": 0.10932011157274246,
"learning_rate": 1.8473252931680928e-05,
"loss": 0.042200219631195066,
"step": 2080
},
{
"epoch": 2.1886298139900444,
"grad_norm": 0.08768360316753387,
"learning_rate": 1.803656118195136e-05,
"loss": 0.0437488317489624,
"step": 2090
},
{
"epoch": 2.1991092481006027,
"grad_norm": 0.08362651616334915,
"learning_rate": 1.760395423905379e-05,
"loss": 0.04669668078422547,
"step": 2100
},
{
"epoch": 2.2095886822111606,
"grad_norm": 0.08554034680128098,
"learning_rate": 1.7175487388522588e-05,
"loss": 0.034989356994628906,
"step": 2110
},
{
"epoch": 2.220068116321719,
"grad_norm": 0.08215561509132385,
"learning_rate": 1.6751215386803986e-05,
"loss": 0.040298929810523985,
"step": 2120
},
{
"epoch": 2.2305475504322767,
"grad_norm": 0.0840689167380333,
"learning_rate": 1.6331192454258337e-05,
"loss": 0.041704925894737246,
"step": 2130
},
{
"epoch": 2.2410269845428346,
"grad_norm": 0.06530614197254181,
"learning_rate": 1.5915472268231018e-05,
"loss": 0.03651900887489319,
"step": 2140
},
{
"epoch": 2.251506418653393,
"grad_norm": 0.12431822717189789,
"learning_rate": 1.550410795619261e-05,
"loss": 0.04806804955005646,
"step": 2150
},
{
"epoch": 2.2619858527639507,
"grad_norm": 0.09592410176992416,
"learning_rate": 1.509715208894949e-05,
"loss": 0.0454313725233078,
"step": 2160
},
{
"epoch": 2.2724652868745085,
"grad_norm": 0.07589780539274216,
"learning_rate": 1.469465667392536e-05,
"loss": 0.03574602603912354,
"step": 2170
},
{
"epoch": 2.282944720985067,
"grad_norm": 0.09734483063220978,
"learning_rate": 1.4296673148515038e-05,
"loss": 0.04358702301979065,
"step": 2180
},
{
"epoch": 2.2934241550956247,
"grad_norm": 0.0974339172244072,
"learning_rate": 1.3903252373510838e-05,
"loss": 0.04603351950645447,
"step": 2190
},
{
"epoch": 2.303903589206183,
"grad_norm": 0.09025271981954575,
"learning_rate": 1.3514444626602773e-05,
"loss": 0.040065237879753114,
"step": 2200
},
{
"epoch": 2.314383023316741,
"grad_norm": 0.07625086605548859,
"learning_rate": 1.3130299595953338e-05,
"loss": 0.044061675667762756,
"step": 2210
},
{
"epoch": 2.324862457427299,
"grad_norm": 0.07306221127510071,
"learning_rate": 1.2750866373847465e-05,
"loss": 0.03366467654705048,
"step": 2220
},
{
"epoch": 2.335341891537857,
"grad_norm": 0.08357638120651245,
"learning_rate": 1.2376193450418715e-05,
"loss": 0.041424044966697694,
"step": 2230
},
{
"epoch": 2.345821325648415,
"grad_norm": 0.09153921157121658,
"learning_rate": 1.2006328707452459e-05,
"loss": 0.03938372135162353,
"step": 2240
},
{
"epoch": 2.356300759758973,
"grad_norm": 0.09109660983085632,
"learning_rate": 1.1641319412266765e-05,
"loss": 0.04015985131263733,
"step": 2250
},
{
"epoch": 2.356300759758973,
"eval_loss": 0.05486458167433739,
"eval_runtime": 36.8119,
"eval_samples_per_second": 8.448,
"eval_steps_per_second": 8.448,
"step": 2250
},
{
"epoch": 2.366780193869531,
"grad_norm": 0.052502721548080444,
"learning_rate": 1.1281212211671822e-05,
"loss": 0.0270554780960083,
"step": 2260
},
{
"epoch": 2.377259627980089,
"grad_norm": 0.07931812107563019,
"learning_rate": 1.0926053126008584e-05,
"loss": 0.0417300134897232,
"step": 2270
},
{
"epoch": 2.387739062090647,
"grad_norm": 0.08996254205703735,
"learning_rate": 1.0575887543267609e-05,
"loss": 0.037659955024719236,
"step": 2280
},
{
"epoch": 2.398218496201205,
"grad_norm": 0.08800788223743439,
"learning_rate": 1.023076021328867e-05,
"loss": 0.048437944054603575,
"step": 2290
},
{
"epoch": 2.4086979303117633,
"grad_norm": 0.10572271049022675,
"learning_rate": 9.890715242041787e-06,
"loss": 0.04166909456253052,
"step": 2300
},
{
"epoch": 2.419177364422321,
"grad_norm": 0.10573071986436844,
"learning_rate": 9.555796085990781e-06,
"loss": 0.03919607996940613,
"step": 2310
},
{
"epoch": 2.4296567985328794,
"grad_norm": 0.09714583307504654,
"learning_rate": 9.226045546539608e-06,
"loss": 0.03530588150024414,
"step": 2320
},
{
"epoch": 2.4401362326434373,
"grad_norm": 0.09436199069023132,
"learning_rate": 8.901505764562518e-06,
"loss": 0.05111382007598877,
"step": 2330
},
{
"epoch": 2.450615666753995,
"grad_norm": 0.06353961676359177,
"learning_rate": 8.582218215018656e-06,
"loss": 0.03805697858333588,
"step": 2340
},
{
"epoch": 2.4610951008645534,
"grad_norm": 0.08853815495967865,
"learning_rate": 8.268223701651684e-06,
"loss": 0.04815975427627563,
"step": 2350
},
{
"epoch": 2.4715745349751113,
"grad_norm": 0.07472016662359238,
"learning_rate": 7.959562351775196e-06,
"loss": 0.042247459292411804,
"step": 2360
},
{
"epoch": 2.4820539690856696,
"grad_norm": 0.12121549248695374,
"learning_rate": 7.656273611144632e-06,
"loss": 0.040102115273475646,
"step": 2370
},
{
"epoch": 2.4925334031962274,
"grad_norm": 0.08667747676372528,
"learning_rate": 7.358396238916254e-06,
"loss": 0.03656341433525086,
"step": 2380
},
{
"epoch": 2.5030128373067857,
"grad_norm": 0.1162872165441513,
"learning_rate": 7.065968302693882e-06,
"loss": 0.04052766263484955,
"step": 2390
},
{
"epoch": 2.5134922714173435,
"grad_norm": 0.07924140989780426,
"learning_rate": 6.7790271736639595e-06,
"loss": 0.03394221067428589,
"step": 2400
},
{
"epoch": 2.5239717055279014,
"grad_norm": 0.09523408859968185,
"learning_rate": 6.497609521819681e-06,
"loss": 0.04119439423084259,
"step": 2410
},
{
"epoch": 2.5344511396384597,
"grad_norm": 0.12182598561048508,
"learning_rate": 6.221751311274731e-06,
"loss": 0.05154783725738525,
"step": 2420
},
{
"epoch": 2.5449305737490175,
"grad_norm": 0.09359873831272125,
"learning_rate": 5.951487795667149e-06,
"loss": 0.035483264923095705,
"step": 2430
},
{
"epoch": 2.5554100078595754,
"grad_norm": 0.08514095097780228,
"learning_rate": 5.686853513654117e-06,
"loss": 0.03830339312553406,
"step": 2440
},
{
"epoch": 2.5658894419701337,
"grad_norm": 0.10625084489583969,
"learning_rate": 5.4278822844979705e-06,
"loss": 0.034111028909683226,
"step": 2450
},
{
|
| 1800 |
+
"epoch": 2.5763688760806915,
|
| 1801 |
+
"grad_norm": 0.1004003956913948,
|
| 1802 |
+
"learning_rate": 5.174607203744286e-06,
|
| 1803 |
+
"loss": 0.04465605318546295,
|
| 1804 |
+
"step": 2460
|
| 1805 |
+
},
|
| 1806 |
+
{
|
| 1807 |
+
"epoch": 2.58684831019125,
|
| 1808 |
+
"grad_norm": 0.0962519720196724,
|
| 1809 |
+
"learning_rate": 4.927060638992382e-06,
|
| 1810 |
+
"loss": 0.041056016087532045,
|
| 1811 |
+
"step": 2470
|
| 1812 |
+
},
|
| 1813 |
+
{
|
| 1814 |
+
"epoch": 2.5973277443018077,
|
| 1815 |
+
"grad_norm": 0.06380607187747955,
|
| 1816 |
+
"learning_rate": 4.685274225758846e-06,
|
| 1817 |
+
"loss": 0.03880062401294708,
|
| 1818 |
+
"step": 2480
|
| 1819 |
+
},
|
| 1820 |
+
{
|
| 1821 |
+
"epoch": 2.607807178412366,
|
| 1822 |
+
"grad_norm": 0.07326535880565643,
|
| 1823 |
+
"learning_rate": 4.449278863434647e-06,
|
| 1824 |
+
"loss": 0.03194461762905121,
|
| 1825 |
+
"step": 2490
|
| 1826 |
+
},
|
| 1827 |
+
{
|
| 1828 |
+
"epoch": 2.618286612522924,
|
| 1829 |
+
"grad_norm": 0.12218596786260605,
|
| 1830 |
+
"learning_rate": 4.2191047113362854e-06,
|
| 1831 |
+
"loss": 0.04258840978145599,
|
| 1832 |
+
"step": 2500
|
| 1833 |
+
},
|
| 1834 |
+
{
|
| 1835 |
+
"epoch": 2.618286612522924,
|
| 1836 |
+
"eval_loss": 0.05223666876554489,
|
| 1837 |
+
"eval_runtime": 37.7234,
|
| 1838 |
+
"eval_samples_per_second": 8.244,
|
| 1839 |
+
"eval_steps_per_second": 8.244,
|
| 1840 |
+
"step": 2500
|
| 1841 |
+
},
|
| 1842 |
+
{
|
| 1843 |
+
"epoch": 2.6287660466334817,
|
| 1844 |
+
"grad_norm": 0.08594664931297302,
|
| 1845 |
+
"learning_rate": 3.994781184851598e-06,
|
| 1846 |
+
"loss": 0.04302787780761719,
|
| 1847 |
+
"step": 2510
|
| 1848 |
+
},
|
| 1849 |
+
{
|
| 1850 |
+
"epoch": 2.63924548074404,
|
| 1851 |
+
"grad_norm": 0.08187596499919891,
|
| 1852 |
+
"learning_rate": 3.776336951680548e-06,
|
| 1853 |
+
"loss": 0.0341387003660202,
|
| 1854 |
+
"step": 2520
|
| 1855 |
+
},
|
| 1856 |
+
{
|
| 1857 |
+
"epoch": 2.649724914854598,
|
| 1858 |
+
"grad_norm": 0.10216796398162842,
|
| 1859 |
+
"learning_rate": 3.563799928171596e-06,
|
| 1860 |
+
"loss": 0.04289879500865936,
|
| 1861 |
+
"step": 2530
|
| 1862 |
+
},
|
| 1863 |
+
{
|
| 1864 |
+
"epoch": 2.6602043489651557,
|
| 1865 |
+
"grad_norm": 0.11215174198150635,
|
| 1866 |
+
"learning_rate": 3.3571972757540814e-06,
|
| 1867 |
+
"loss": 0.04055049121379852,
|
| 1868 |
+
"step": 2540
|
| 1869 |
+
},
|
| 1870 |
+
{
|
| 1871 |
+
"epoch": 2.670683783075714,
|
| 1872 |
+
"grad_norm": 0.07941269129514694,
|
| 1873 |
+
"learning_rate": 3.156555397467176e-06,
|
| 1874 |
+
"loss": 0.04118689000606537,
|
| 1875 |
+
"step": 2550
|
| 1876 |
+
},
|
| 1877 |
+
{
|
| 1878 |
+
"epoch": 2.681163217186272,
|
| 1879 |
+
"grad_norm": 0.09404437988996506,
|
| 1880 |
+
"learning_rate": 2.9618999345855547e-06,
|
| 1881 |
+
"loss": 0.03079705536365509,
|
| 1882 |
+
"step": 2560
|
| 1883 |
+
},
|
| 1884 |
+
{
|
| 1885 |
+
"epoch": 2.69164265129683,
|
| 1886 |
+
"grad_norm": 0.1109817698597908,
|
| 1887 |
+
"learning_rate": 2.773255763342647e-06,
|
| 1888 |
+
"loss": 0.038885954022407535,
|
| 1889 |
+
"step": 2570
|
| 1890 |
+
},
|
| 1891 |
+
{
|
| 1892 |
+
"epoch": 2.702122085407388,
|
| 1893 |
+
"grad_norm": 0.09431962668895721,
|
| 1894 |
+
"learning_rate": 2.590646991751472e-06,
|
| 1895 |
+
"loss": 0.043543145060539246,
|
| 1896 |
+
"step": 2580
|
| 1897 |
+
},
|
| 1898 |
+
{
|
| 1899 |
+
"epoch": 2.7126015195179463,
|
| 1900 |
+
"grad_norm": 0.08184763044118881,
|
| 1901 |
+
"learning_rate": 2.414096956523776e-06,
|
| 1902 |
+
"loss": 0.03256987631320953,
|
| 1903 |
+
"step": 2590
|
| 1904 |
+
},
|
| 1905 |
+
{
|
| 1906 |
+
"epoch": 2.723080953628504,
|
| 1907 |
+
"grad_norm": 0.08390141278505325,
|
| 1908 |
+
"learning_rate": 2.2436282200876458e-06,
|
| 1909 |
+
"loss": 0.03908055424690247,
|
| 1910 |
+
"step": 2600
|
| 1911 |
+
},
|
| 1912 |
+
{
|
| 1913 |
+
"epoch": 2.733560387739062,
|
| 1914 |
+
"grad_norm": 0.0762532502412796,
|
| 1915 |
+
"learning_rate": 2.07926256770416e-06,
|
| 1916 |
+
"loss": 0.04899201393127441,
|
| 1917 |
+
"step": 2610
|
| 1918 |
+
},
|
| 1919 |
+
{
|
| 1920 |
+
"epoch": 2.7440398218496203,
|
| 1921 |
+
"grad_norm": 0.08239631354808807,
|
| 1922 |
+
"learning_rate": 1.9210210046832768e-06,
|
| 1923 |
+
"loss": 0.048707082867622375,
|
| 1924 |
+
"step": 2620
|
| 1925 |
+
},
|
| 1926 |
+
{
|
| 1927 |
+
"epoch": 2.754519255960178,
|
| 1928 |
+
"grad_norm": 0.09619107842445374,
|
| 1929 |
+
"learning_rate": 1.7689237536994364e-06,
|
| 1930 |
+
"loss": 0.0372231125831604,
|
| 1931 |
+
"step": 2630
|
| 1932 |
+
},
|
| 1933 |
+
{
|
| 1934 |
+
"epoch": 2.764998690070736,
|
| 1935 |
+
"grad_norm": 0.07099667191505432,
|
| 1936 |
+
"learning_rate": 1.6229902522072293e-06,
|
| 1937 |
+
"loss": 0.03421170711517334,
|
| 1938 |
+
"step": 2640
|
| 1939 |
+
},
|
| 1940 |
+
{
|
| 1941 |
+
"epoch": 2.7754781241812942,
|
| 1942 |
+
"grad_norm": 0.10154753923416138,
|
| 1943 |
+
"learning_rate": 1.4832391499572996e-06,
|
| 1944 |
+
"loss": 0.03656705319881439,
|
| 1945 |
+
"step": 2650
|
| 1946 |
+
},
|
| 1947 |
+
{
|
| 1948 |
+
"epoch": 2.785957558291852,
|
| 1949 |
+
"grad_norm": 0.09349387139081955,
|
| 1950 |
+
"learning_rate": 1.3496883066130173e-06,
|
| 1951 |
+
"loss": 0.03710306882858276,
|
| 1952 |
+
"step": 2660
|
| 1953 |
+
},
|
| 1954 |
+
{
|
| 1955 |
+
"epoch": 2.7964369924024104,
|
| 1956 |
+
"grad_norm": 0.061091430485248566,
|
| 1957 |
+
"learning_rate": 1.2223547894680443e-06,
|
| 1958 |
+
"loss": 0.0308389812707901,
|
| 1959 |
+
"step": 2670
|
| 1960 |
+
},
|
| 1961 |
+
{
|
| 1962 |
+
"epoch": 2.8069164265129682,
|
| 1963 |
+
"grad_norm": 0.09838075935840607,
|
| 1964 |
+
"learning_rate": 1.101254871265256e-06,
|
| 1965 |
+
"loss": 0.03703555166721344,
|
| 1966 |
+
"step": 2680
|
| 1967 |
+
},
|
| 1968 |
+
{
|
| 1969 |
+
"epoch": 2.8173958606235265,
|
| 1970 |
+
"grad_norm": 0.10046928375959396,
|
| 1971 |
+
"learning_rate": 9.864040281170938e-07,
|
| 1972 |
+
"loss": 0.04500553905963898,
|
| 1973 |
+
"step": 2690
|
| 1974 |
+
},
|
| 1975 |
+
{
|
| 1976 |
+
"epoch": 2.8278752947340844,
|
| 1977 |
+
"grad_norm": 0.06770773977041245,
|
| 1978 |
+
"learning_rate": 8.778169375277978e-07,
|
| 1979 |
+
"loss": 0.03823737502098083,
|
| 1980 |
+
"step": 2700
|
| 1981 |
+
},
|
| 1982 |
+
{
|
| 1983 |
+
"epoch": 2.8383547288446422,
|
| 1984 |
+
"grad_norm": 0.08373535424470901,
|
| 1985 |
+
"learning_rate": 7.755074765176618e-07,
|
| 1986 |
+
"loss": 0.03961678743362427,
|
| 1987 |
+
"step": 2710
|
| 1988 |
+
},
|
| 1989 |
+
{
|
| 1990 |
+
"epoch": 2.8488341629552005,
|
| 1991 |
+
"grad_norm": 0.07590050995349884,
|
| 1992 |
+
"learning_rate": 6.794887198496413e-07,
|
| 1993 |
+
"loss": 0.03221273124217987,
|
| 1994 |
+
"step": 2720
|
| 1995 |
+
},
|
| 1996 |
+
{
|
| 1997 |
+
"epoch": 2.8593135970657584,
|
| 1998 |
+
"grad_norm": 0.08507678657770157,
|
| 1999 |
+
"learning_rate": 5.897729383583906e-07,
|
| 2000 |
+
"loss": 0.04571912884712219,
|
| 2001 |
+
"step": 2730
|
| 2002 |
+
},
|
| 2003 |
+
{
|
| 2004 |
+
"epoch": 2.8697930311763162,
|
| 2005 |
+
"grad_norm": 0.06584763526916504,
|
| 2006 |
+
"learning_rate": 5.063715973821659e-07,
|
| 2007 |
+
"loss": 0.03794914484024048,
|
| 2008 |
+
"step": 2740
|
| 2009 |
+
},
|
| 2010 |
+
{
|
| 2011 |
+
"epoch": 2.8802724652868745,
|
| 2012 |
+
"grad_norm": 0.07312892377376556,
|
| 2013 |
+
"learning_rate": 4.292953552975154e-07,
|
| 2014 |
+
"loss": 0.036365586519241336,
|
| 2015 |
+
"step": 2750
|
| 2016 |
+
},
|
| 2017 |
+
{
|
| 2018 |
+
"epoch": 2.8802724652868745,
|
| 2019 |
+
"eval_loss": 0.05090421438217163,
|
| 2020 |
+
"eval_runtime": 85.293,
|
| 2021 |
+
"eval_samples_per_second": 3.646,
|
| 2022 |
+
"eval_steps_per_second": 3.646,
|
| 2023 |
+
"step": 2750
|
| 2024 |
+
},
|
| 2025 |
+
{
|
| 2026 |
+
"epoch": 2.8907518993974324,
|
| 2027 |
+
"grad_norm": 0.08459606021642685,
|
| 2028 |
+
"learning_rate": 3.5855406215725697e-07,
|
| 2029 |
+
"loss": 0.03068857192993164,
|
| 2030 |
+
"step": 2760
|
| 2031 |
+
},
|
| 2032 |
+
{
|
| 2033 |
+
"epoch": 2.9012313335079907,
|
| 2034 |
+
"grad_norm": 0.06866376101970673,
|
| 2035 |
+
"learning_rate": 2.9415675843163515e-07,
|
| 2036 |
+
"loss": 0.03265829384326935,
|
| 2037 |
+
"step": 2770
|
| 2038 |
+
},
|
| 2039 |
+
{
|
| 2040 |
+
"epoch": 2.9117107676185485,
|
| 2041 |
+
"grad_norm": 0.09082643687725067,
|
| 2042 |
+
"learning_rate": 2.361116738529956e-07,
|
| 2043 |
+
"loss": 0.03418546915054321,
|
| 2044 |
+
"step": 2780
|
| 2045 |
+
},
|
| 2046 |
+
{
|
| 2047 |
+
"epoch": 2.922190201729107,
|
| 2048 |
+
"grad_norm": 0.10772739350795746,
|
| 2049 |
+
"learning_rate": 1.8442622636404284e-07,
|
| 2050 |
+
"loss": 0.03810786008834839,
|
| 2051 |
+
"step": 2790
|
| 2052 |
+
},
|
| 2053 |
+
{
|
| 2054 |
+
"epoch": 2.9326696358396647,
|
| 2055 |
+
"grad_norm": 0.08321297913789749,
|
| 2056 |
+
"learning_rate": 1.391070211698764e-07,
|
| 2057 |
+
"loss": 0.04068491756916046,
|
| 2058 |
+
"step": 2800
|
| 2059 |
+
},
|
| 2060 |
+
{
|
| 2061 |
+
"epoch": 2.9431490699502225,
|
| 2062 |
+
"grad_norm": 0.11239277571439743,
|
| 2063 |
+
"learning_rate": 1.0015984989385496e-07,
|
| 2064 |
+
"loss": 0.041029155254364014,
|
| 2065 |
+
"step": 2810
|
| 2066 |
+
},
|
| 2067 |
+
{
|
| 2068 |
+
"epoch": 2.953628504060781,
|
| 2069 |
+
"grad_norm": 0.07199843227863312,
|
| 2070 |
+
"learning_rate": 6.758968983747171e-08,
|
| 2071 |
+
"loss": 0.037902483344078065,
|
| 2072 |
+
"step": 2820
|
| 2073 |
+
},
|
| 2074 |
+
{
|
| 2075 |
+
"epoch": 2.9641079381713387,
|
| 2076 |
+
"grad_norm": 0.08249279856681824,
|
| 2077 |
+
"learning_rate": 4.140070334422985e-08,
|
| 2078 |
+
"loss": 0.03996126651763916,
|
| 2079 |
+
"step": 2830
|
| 2080 |
+
},
|
| 2081 |
+
{
|
| 2082 |
+
"epoch": 2.9745873722818965,
|
| 2083 |
+
"grad_norm": 0.0852220207452774,
|
| 2084 |
+
"learning_rate": 2.1596237267751396e-08,
|
| 2085 |
+
"loss": 0.04228667616844177,
|
| 2086 |
+
"step": 2840
|
| 2087 |
+
},
|
| 2088 |
+
{
|
| 2089 |
+
"epoch": 2.985066806392455,
|
| 2090 |
+
"grad_norm": 0.0858582928776741,
|
| 2091 |
+
"learning_rate": 8.178822544052666e-09,
|
| 2092 |
+
"loss": 0.03813594281673431,
|
| 2093 |
+
"step": 2850
|
| 2094 |
+
},
|
| 2095 |
+
{
|
| 2096 |
+
"epoch": 2.995546240503013,
|
| 2097 |
+
"grad_norm": 0.06642451137304306,
|
| 2098 |
+
"learning_rate": 1.1501738680919084e-09,
|
| 2099 |
+
"loss": 0.033472076058387756,
|
| 2100 |
+
"step": 2860
|
| 2101 |
+
}
|
| 2102 |
+
],
|
| 2103 |
+
"logging_steps": 10,
|
| 2104 |
+
"max_steps": 2865,
|
| 2105 |
+
"num_input_tokens_seen": 0,
|
| 2106 |
+
"num_train_epochs": 3,
|
| 2107 |
+
"save_steps": 250,
|
| 2108 |
+
"stateful_callbacks": {
|
| 2109 |
+
"TrainerControl": {
|
| 2110 |
+
"args": {
|
| 2111 |
+
"should_epoch_stop": false,
|
| 2112 |
+
"should_evaluate": false,
|
| 2113 |
+
"should_log": false,
|
| 2114 |
+
"should_save": true,
|
| 2115 |
+
"should_training_stop": true
|
| 2116 |
+
},
|
| 2117 |
+
"attributes": {}
|
| 2118 |
+
}
|
| 2119 |
+
},
|
| 2120 |
+
"total_flos": 9.031737271887514e+17,
|
| 2121 |
+
"train_batch_size": 2,
|
| 2122 |
+
"trial_name": null,
|
| 2123 |
+
"trial_params": null
|
| 2124 |
+
}
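The records in `log_history` come in two shapes: training records carry `loss`, `grad_norm` and `learning_rate`, while evaluation records carry `eval_loss` and throughput fields. The following is a minimal sketch, not part of the commit, of how the two kinds could be separated and how an approximate evaluation-set size might be recovered from `eval_runtime` and `eval_samples_per_second`. The helper names `split_records` and `approx_eval_samples` are hypothetical, and the three sample records are copied from the log above.

```python
# Three records copied from the log above; the list mirrors "log_history".
log_history = [
    {"epoch": 2.618286612522924, "grad_norm": 0.12218596786260605,
     "learning_rate": 4.2191047113362854e-06, "loss": 0.04258840978145599,
     "step": 2500},
    {"epoch": 2.618286612522924, "eval_loss": 0.05223666876554489,
     "eval_runtime": 37.7234, "eval_samples_per_second": 8.244,
     "eval_steps_per_second": 8.244, "step": 2500},
    {"epoch": 2.8802724652868745, "eval_loss": 0.05090421438217163,
     "eval_runtime": 85.293, "eval_samples_per_second": 3.646,
     "eval_steps_per_second": 3.646, "step": 2750},
]


def split_records(history):
    """Hypothetical helper: separate training records from evaluation records."""
    train = [r for r in history if "loss" in r]       # key is exactly "loss"
    evals = [r for r in history if "eval_loss" in r]  # evaluation-only records
    return train, evals


def approx_eval_samples(record):
    """Hypothetical helper: runtime * throughput, rounded to whole samples."""
    return round(record["eval_runtime"] * record["eval_samples_per_second"])


train_records, eval_records = split_records(log_history)
sizes = [approx_eval_samples(r) for r in eval_records]
print(len(train_records), len(eval_records), sizes)
```

Both evaluation records imply the same sample count even though their runtimes differ, which suggests the evaluation set is fixed and only the wall-clock speed varied between the two runs.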
|
checkpoint-2865/training_args.bin
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
size 5201
|
merged/chat_template.jinja
ADDED
|
@@ -0,0 +1,154 @@
|
| 1 |
+
{%- set image_count = namespace(value=0) %}
|
| 2 |
+
{%- set video_count = namespace(value=0) %}
|
| 3 |
+
{%- macro render_content(content, do_vision_count, is_system_content=false) %}
|
| 4 |
+
{%- if content is string %}
|
| 5 |
+
{{- content }}
|
| 6 |
+
{%- elif content is iterable and content is not mapping %}
|
| 7 |
+
{%- for item in content %}
|
| 8 |
+
{%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
|
| 9 |
+
{%- if is_system_content %}
|
| 10 |
+
{{- raise_exception('System message cannot contain images.') }}
|
| 11 |
+
{%- endif %}
|
| 12 |
+
{%- if do_vision_count %}
|
| 13 |
+
{%- set image_count.value = image_count.value + 1 %}
|
| 14 |
+
{%- endif %}
|
| 15 |
+
{%- if add_vision_id %}
|
| 16 |
+
{{- 'Picture ' ~ image_count.value ~ ': ' }}
|
| 17 |
+
{%- endif %}
|
| 18 |
+
{{- '<|vision_start|><|image_pad|><|vision_end|>' }}
|
| 19 |
+
{%- elif 'video' in item or item.type == 'video' %}
|
| 20 |
+
{%- if is_system_content %}
|
| 21 |
+
{{- raise_exception('System message cannot contain videos.') }}
|
| 22 |
+
{%- endif %}
|
| 23 |
+
{%- if do_vision_count %}
|
| 24 |
+
{%- set video_count.value = video_count.value + 1 %}
|
| 25 |
+
{%- endif %}
|
| 26 |
+
{%- if add_vision_id %}
|
| 27 |
+
{{- 'Video ' ~ video_count.value ~ ': ' }}
|
| 28 |
+
{%- endif %}
|
| 29 |
+
{{- '<|vision_start|><|video_pad|><|vision_end|>' }}
|
| 30 |
+
{%- elif 'text' in item %}
|
| 31 |
+
{{- item.text }}
|
| 32 |
+
{%- else %}
|
| 33 |
+
{{- raise_exception('Unexpected item type in content.') }}
|
| 34 |
+
{%- endif %}
|
| 35 |
+
{%- endfor %}
|
| 36 |
+
{%- elif content is none or content is undefined %}
|
| 37 |
+
{{- '' }}
|
| 38 |
+
{%- else %}
|
| 39 |
+
{{- raise_exception('Unexpected content type.') }}
|
| 40 |
+
{%- endif %}
|
| 41 |
+
{%- endmacro %}
|
| 42 |
+
{%- if not messages %}
|
| 43 |
+
{{- raise_exception('No messages provided.') }}
|
| 44 |
+
{%- endif %}
|
| 45 |
+
{%- if tools and tools is iterable and tools is not mapping %}
|
| 46 |
+
{{- '<|im_start|>system\n' }}
|
| 47 |
+
{{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
|
| 48 |
+
{%- for tool in tools %}
|
| 49 |
+
{{- "\n" }}
|
| 50 |
+
{{- tool | tojson }}
|
| 51 |
+
{%- endfor %}
|
| 52 |
+
{{- "\n</tools>" }}
|
| 53 |
+
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
|
| 54 |
+
{%- if messages[0].role == 'system' %}
|
| 55 |
+
{%- set content = render_content(messages[0].content, false, true)|trim %}
|
| 56 |
+
{%- if content %}
|
| 57 |
+
{{- '\n\n' + content }}
|
| 58 |
+
{%- endif %}
|
| 59 |
+
{%- endif %}
|
| 60 |
+
{{- '<|im_end|>\n' }}
|
| 61 |
+
{%- else %}
|
| 62 |
+
{%- if messages[0].role == 'system' %}
|
| 63 |
+
{%- set content = render_content(messages[0].content, false, true)|trim %}
|
| 64 |
+
{{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
|
| 65 |
+
{%- endif %}
|
| 66 |
+
{%- endif %}
|
| 67 |
+
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
|
| 68 |
+
{%- for message in messages[::-1] %}
|
| 69 |
+
{%- set index = (messages|length - 1) - loop.index0 %}
|
| 70 |
+
{%- if ns.multi_step_tool and message.role == "user" %}
|
| 71 |
+
{%- set content = render_content(message.content, false)|trim %}
|
| 72 |
+
{%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
|
| 73 |
+
{%- set ns.multi_step_tool = false %}
|
| 74 |
+
{%- set ns.last_query_index = index %}
|
| 75 |
+
{%- endif %}
|
| 76 |
+
{%- endif %}
|
| 77 |
+
{%- endfor %}
|
| 78 |
+
{%- if ns.multi_step_tool %}
|
| 79 |
+
{{- raise_exception('No user query found in messages.') }}
|
| 80 |
+
{%- endif %}
|
| 81 |
+
{%- for message in messages %}
|
| 82 |
+
{%- set content = render_content(message.content, true)|trim %}
|
| 83 |
+
{%- if message.role == "system" %}
|
| 84 |
+
{%- if not loop.first %}
|
| 85 |
+
{{- raise_exception('System message must be at the beginning.') }}
|
| 86 |
+
{%- endif %}
|
| 87 |
+
{%- elif message.role == "user" %}
|
| 88 |
+
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
|
| 89 |
+
{%- elif message.role == "assistant" %}
|
| 90 |
+
{%- set reasoning_content = '' %}
|
| 91 |
+
{%- if message.reasoning_content is string %}
|
| 92 |
+
{%- set reasoning_content = message.reasoning_content %}
|
| 93 |
+
{%- else %}
|
| 94 |
+
{%- if '</think>' in content %}
|
| 95 |
+
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
|
| 96 |
+
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
|
| 97 |
+
{%- endif %}
|
| 98 |
+
{%- endif %}
|
| 99 |
+
{%- set reasoning_content = reasoning_content|trim %}
|
| 100 |
+
{%- if loop.index0 > ns.last_query_index %}
|
| 101 |
+
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
|
| 102 |
+
{%- else %}
|
| 103 |
+
{{- '<|im_start|>' + message.role + '\n' + content }}
|
| 104 |
+
{%- endif %}
|
| 105 |
+
{%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
|
| 106 |
+
{%- for tool_call in message.tool_calls %}
|
| 107 |
+
{%- if tool_call.function is defined %}
|
| 108 |
+
{%- set tool_call = tool_call.function %}
|
| 109 |
+
{%- endif %}
|
| 110 |
+
{%- if loop.first %}
|
| 111 |
+
{%- if content|trim %}
|
| 112 |
+
{{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
|
| 113 |
+
{%- else %}
|
| 114 |
+
{{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
|
| 115 |
+
{%- endif %}
|
| 116 |
+
{%- else %}
|
| 117 |
+
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
|
| 118 |
+
{%- endif %}
|
| 119 |
+
{%- if tool_call.arguments is defined %}
|
| 120 |
+
{%- for args_name, args_value in tool_call.arguments|items %}
|
| 121 |
+
{{- '<parameter=' + args_name + '>\n' }}
|
| 122 |
+
{%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
|
| 123 |
+
{{- args_value }}
|
| 124 |
+
{{- '\n</parameter>\n' }}
|
| 125 |
+
{%- endfor %}
|
| 126 |
+
{%- endif %}
|
| 127 |
+
{{- '</function>\n</tool_call>' }}
|
| 128 |
+
{%- endfor %}
|
| 129 |
+
{%- endif %}
|
| 130 |
+
{{- '<|im_end|>\n' }}
|
| 131 |
+
{%- elif message.role == "tool" %}
|
| 132 |
+
{%- if loop.previtem and loop.previtem.role != "tool" %}
|
| 133 |
+
{{- '<|im_start|>user' }}
|
| 134 |
+
{%- endif %}
|
| 135 |
+
{{- '\n<tool_response>\n' }}
|
| 136 |
+
{{- content }}
|
| 137 |
+
{{- '\n</tool_response>' }}
|
| 138 |
+
{%- if not loop.last and loop.nextitem.role != "tool" %}
|
| 139 |
+
{{- '<|im_end|>\n' }}
|
| 140 |
+
{%- elif loop.last %}
|
| 141 |
+
{{- '<|im_end|>\n' }}
|
| 142 |
+
{%- endif %}
|
| 143 |
+
{%- else %}
|
| 144 |
+
{{- raise_exception('Unexpected message role.') }}
|
| 145 |
+
{%- endif %}
|
| 146 |
+
{%- endfor %}
|
| 147 |
+
{%- if add_generation_prompt %}
|
| 148 |
+
{{- '<|im_start|>assistant\n' }}
|
| 149 |
+
{%- if enable_thinking is defined and enable_thinking is true %}
|
| 150 |
+
{{- '<think>\n' }}
|
| 151 |
+
{%- else %}
|
| 152 |
+
{{- '<think>\n\n</think>\n\n' }}
|
| 153 |
+
{%- endif %}
|
| 154 |
+
{%- endif %}
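The last block of the template decides what the assistant turn starts with. As a rough illustration only, the pure-Python function below mirrors that final branch: the Jinja test `enable_thinking is defined and enable_thinking is true` succeeds only for a literal `True`, so an unset flag falls through to the closed, empty think block. The function name `generation_prompt` is hypothetical and is not part of the repository.

```python
def generation_prompt(enable_thinking=None):
    """Mirror the template's add_generation_prompt branch (illustrative only).

    None stands in for "variable not defined" in the Jinja context.
    """
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is True:
        # Open think block: the model is expected to write its reasoning next.
        prompt += "<think>\n"
    else:
        # Closed, empty think block: the model answers directly.
        prompt += "<think>\n\n</think>\n\n"
    return prompt


thinking = generation_prompt(True)
default = generation_prompt()
print(repr(thinking))
print(repr(default))
```

With a real tokenizer, the flag would normally be passed as an extra keyword argument to `apply_chat_template`, which forwards it into the template context; whether that call behaves identically for this particular checkpoint is an assumption.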
|
merged/config.json
ADDED
|
@@ -0,0 +1,75 @@
|
{
  "architectures": [
    "Qwen3_5ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "attn_output_gate": true,
  "bos_token_id": null,
  "dtype": "bfloat16",
  "eos_token_id": 248046,
  "full_attention_interval": 4,
  "head_dim": 256,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 6144,
  "layer_types": [
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention",
    "linear_attention",
    "linear_attention",
    "linear_attention",
    "full_attention"
  ],
  "linear_conv_kernel_dim": 4,
  "linear_key_head_dim": 128,
  "linear_num_key_heads": 16,
  "linear_num_value_heads": 16,
  "linear_value_head_dim": 128,
  "mamba_ssm_dtype": "float32",
  "max_position_embeddings": 262144,
  "mlp_only_layers": [],
  "model_type": "qwen3_5_text",
  "mtp_num_hidden_layers": 1,
  "mtp_use_dedicated_embeddings": false,
  "num_attention_heads": 8,
  "num_hidden_layers": 24,
  "num_key_value_heads": 2,
  "pad_token_id": 248044,
  "partial_rotary_factor": 0.25,
  "rms_norm_eps": 1e-06,
  "rope_parameters": {
    "mrope_interleaved": true,
    "mrope_section": [
      11,
      11,
      10
    ],
    "partial_rotary_factor": 0.25,
    "rope_theta": 10000000,
    "rope_type": "default"
  },
  "tie_word_embeddings": true,
  "transformers_version": "5.3.0",
  "use_cache": false,
  "vocab_size": 248320
}
|
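The `layer_types` list in the config above follows a regular pattern: with `num_hidden_layers` set to 24 and `full_attention_interval` set to 4, every fourth layer is `full_attention` and the rest are `linear_attention`. The sketch below regenerates that list from the two scalar fields. It only checks that the listed values are consistent with this reading; whether the modeling code derives the list the same way is an assumption, and the function name `expected_layer_types` is hypothetical.

```python
def expected_layer_types(num_hidden_layers, full_attention_interval):
    """Hypothetical helper: every interval-th layer (1-based) uses full attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_hidden_layers)
    ]


# Values taken from the config above.
layers = expected_layer_types(num_hidden_layers=24, full_attention_interval=4)
full_count = layers.count("full_attention")
linear_count = layers.count("linear_attention")
print(layers[:4], full_count, linear_count)
```

Under this reading the model has 6 full-attention layers and 18 linear-attention layers, matching the 24 entries listed in the file.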
merged/generation_config.json
ADDED
|
@@ -0,0 +1,10 @@
|
{
  "_from_model_config": true,
  "eos_token_id": [
    248046,
    248044
  ],
  "pad_token_id": 248044,
  "transformers_version": "5.3.0",
  "use_cache": true
}
|
merged/model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:6cd7701ff97b7561e0dc158635941426e2a6b9fce824390e6995c9485605b2b7
size 3763692048
|
merged/tokenizer.json
ADDED
|
@@ -0,0 +1,3 @@
|
version https://git-lfs.github.com/spec/v1
oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
size 19989343
|
merged/tokenizer_config.json
ADDED
|
@@ -0,0 +1,31 @@
|
{
  "add_prefix_space": false,
  "audio_bos_token": "<|audio_start|>",
  "audio_eos_token": "<|audio_end|>",
  "audio_token": "<|audio_pad|>",
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "image_token": "<|image_pad|>",
  "is_local": true,
  "model_max_length": 262144,
  "model_specific_special_tokens": {
    "audio_bos_token": "<|audio_start|>",
    "audio_eos_token": "<|audio_end|>",
    "audio_token": "<|audio_pad|>",
    "image_token": "<|image_pad|>",
    "video_token": "<|video_pad|>",
    "vision_bos_token": "<|vision_start|>",
    "vision_eos_token": "<|vision_end|>"
  },
  "pad_token": "<|endoftext|>",
  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
  "split_special_tokens": false,
  "tokenizer_class": "TokenizersBackend",
  "unk_token": null,
  "video_token": "<|video_pad|>",
  "vision_bos_token": "<|vision_start|>",
  "vision_eos_token": "<|vision_end|>"
}
|
run-config.txt
ADDED
|
@@ -0,0 +1,12 @@
|
model=/data/pretrained_models/Qwen3.5-2B
data=/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000
max_length=6144
micro_batch_size=2
gradient_accumulation_steps=8
effective_batch_size=16
learning_rate=1e-4
num_train_epochs=3
direct_ref_mode=same-file-distractors
same_file_distractor_direct_refs=4
distractor_seed=0
gpu_ids=1
|
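The batch-size fields in `run-config.txt` are related by simple arithmetic: 2 samples per micro-batch times 8 accumulation steps gives the recorded effective batch size of 16 on one device. The sketch below parses the `key=value` format and re-derives that number. It assumes, without confirmation from the source, that `gpu_ids` is a comma-separated list of device indices, so `gpu_ids=1` means a single GPU with index 1. The helper name `parse_run_config` is hypothetical.

```python
# Subset of run-config.txt, copied from the file above.
RUN_CONFIG = """\
max_length=6144
micro_batch_size=2
gradient_accumulation_steps=8
effective_batch_size=16
gpu_ids=1
"""


def parse_run_config(text):
    """Hypothetical helper: parse key=value lines into a dict of strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        config[key] = value
    return config


cfg = parse_run_config(RUN_CONFIG)
# Assumption: gpu_ids is a comma-separated list of device indices.
num_gpus = len(cfg["gpu_ids"].split(","))
effective = (
    int(cfg["micro_batch_size"])
    * int(cfg["gradient_accumulation_steps"])
    * num_gpus
)
print(effective, int(cfg["effective_batch_size"]))
```

The derived value agrees with the `effective_batch_size` recorded in the file, which is consistent with a single-GPU run.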
skipped-tokenization.jsonl
ADDED
|
@@ -0,0 +1,422 @@
|
{"label": "upgrwlkedg", "expanded_label": "upgriswlk", "source": "expanded", "tokens": 9508, "reason": "max_length"}
{"label": "iiconn", "expanded_label": "1re", "source": "expanded", "tokens": 7147, "reason": "max_length"}
{"label": "dprd2db", "expanded_label": "dprdspan", "source": "expanded", "tokens": 7426, "reason": "max_length"}
{"label": "pm2mpf", "expanded_label": "pm2mpval", "source": "expanded", "tokens": 8963, "reason": "max_length"}
{"label": "phllvec", "expanded_label": "isphl", "source": "expanded", "tokens": 10796, "reason": "max_length"}
{"label": "xrlelttr", "expanded_label": "xrlttr", "source": "expanded", "tokens": 6891, "reason": "max_length"}
{"label": "trlres", "expanded_label": "wlkres", "source": "expanded", "tokens": 17184, "reason": "max_length"}
{"label": "pcorev", "expanded_label": "pcorevlem", "source": "expanded", "tokens": 53452, "reason": "max_length"}
{"label": "clwlkclwwlkf1o", "expanded_label": "clwlkclwwlkfo", "source": "expanded", "tokens": 11298, "reason": "max_length"}
{"label": "divcan7", "expanded_label": "divdivdiv", "source": "expanded", "tokens": 9114, "reason": "max_length"}
{"label": "ghmfghm", "expanded_label": "ghmgrp", "source": "expanded", "tokens": 6287, "reason": "max_length"}
{"label": "zrzeroorngc", "expanded_label": "zrtermorngc", "source": "expanded", "tokens": 9359, "reason": "max_length"}
{"label": "srgmgp", "expanded_label": "issrg", "source": "expanded", "tokens": 12562, "reason": "max_length"}
{"label": "m1bits", "expanded_label": "bitscmp", "source": "expanded", "tokens": 15084, "reason": "max_length"}
{"label": "resinhcl", "expanded_label": "sinhval", "source": "expanded", "tokens": 6694, "reason": "max_length"}
{"label": "orngogrp", "expanded_label": "isorng", "source": "expanded", "tokens": 9577, "reason": "max_length"}
{"label": "psrasclcl", "expanded_label": "psrring", "source": "expanded", "tokens": 8539, "reason": "max_length"}
{"label": "usgrexmpl1", "expanded_label": "usgrexmpl1lem", "source": "expanded", "tokens": 23243, "reason": "max_length"}
{"label": "lssnvc", "expanded_label": "lssnlm", "source": "expanded", "tokens": 9394, "reason": "max_length"}
{"label": "xlemul2a", "expanded_label": "xlemul1a", "source": "expanded", "tokens": 17516, "reason": "max_length"}
{"label": "cphphl", "expanded_label": "iscph", "source": "expanded", "tokens": 8762, "reason": "max_length"}
{"label": "fclstop", "expanded_label": "isfcls", "source": "expanded", "tokens": 7205, "reason": "max_length"}
{"label": "lvecprop2d", "expanded_label": "lmodprop2d", "source": "expanded", "tokens": 33457, "reason": "max_length"}
{"label": "drhmsubcALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 13003, "reason": "max_length"}
{"label": "nsmndex1", "expanded_label": "smndex1n0mnd", "source": "expanded", "tokens": 8470, "reason": "max_length"}
{"label": "ringcidALTV", "expanded_label": "ringccatidALTV", "source": "expanded", "tokens": 31357, "reason": "max_length"}
{"label": "ghmima", "expanded_label": "ghmrn", "source": "expanded", "tokens": 8915, "reason": "max_length"}
{"label": "pmatassa", "expanded_label": "matassa", "source": "expanded", "tokens": 9716, "reason": "max_length"}
{"label": "addgt0", "expanded_label": "00id", "source": "expanded", "tokens": 8215, "reason": "max_length"}
{"label": "ser0", "expanded_label": "00id", "source": "expanded", "tokens": 8607, "reason": "max_length"}
{"label": "mulgfn", "expanded_label": "mulgfval", "source": "expanded", "tokens": 15315, "reason": "max_length"}
{"label": "usgrexmpl2", "expanded_label": "usgrexmpl2lem", "source": "expanded", "tokens": 22678, "reason": "max_length"}
{"label": "indistps2ALT", "expanded_label": "indistopon", "source": "expanded", "tokens": 6910, "reason": "max_length"}
{"label": "pfx0", "expanded_label": "swrd0", "source": "expanded", "tokens": 7535, "reason": "max_length"}
{"label": "dprd2db", "expanded_label": "dprd2da", "source": "expanded", "tokens": 71940, "reason": "max_length"}
|
| 36 |
+
{"label": "fmfg", "expanded_label": "fgcl", "source": "expanded", "tokens": 8599, "reason": "max_length"}
|
| 37 |
+
{"label": "numufl", "expanded_label": "filssufilg", "source": "expanded", "tokens": 9093, "reason": "max_length"}
|
| 38 |
+
{"label": "wlkiswwlkupgr", "expanded_label": "wlkiswwlksupgr2", "source": "expanded", "tokens": 12895, "reason": "max_length"}
|
| 39 |
+
{"label": "addcomsr", "expanded_label": "addsrpr", "source": "expanded", "tokens": 14104, "reason": "max_length"}
|
| 40 |
+
{"label": "mat1rhm", "expanded_label": "matring", "source": "expanded", "tokens": 18352, "reason": "max_length"}
|
| 41 |
+
{"label": "dpjid", "expanded_label": "dpjidcl", "source": "expanded", "tokens": 38662, "reason": "max_length"}
|
| 42 |
+
{"label": "metcn4", "expanded_label": "met1stc", "source": "expanded", "tokens": 13166, "reason": "max_length"}
|
| 43 |
+
{"label": "mhmf", "expanded_label": "ismhm", "source": "expanded", "tokens": 7423, "reason": "max_length"}
|
| 44 |
+
{"label": "pr01ssre", "expanded_label": "1re", "source": "expanded", "tokens": 6787, "reason": "max_length"}
|
| 45 |
+
{"label": "zcld2", "expanded_label": "recld2", "source": "expanded", "tokens": 7912, "reason": "max_length"}
|
| 46 |
+
{"label": "submgmmgm", "expanded_label": "issubmgm2", "source": "expanded", "tokens": 6349, "reason": "max_length"}
|
| 47 |
+
{"label": "rehaus", "expanded_label": "tgioo", "source": "expanded", "tokens": 22231, "reason": "max_length"}
|
| 48 |
+
{"label": "pmat1op", "expanded_label": "mat1", "source": "expanded", "tokens": 6209, "reason": "max_length"}
|
| 49 |
+
{"label": "sincos1sgn", "expanded_label": "1re", "source": "expanded", "tokens": 9636, "reason": "max_length"}
|
| 50 |
+
{"label": "wwlksnonfi", "expanded_label": "wwlksnfi", "source": "expanded", "tokens": 6363, "reason": "max_length"}
|
| 51 |
+
{"label": "1rp", "expanded_label": "1re", "source": "expanded", "tokens": 6599, "reason": "max_length"}
|
| 52 |
+
{"label": "qtopcmp", "expanded_label": "cncmp", "source": "expanded", "tokens": 16895, "reason": "max_length"}
|
| 53 |
+
{"label": "sqrtle", "expanded_label": "resqrtcl", "source": "expanded", "tokens": 7775, "reason": "max_length"}
|
| 54 |
+
{"label": "opsrsca", "expanded_label": "psrsca", "source": "expanded", "tokens": 6160, "reason": "max_length"}
|
| 55 |
+
{"label": "pgpfi2", "expanded_label": "pgpfi", "source": "expanded", "tokens": 18037, "reason": "max_length"}
|
| 56 |
+
{"label": "ere", "expanded_label": "1re", "source": "expanded", "tokens": 6736, "reason": "max_length"}
|
| 57 |
+
{"label": "sgrp2nmnd", "expanded_label": "sgrp2nmndlem5", "source": "expanded", "tokens": 6277, "reason": "max_length"}
|
| 58 |
+
{"label": "tgtopon", "expanded_label": "tgcl", "source": "expanded", "tokens": 9236, "reason": "max_length"}
|
| 59 |
+
{"label": "cnptop1", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9488, "reason": "max_length"}
|
| 60 |
+
{"label": "fprod0diag", "expanded_label": "fsum0diaglem", "source": "expanded", "tokens": 6497, "reason": "max_length"}
|
| 61 |
+
{"label": "zprodn0", "expanded_label": "zprod", "source": "expanded", "tokens": 18101, "reason": "max_length"}
|
| 62 |
+
{"label": "ishtpyd", "expanded_label": "ishtpy", "source": "expanded", "tokens": 8698, "reason": "max_length"}
|
| 63 |
+
{"label": "axlttrn", "expanded_label": "ltxrlt", "source": "expanded", "tokens": 8594, "reason": "max_length"}
|
| 64 |
+
{"label": "xrltletr", "expanded_label": "xrlttr", "source": "expanded", "tokens": 6927, "reason": "max_length"}
|
| 65 |
+
{"label": "m2cpmf1o", "expanded_label": "m2cpmfo", "source": "expanded", "tokens": 6756, "reason": "max_length"}
|
| 66 |
+
{"label": "lgricngricex", "expanded_label": "gpg5grlic", "source": "expanded", "tokens": 6365, "reason": "max_length"}
|
| 67 |
+
{"label": "nnrecre", "expanded_label": "1re", "source": "expanded", "tokens": 6790, "reason": "max_length"}
|
| 68 |
+
{"label": "grimedgi", "expanded_label": "grimedg", "source": "expanded", "tokens": 21386, "reason": "max_length"}
|
| 69 |
+
{"label": "neg1lt0", "expanded_label": "1re", "source": "expanded", "tokens": 6830, "reason": "max_length"}
|
| 70 |
+
{"label": "pi1xfrgim", "expanded_label": "pi1xfrcnv", "source": "expanded", "tokens": 19046, "reason": "max_length"}
|
| 71 |
+
{"label": "metreg", "expanded_label": "methaus", "source": "expanded", "tokens": 11319, "reason": "max_length"}
|
| 72 |
+
{"label": "fldiv2", "expanded_label": "fldiv", "source": "expanded", "tokens": 15130, "reason": "max_length"}
|
| 73 |
+
{"label": "erclwwlkn", "expanded_label": "erclwwlknsym", "source": "expanded", "tokens": 7474, "reason": "max_length"}
|
| 74 |
+
{"label": "uvtx2vtx1edg", "expanded_label": "nbgr2vtx1edg", "source": "expanded", "tokens": 8190, "reason": "max_length"}
|
| 75 |
+
{"label": "recms", "expanded_label": "recld2", "source": "expanded", "tokens": 7878, "reason": "max_length"}
|
| 76 |
+
{"label": "iccordt", "expanded_label": "letsr", "source": "expanded", "tokens": 6694, "reason": "max_length"}
|
| 77 |
+
{"label": "crhmsubc", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12634, "reason": "max_length"}
|
| 78 |
+
{"label": "xmul02", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8030, "reason": "max_length"}
|
| 79 |
+
{"label": "pcabs", "expanded_label": "pcneg", "source": "expanded", "tokens": 8094, "reason": "max_length"}
|
| 80 |
+
{"label": "indisuni", "expanded_label": "indistopon", "source": "expanded", "tokens": 7576, "reason": "max_length"}
|
| 81 |
+
{"label": "rngqiprngho", "expanded_label": "rngqiprngghm", "source": "expanded", "tokens": 10189, "reason": "max_length"}
|
| 82 |
+
{"label": "peano2re", "expanded_label": "1re", "source": "expanded", "tokens": 6658, "reason": "max_length"}
|
| 83 |
+
{"label": "seqabs", "expanded_label": "fsumabs", "source": "expanded", "tokens": 19216, "reason": "max_length"}
|
| 84 |
+
{"label": "prmrp", "expanded_label": "coprm", "source": "expanded", "tokens": 7707, "reason": "max_length"}
|
| 85 |
+
{"label": "addsub", "expanded_label": "addcom", "source": "expanded", "tokens": 6963, "reason": "max_length"}
|
| 86 |
+
{"label": "xmullid", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8165, "reason": "max_length"}
|
| 87 |
+
{"label": "lsslmod", "expanded_label": "islss3", "source": "expanded", "tokens": 11153, "reason": "max_length"}
|
| 88 |
+
{"label": "mat0dimid", "expanded_label": "matring", "source": "expanded", "tokens": 16819, "reason": "max_length"}
|
| 89 |
+
{"label": "1lt4", "expanded_label": "1re", "source": "expanded", "tokens": 6930, "reason": "max_length"}
|
| 90 |
+
{"label": "2ndctop", "expanded_label": "tgcl", "source": "expanded", "tokens": 9706, "reason": "max_length"}
|
| 91 |
+
{"label": "ordthmeo", "expanded_label": "isocnv", "source": "expanded", "tokens": 6246, "reason": "max_length"}
|
| 92 |
+
{"label": "9re", "expanded_label": "1re", "source": "expanded", "tokens": 6826, "reason": "max_length"}
|
| 93 |
+
{"label": "addgegt0", "expanded_label": "00id", "source": "expanded", "tokens": 8276, "reason": "max_length"}
|
| 94 |
+
{"label": "mdetuni", "expanded_label": "mdetuni0", "source": "expanded", "tokens": 36640, "reason": "max_length"}
|
| 95 |
+
{"label": "wlkswwlksen", "expanded_label": "wlkswwlksf1o", "source": "expanded", "tokens": 8077, "reason": "max_length"}
|
| 96 |
+
{"label": "cphnlm", "expanded_label": "iscph", "source": "expanded", "tokens": 8530, "reason": "max_length"}
|
| 97 |
+
{"label": "algcvgb", "expanded_label": "algcvgblem", "source": "expanded", "tokens": 7042, "reason": "max_length"}
|
| 98 |
+
{"label": "neglcm", "expanded_label": "lcmneg", "source": "expanded", "tokens": 8580, "reason": "max_length"}
|
| 99 |
+
{"label": "metflem", "expanded_label": "ismet", "source": "expanded", "tokens": 6299, "reason": "max_length"}
|
| 100 |
+
{"label": "nmoleub2b", "expanded_label": "nmoleub2lem2", "source": "expanded", "tokens": 13552, "reason": "max_length"}
|
| 101 |
+
{"label": "nbedgusgr", "expanded_label": "hasheqf1oi", "source": "expanded", "tokens": 7832, "reason": "max_length"}
|
| 102 |
+
{"label": "drhmsubc", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12663, "reason": "max_length"}
|
| 103 |
+
{"label": "nnpw2blenfzo2", "expanded_label": "elfzolborelfzop1", "source": "expanded", "tokens": 7424, "reason": "max_length"}
|
| 104 |
+
{"label": "lssvancl2", "expanded_label": "lmodcom", "source": "expanded", "tokens": 11067, "reason": "max_length"}
|
| 105 |
+
{"label": "ordthmeo", "expanded_label": "ordthmeolem", "source": "expanded", "tokens": 26266, "reason": "max_length"}
|
| 106 |
+
{"label": "ismhmd", "expanded_label": "ismhm", "source": "expanded", "tokens": 7438, "reason": "max_length"}
|
| 107 |
+
{"label": "crctcshwlk", "expanded_label": "crctcshlem4", "source": "expanded", "tokens": 6817, "reason": "max_length"}
|
| 108 |
+
{"label": "0reALT", "expanded_label": "1re", "source": "expanded", "tokens": 8961, "reason": "max_length"}
|
| 109 |
+
{"label": "fsumshft", "expanded_label": "mptfzshft", "source": "expanded", "tokens": 7431, "reason": "max_length"}
|
| 110 |
+
{"label": "ltm1", "expanded_label": "1re", "source": "expanded", "tokens": 7209, "reason": "max_length"}
|
| 111 |
+
{"label": "subcn", "expanded_label": "addcnlem", "source": "expanded", "tokens": 10706, "reason": "max_length"}
|
| 112 |
+
{"label": "bastop", "expanded_label": "tgcl", "source": "expanded", "tokens": 9261, "reason": "max_length"}
|
| 113 |
+
{"label": "frgrncvvdeqlem10", "expanded_label": "frgrncvvdeqlem9", "source": "expanded", "tokens": 9599, "reason": "max_length"}
|
| 114 |
+
{"label": "upgrwlkupwlkb", "expanded_label": "upgrwlkupwlk", "source": "expanded", "tokens": 12711, "reason": "max_length"}
|
| 115 |
+
{"label": "gexdvds2", "expanded_label": "oddvds", "source": "expanded", "tokens": 8924, "reason": "max_length"}
|
| 116 |
+
{"label": "mat1ov", "expanded_label": "mat1", "source": "expanded", "tokens": 7234, "reason": "max_length"}
|
| 117 |
+
{"label": "pgjsgr", "expanded_label": "gpgusgra", "source": "expanded", "tokens": 8100, "reason": "max_length"}
|
| 118 |
+
{"label": "m2cpmghm", "expanded_label": "mat2pmatghm", "source": "expanded", "tokens": 14236, "reason": "max_length"}
|
| 119 |
+
{"label": "cnconst", "expanded_label": "cnconst2", "source": "expanded", "tokens": 7114, "reason": "max_length"}
|
| 120 |
+
{"label": "4re", "expanded_label": "1re", "source": "expanded", "tokens": 6779, "reason": "max_length"}
|
| 121 |
+
{"label": "hausflf", "expanded_label": "hausflimi", "source": "expanded", "tokens": 6580, "reason": "max_length"}
|
| 122 |
+
{"label": "gcdmodi", "expanded_label": "modgcd", "source": "expanded", "tokens": 7405, "reason": "max_length"}
|
| 123 |
+
{"label": "xmetutop", "expanded_label": "psmetutop", "source": "expanded", "tokens": 14660, "reason": "max_length"}
|
| 124 |
+
{"label": "issdrg2", "expanded_label": "issubdrg", "source": "expanded", "tokens": 12046, "reason": "max_length"}
|
| 125 |
+
{"label": "uspgrsprf1o", "expanded_label": "uspgrsprfo", "source": "expanded", "tokens": 9662, "reason": "max_length"}
|
| 126 |
+
{"label": "subrgsubrng", "expanded_label": "ringrng", "source": "expanded", "tokens": 6980, "reason": "max_length"}
|
| 127 |
+
{"label": "symgextf1o", "expanded_label": "symgextf1", "source": "expanded", "tokens": 8019, "reason": "max_length"}
|
| 128 |
+
{"label": "mulgp1", "expanded_label": "mulgdir", "source": "expanded", "tokens": 11090, "reason": "max_length"}
|
| 129 |
+
{"label": "nbusgrvtx", "expanded_label": "nbumgrvtx", "source": "expanded", "tokens": 6729, "reason": "max_length"}
|
| 130 |
+
{"label": "rlimcn1b", "expanded_label": "rlimcn1", "source": "expanded", "tokens": 7000, "reason": "max_length"}
|
| 131 |
+
{"label": "subid", "expanded_label": "addrid", "source": "expanded", "tokens": 10979, "reason": "max_length"}
|
| 132 |
+
{"label": "rngridlmcl", "expanded_label": "opprrng", "source": "expanded", "tokens": 7855, "reason": "max_length"}
|
| 133 |
+
{"label": "fusgr1th", "expanded_label": "finsumvtxdg2size", "source": "expanded", "tokens": 15434, "reason": "max_length"}
|
| 134 |
+
{"label": "cnmptc", "expanded_label": "cnconst2", "source": "expanded", "tokens": 6700, "reason": "max_length"}
|
| 135 |
+
{"label": "divalgmodcl", "expanded_label": "divalgmod", "source": "expanded", "tokens": 7672, "reason": "max_length"}
|
| 136 |
+
{"label": "cpmatsrgpmat", "expanded_label": "cpmatmcl", "source": "expanded", "tokens": 7671, "reason": "max_length"}
|
| 137 |
+
{"label": "6re", "expanded_label": "1re", "source": "expanded", "tokens": 6977, "reason": "max_length"}
|
| 138 |
+
{"label": "pn0sr", "expanded_label": "1idsr", "source": "expanded", "tokens": 6286, "reason": "max_length"}
|
| 139 |
+
{"label": "mat1bas", "expanded_label": "matring", "source": "expanded", "tokens": 15980, "reason": "max_length"}
|
| 140 |
+
{"label": "grlicer", "expanded_label": "grlictr", "source": "expanded", "tokens": 11391, "reason": "max_length"}
|
| 141 |
+
{"label": "wlk1ewlk", "expanded_label": "wlk1walk", "source": "expanded", "tokens": 20611, "reason": "max_length"}
|
| 142 |
+
{"label": "gexdvds2", "expanded_label": "gexdvds", "source": "expanded", "tokens": 17221, "reason": "max_length"}
|
| 143 |
+
{"label": "expcncf", "expanded_label": "expcn", "source": "expanded", "tokens": 6499, "reason": "max_length"}
|
| 144 |
+
{"label": "reexpcl", "expanded_label": "1re", "source": "expanded", "tokens": 6871, "reason": "max_length"}
|
| 145 |
+
{"label": "erclwwlk", "expanded_label": "erclwwlktr", "source": "expanded", "tokens": 10470, "reason": "max_length"}
|
| 146 |
+
{"label": "expp1z", "expanded_label": "expaddz", "source": "expanded", "tokens": 11654, "reason": "max_length"}
|
| 147 |
+
{"label": "1wlkd", "expanded_label": "1wlkdlem4", "source": "expanded", "tokens": 7723, "reason": "max_length"}
|
| 148 |
+
{"label": "symgfixf1o", "expanded_label": "symgfixf1", "source": "expanded", "tokens": 8441, "reason": "max_length"}
|
| 149 |
+
{"label": "htpycn", "expanded_label": "ishtpy", "source": "expanded", "tokens": 8322, "reason": "max_length"}
|
| 150 |
+
{"label": "ringccatALTV", "expanded_label": "ringccatidALTV", "source": "expanded", "tokens": 36133, "reason": "max_length"}
|
| 151 |
+
{"label": "dfnbgrss2", "expanded_label": "dfnbgr6", "source": "expanded", "tokens": 7749, "reason": "max_length"}
|
| 152 |
+
{"label": "locfintop", "expanded_label": "islocfin", "source": "expanded", "tokens": 7464, "reason": "max_length"}
|
| 153 |
+
{"label": "rngcidALTV", "expanded_label": "rngccatidALTV", "source": "expanded", "tokens": 31264, "reason": "max_length"}
|
| 154 |
+
{"label": "unitabl", "expanded_label": "unitgrp", "source": "expanded", "tokens": 19083, "reason": "max_length"}
|
| 155 |
+
{"label": "kqt0", "expanded_label": "kqt0lem", "source": "expanded", "tokens": 9261, "reason": "max_length"}
|
| 156 |
+
{"label": "recmet", "expanded_label": "recld2", "source": "expanded", "tokens": 8047, "reason": "max_length"}
|
| 157 |
+
{"label": "gagrp", "expanded_label": "isga", "source": "expanded", "tokens": 9295, "reason": "max_length"}
|
| 158 |
+
{"label": "rngccatALTV", "expanded_label": "rngccatidALTV", "source": "expanded", "tokens": 35837, "reason": "max_length"}
|
| 159 |
+
{"label": "mpl1", "expanded_label": "mplsubrg", "source": "expanded", "tokens": 7919, "reason": "max_length"}
|
| 160 |
+
{"label": "vtxduhgr0edgnel", "expanded_label": "vtxd0nedgb", "source": "expanded", "tokens": 7459, "reason": "max_length"}
|
| 161 |
+
{"label": "pmtrfmvdn0", "expanded_label": "pmtrfrn", "source": "expanded", "tokens": 8866, "reason": "max_length"}
|
| 162 |
+
{"label": "psrbagev2", "expanded_label": "psrbagev1", "source": "expanded", "tokens": 7226, "reason": "max_length"}
|
| 163 |
+
{"label": "rnghmghm", "expanded_label": "isrnghm", "source": "expanded", "tokens": 8318, "reason": "max_length"}
|
| 164 |
+
{"label": "erclwwlk", "expanded_label": "erclwwlksym", "source": "expanded", "tokens": 6259, "reason": "max_length"}
|
| 165 |
+
{"label": "numclwwlk2lem3", "expanded_label": "numclwlk2lem2f1o", "source": "expanded", "tokens": 16930, "reason": "max_length"}
|
| 166 |
+
{"label": "wwlksnonfi", "expanded_label": "iswwlksnon", "source": "expanded", "tokens": 7618, "reason": "max_length"}
|
| 167 |
+
{"label": "erclwwlkn", "expanded_label": "erclwwlkntr", "source": "expanded", "tokens": 13187, "reason": "max_length"}
|
| 168 |
+
{"label": "gcdn0cl", "expanded_label": "gcdcllem3", "source": "expanded", "tokens": 15171, "reason": "max_length"}
|
| 169 |
+
{"label": "symggen2", "expanded_label": "symggen", "source": "expanded", "tokens": 40700, "reason": "max_length"}
|
| 170 |
+
{"label": "mdet0f1o", "expanded_label": "mdet0pr", "source": "expanded", "tokens": 9891, "reason": "max_length"}
|
| 171 |
+
{"label": "ringgrp", "expanded_label": "isring", "source": "expanded", "tokens": 8062, "reason": "max_length"}
|
| 172 |
+
{"label": "mulginvinv", "expanded_label": "mulginvcom", "source": "expanded", "tokens": 10250, "reason": "max_length"}
|
| 173 |
+
{"label": "2zrng0", "expanded_label": "cncrng", "source": "expanded", "tokens": 8315, "reason": "max_length"}
|
| 174 |
+
{"label": "wlkv", "expanded_label": "wksfval", "source": "expanded", "tokens": 7814, "reason": "max_length"}
|
| 175 |
+
{"label": "isncvsngpd", "expanded_label": "isncvsngp", "source": "expanded", "tokens": 7861, "reason": "max_length"}
|
| 176 |
+
{"label": "rlimcn2", "expanded_label": "rlimcn3", "source": "expanded", "tokens": 11998, "reason": "max_length"}
|
| 177 |
+
{"label": "ghmco", "expanded_label": "mhmco", "source": "expanded", "tokens": 8559, "reason": "max_length"}
|
| 178 |
+
{"label": "qdensere2", "expanded_label": "tgioo", "source": "expanded", "tokens": 20132, "reason": "max_length"}
|
| 179 |
+
{"label": "vdgn1frgrv3", "expanded_label": "vdgn1frgrv2", "source": "expanded", "tokens": 8714, "reason": "max_length"}
|
| 180 |
+
{"label": "0le1", "expanded_label": "1re", "source": "expanded", "tokens": 6633, "reason": "max_length"}
|
| 181 |
+
{"label": "wrdlen2", "expanded_label": "wrdlen2i", "source": "expanded", "tokens": 7494, "reason": "max_length"}
|
| 182 |
+
{"label": "wlkiswwlkupgr", "expanded_label": "wlkiswwlks1", "source": "expanded", "tokens": 9327, "reason": "max_length"}
|
| 183 |
+
{"label": "nzrpropd", "expanded_label": "ringpropd", "source": "expanded", "tokens": 19645, "reason": "max_length"}
|
| 184 |
+
{"label": "unitlinv", "expanded_label": "unitgrp", "source": "expanded", "tokens": 13837, "reason": "max_length"}
|
| 185 |
+
{"label": "fmfil", "expanded_label": "fbasrn", "source": "expanded", "tokens": 14221, "reason": "max_length"}
|
| 186 |
+
{"label": "odhash2", "expanded_label": "odf1o2", "source": "expanded", "tokens": 12218, "reason": "max_length"}
|
| 187 |
+
{"label": "xmulmnf2", "expanded_label": "xmulcom", "source": "expanded", "tokens": 8195, "reason": "max_length"}
|
| 188 |
+
{"label": "2arymaptf1o", "expanded_label": "2arymaptfo", "source": "expanded", "tokens": 8887, "reason": "max_length"}
|
| 189 |
+
{"label": "ply1plusgpropd", "expanded_label": "psrplusgpropd", "source": "expanded", "tokens": 9265, "reason": "max_length"}
|
| 190 |
+
{"label": "1le2", "expanded_label": "1re", "source": "expanded", "tokens": 6748, "reason": "max_length"}
|
| 191 |
+
{"label": "uspgrsprf1o", "expanded_label": "uspgrsprf1", "source": "expanded", "tokens": 8507, "reason": "max_length"}
|
| 192 |
+
{"label": "numclwwlk1lem2f1o", "expanded_label": "numclwwlk1lem2fo", "source": "expanded", "tokens": 15086, "reason": "max_length"}
|
| 193 |
+
{"label": "unben", "expanded_label": "unbenlem", "source": "expanded", "tokens": 13691, "reason": "max_length"}
|
| 194 |
+
{"label": "frlmelbas", "expanded_label": "frlmbas", "source": "expanded", "tokens": 9104, "reason": "max_length"}
|
| 195 |
+
{"label": "1arith2", "expanded_label": "1arith", "source": "expanded", "tokens": 27416, "reason": "max_length"}
|
| 196 |
+
{"label": "1arymaptf1o", "expanded_label": "1arymaptf1", "source": "expanded", "tokens": 7210, "reason": "max_length"}
|
| 197 |
+
{"label": "sgrpssmgm", "expanded_label": "mgmnsgrpex", "source": "expanded", "tokens": 7533, "reason": "max_length"}
|
| 198 |
+
{"label": "pj1eq", "expanded_label": "pj1id", "source": "expanded", "tokens": 10282, "reason": "max_length"}
|
| 199 |
+
{"label": "isghmd", "expanded_label": "isghm", "source": "expanded", "tokens": 7692, "reason": "max_length"}
|
| 200 |
+
{"label": "ghmima", "expanded_label": "resghm", "source": "expanded", "tokens": 7045, "reason": "max_length"}
|
| 201 |
+
{"label": "znle", "expanded_label": "znval", "source": "expanded", "tokens": 8606, "reason": "max_length"}
|
| 202 |
+
{"label": "clnbfiusgrfi", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7030, "reason": "max_length"}
|
| 203 |
+
{"label": "assalmod", "expanded_label": "isassa", "source": "expanded", "tokens": 6614, "reason": "max_length"}
|
| 204 |
+
{"label": "xrs1cmn", "expanded_label": "xaddcom", "source": "expanded", "tokens": 6572, "reason": "max_length"}
|
| 205 |
+
{"label": "pcelnn", "expanded_label": "pcdvdsb", "source": "expanded", "tokens": 10977, "reason": "max_length"}
|
| 206 |
+
{"label": "indistpsALT", "expanded_label": "indistopon", "source": "expanded", "tokens": 7112, "reason": "max_length"}
|
| 207 |
+
{"label": "infssuzle", "expanded_label": "uzwo", "source": "expanded", "tokens": 9520, "reason": "max_length"}
|
| 208 |
+
{"label": "cpmatsrgpmat", "expanded_label": "1elcpmat", "source": "expanded", "tokens": 8164, "reason": "max_length"}
|
| 209 |
+
{"label": "scmatsrng", "expanded_label": "scmatmulcl", "source": "expanded", "tokens": 9402, "reason": "max_length"}
|
| 210 |
+
{"label": "oppgmndb", "expanded_label": "oppgmnd", "source": "expanded", "tokens": 7363, "reason": "max_length"}
|
| 211 |
+
{"label": "2zrngaabl", "expanded_label": "2zrngagrp", "source": "expanded", "tokens": 6148, "reason": "max_length"}
|
| 212 |
+
{"label": "lmodvaddsub4", "expanded_label": "abladdsub4", "source": "expanded", "tokens": 6739, "reason": "max_length"}
|
| 213 |
+
{"label": "m1modnnsub1", "expanded_label": "1re", "source": "expanded", "tokens": 7710, "reason": "max_length"}
|
| 214 |
+
{"label": "haushmph", "expanded_label": "cnhaus", "source": "expanded", "tokens": 13415, "reason": "max_length"}
|
| 215 |
+
{"label": "cncms", "expanded_label": "cncmet", "source": "expanded", "tokens": 9751, "reason": "max_length"}
|
| 216 |
+
{"label": "fprodxp", "expanded_label": "fprod2d", "source": "expanded", "tokens": 7010, "reason": "max_length"}
|
| 217 |
+
{"label": "fsummulc1", "expanded_label": "fsummulc2", "source": "expanded", "tokens": 10995, "reason": "max_length"}
|
| 218 |
+
{"label": "istop2g", "expanded_label": "fiint", "source": "expanded", "tokens": 17441, "reason": "max_length"}
|
| 219 |
+
{"label": "opsrlmod", "expanded_label": "psrlmod", "source": "expanded", "tokens": 21192, "reason": "max_length"}
|
| 220 |
+
{"label": "psradd", "expanded_label": "psrplusg", "source": "expanded", "tokens": 7929, "reason": "max_length"}
|
| 221 |
+
{"label": "phtpyhtpy", "expanded_label": "isphtpy", "source": "expanded", "tokens": 6188, "reason": "max_length"}
|
| 222 |
+
{"label": "refld", "expanded_label": "cncrng", "source": "expanded", "tokens": 8397, "reason": "max_length"}
|
| 223 |
+
{"label": "kgenuni", "expanded_label": "kgentopon", "source": "expanded", "tokens": 9648, "reason": "max_length"}
|
| 224 |
+
{"label": "clsdif", "expanded_label": "clsval2", "source": "expanded", "tokens": 9494, "reason": "max_length"}
|
| 225 |
+
{"label": "pm2mpgrpiso", "expanded_label": "pm2mpghm", "source": "expanded", "tokens": 23647, "reason": "max_length"}
|
| 226 |
+
{"label": "psrasclcl", "expanded_label": "psrlmod", "source": "expanded", "tokens": 18726, "reason": "max_length"}
|
| 227 |
+
{"label": "hashge2el2difb", "expanded_label": "hashge2el2dif", "source": "expanded", "tokens": 8135, "reason": "max_length"}
|
| 228 |
+
{"label": "relin01", "expanded_label": "1re", "source": "expanded", "tokens": 7726, "reason": "max_length"}
|
| 229 |
+
{"label": "nqerid", "expanded_label": "nqerf", "source": "expanded", "tokens": 7228, "reason": "max_length"}
|
| 230 |
+
{"label": "dsmmlmod", "expanded_label": "dsmmlss", "source": "expanded", "tokens": 11428, "reason": "max_length"}
|
| 231 |
+
{"label": "rhmisrnghm", "expanded_label": "ringrng", "source": "expanded", "tokens": 6625, "reason": "max_length"}
|
| 232 |
+
{"label": "orbstaval", "expanded_label": "gastacl", "source": "expanded", "tokens": 9593, "reason": "max_length"}
|
| 233 |
+
{"label": "mat0dim0", "expanded_label": "matring", "source": "expanded", "tokens": 16862, "reason": "max_length"}
|
| 234 |
+
{"label": "4pos", "expanded_label": "1re", "source": "expanded", "tokens": 6841, "reason": "max_length"}
|
| 235 |
+
{"label": "uspgredgleord", "expanded_label": "uspgredg2v", "source": "expanded", "tokens": 6377, "reason": "max_length"}
|
| 236 |
+
{"label": "restt1", "expanded_label": "cnt1", "source": "expanded", "tokens": 6228, "reason": "max_length"}
|
| 237 |
+
{"label": "metres2", "expanded_label": "xmetres2", "source": "expanded", "tokens": 6309, "reason": "max_length"}
|
| 238 |
+
{"label": "2pos", "expanded_label": "1re", "source": "expanded", "tokens": 8970, "reason": "max_length"}
|
| 239 |
+
{"label": "uzfbas", "expanded_label": "uzrest", "source": "expanded", "tokens": 8616, "reason": "max_length"}
|
| 240 |
+
{"label": "grpid", "expanded_label": "grprcan", "source": "expanded", "tokens": 10928, "reason": "max_length"}
|
| 241 |
+
{"label": "wwlksnextbij0", "expanded_label": "wwlksnextinj", "source": "expanded", "tokens": 17544, "reason": "max_length"}
|
| 242 |
+
{"label": "0le2", "expanded_label": "1re", "source": "expanded", "tokens": 9037, "reason": "max_length"}
|
| 243 |
+
{"label": "pmatring", "expanded_label": "matring", "source": "expanded", "tokens": 15581, "reason": "max_length"}
|
| 244 |
+
{"label": "elicc01", "expanded_label": "1re", "source": "expanded", "tokens": 6735, "reason": "max_length"}
|
| 245 |
+
{"label": "gpgprismgr4cycl0", "expanded_label": "gpgprismgr4cycllem11", "source": "expanded", "tokens": 6170, "reason": "max_length"}
|
| 246 |
+
{"label": "wlkiswwlks", "expanded_label": "wlkiswwlks1", "source": "expanded", "tokens": 9332, "reason": "max_length"}
|
| 247 |
+
{"label": "orbstaval", "expanded_label": "eqger", "source": "expanded", "tokens": 16573, "reason": "max_length"}
|
| 248 |
+
{"label": "nlmngp", "expanded_label": "isnlm", "source": "expanded", "tokens": 6526, "reason": "max_length"}
|
| 249 |
+
{"label": "cncdrg", "expanded_label": "cnsubrg", "source": "expanded", "tokens": 11658, "reason": "max_length"}
|
| 250 |
+
{"label": "txbasex", "expanded_label": "txuni2", "source": "expanded", "tokens": 6234, "reason": "max_length"}
|
| 251 |
+
{"label": "neggcd", "expanded_label": "gcdneg", "source": "expanded", "tokens": 6270, "reason": "max_length"}
|
| 252 |
+
{"label": "0cnALT2", "expanded_label": "cnegex", "source": "expanded", "tokens": 8302, "reason": "max_length"}
|
| 253 |
+
{"label": "xrrest", "expanded_label": "xrtgioo", "source": "expanded", "tokens": 12124, "reason": "max_length"}
|
| 254 |
+
{"label": "evls1scafv", "expanded_label": "evls1sca", "source": "expanded", "tokens": 9522, "reason": "max_length"}
|
| 255 |
+
{"label": "assaring", "expanded_label": "isassa", "source": "expanded", "tokens": 6337, "reason": "max_length"}
|
| 256 |
+
{"label": "ramtub", "expanded_label": "ramcl2lem", "source": "expanded", "tokens": 6234, "reason": "max_length"}
|
| 257 |
+
{"label": "gaset", "expanded_label": "isga", "source": "expanded", "tokens": 8960, "reason": "max_length"}
|
| 258 |
+
{"label": "ringmgp", "expanded_label": "isring", "source": "expanded", "tokens": 8047, "reason": "max_length"}
|
| 259 |
+
{"label": "fusgrvtxdgonume", "expanded_label": "vtxdgoddnumeven", "source": "expanded", "tokens": 8717, "reason": "max_length"}
|
| 260 |
+
{"label": "1lt6", "expanded_label": "1re", "source": "expanded", "tokens": 6929, "reason": "max_length"}
{"label": "rngqiprngho", "expanded_label": "rngqiprnglin", "source": "expanded", "tokens": 9300, "reason": "max_length"}
{"label": "addnqf", "expanded_label": "nqerf", "source": "expanded", "tokens": 7503, "reason": "max_length"}
{"label": "1lt10", "expanded_label": "1re", "source": "expanded", "tokens": 6987, "reason": "max_length"}
{"label": "qtopconn", "expanded_label": "cnconn", "source": "expanded", "tokens": 9265, "reason": "max_length"}
{"label": "uvtx2vtx1edgb", "expanded_label": "nbuhgr2vtx1edgb", "source": "expanded", "tokens": 9405, "reason": "max_length"}
{"label": "rngqiprng", "expanded_label": "ringrng", "source": "expanded", "tokens": 6666, "reason": "max_length"}
{"label": "psgnfitr", "expanded_label": "symggrp", "source": "expanded", "tokens": 6239, "reason": "max_length"}
{"label": "resttop", "expanded_label": "tgrest", "source": "expanded", "tokens": 9629, "reason": "max_length"}
{"label": "frgpgrp", "expanded_label": "frgp0", "source": "expanded", "tokens": 17274, "reason": "max_length"}
{"label": "sqrt2irr0", "expanded_label": "sqrt2irr", "source": "expanded", "tokens": 11416, "reason": "max_length"}
{"label": "frgr2wsp1", "expanded_label": "wpthswwlks2on", "source": "expanded", "tokens": 10945, "reason": "max_length"}
{"label": "metelcls", "expanded_label": "met1stc", "source": "expanded", "tokens": 13311, "reason": "max_length"}
{"label": "crhmsubcALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 12910, "reason": "max_length"}
{"label": "sgrp2nmnd", "expanded_label": "sgrp2nmndlem4", "source": "expanded", "tokens": 17737, "reason": "max_length"}
{"label": "phlsrng", "expanded_label": "isphl", "source": "expanded", "tokens": 10453, "reason": "max_length"}
{"label": "subcn", "expanded_label": "subcn2", "source": "expanded", "tokens": 7032, "reason": "max_length"}
{"label": "utoptopon", "expanded_label": "utoptop", "source": "expanded", "tokens": 10837, "reason": "max_length"}
{"label": "metcnp4", "expanded_label": "met1stc", "source": "expanded", "tokens": 13317, "reason": "max_length"}
{"label": "rhmpsrlem1", "expanded_label": "psrbaglefi", "source": "expanded", "tokens": 8707, "reason": "max_length"}
{"label": "fcfelbas", "expanded_label": "fcfval", "source": "expanded", "tokens": 6815, "reason": "max_length"}
{"label": "opprneg", "expanded_label": "grpinvfval", "source": "expanded", "tokens": 7126, "reason": "max_length"}
{"label": "isgrpde", "expanded_label": "ismndd", "source": "expanded", "tokens": 6733, "reason": "max_length"}
{"label": "1lt2", "expanded_label": "1re", "source": "expanded", "tokens": 6790, "reason": "max_length"}
{"label": "hausnlly", "expanded_label": "restnlly", "source": "expanded", "tokens": 7658, "reason": "max_length"}
{"label": "dvdsunit", "expanded_label": "dvdsrtr", "source": "expanded", "tokens": 6678, "reason": "max_length"}
{"label": "nbfiusgrfi", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7133, "reason": "max_length"}
{"label": "opsrring", "expanded_label": "psrring", "source": "expanded", "tokens": 8414, "reason": "max_length"}
{"label": "dprd0", "expanded_label": "dprdz", "source": "expanded", "tokens": 14331, "reason": "max_length"}
{"label": "recmet", "expanded_label": "cncmet", "source": "expanded", "tokens": 9197, "reason": "max_length"}
{"label": "dvdseq", "expanded_label": "dvdsabseq", "source": "expanded", "tokens": 6996, "reason": "max_length"}
{"label": "telgsumfz0s", "expanded_label": "telgsumfzs", "source": "expanded", "tokens": 12061, "reason": "max_length"}
{"label": "evls1fvcl", "expanded_label": "ressply1evl", "source": "expanded", "tokens": 7459, "reason": "max_length"}
{"label": "t0hmph", "expanded_label": "cnt0", "source": "expanded", "tokens": 8061, "reason": "max_length"}
{"label": "1le3", "expanded_label": "1re", "source": "expanded", "tokens": 6650, "reason": "max_length"}
{"label": "zring0", "expanded_label": "cncrng", "source": "expanded", "tokens": 7883, "reason": "max_length"}
{"label": "ackval42a", "expanded_label": "ackval42", "source": "expanded", "tokens": 6435, "reason": "max_length"}
{"label": "mplsca", "expanded_label": "psrsca", "source": "expanded", "tokens": 6278, "reason": "max_length"}
{"label": "subid1", "expanded_label": "addrid", "source": "expanded", "tokens": 10958, "reason": "max_length"}
{"label": "idmatidpmat", "expanded_label": "mat2pmat1", "source": "expanded", "tokens": 7186, "reason": "max_length"}
{"label": "vtxduhgrun", "expanded_label": "vtxdun", "source": "expanded", "tokens": 13019, "reason": "max_length"}
{"label": "nnwo", "expanded_label": "uzwo", "source": "expanded", "tokens": 8873, "reason": "max_length"}
{"label": "peano2rem", "expanded_label": "1re", "source": "expanded", "tokens": 6713, "reason": "max_length"}
{"label": "usgrlimprop", "expanded_label": "uspgrlim", "source": "expanded", "tokens": 16778, "reason": "max_length"}
{"label": "vdgfrgrgt2", "expanded_label": "vdgn1frgrv2", "source": "expanded", "tokens": 9428, "reason": "max_length"}
{"label": "evls1varsrng", "expanded_label": "evls1var", "source": "expanded", "tokens": 8899, "reason": "max_length"}
{"label": "indf", "expanded_label": "1re", "source": "expanded", "tokens": 7405, "reason": "max_length"}
{"label": "usgrn2cycl", "expanded_label": "uspgrn2crct", "source": "expanded", "tokens": 12349, "reason": "max_length"}
{"label": "addgtge0", "expanded_label": "00id", "source": "expanded", "tokens": 8117, "reason": "max_length"}
{"label": "zringcrng", "expanded_label": "cncrng", "source": "expanded", "tokens": 7480, "reason": "max_length"}
{"label": "restt0", "expanded_label": "cnt0", "source": "expanded", "tokens": 10949, "reason": "max_length"}
{"label": "ring1ne0", "expanded_label": "hashgt12el", "source": "expanded", "tokens": 6637, "reason": "max_length"}
{"label": "cycsubggenodd", "expanded_label": "dfod2", "source": "expanded", "tokens": 22577, "reason": "max_length"}
{"label": "0mat2pmat", "expanded_label": "mat2pmatghm", "source": "expanded", "tokens": 15693, "reason": "max_length"}
{"label": "omndmnd", "expanded_label": "isomnd", "source": "expanded", "tokens": 8458, "reason": "max_length"}
{"label": "nneop", "expanded_label": "nneo", "source": "expanded", "tokens": 6454, "reason": "max_length"}
{"label": "eqg0subgecsn", "expanded_label": "eqg0subg", "source": "expanded", "tokens": 6571, "reason": "max_length"}
{"label": "xltmul2", "expanded_label": "xmulcom", "source": "expanded", "tokens": 14328, "reason": "max_length"}
{"label": "subrgnrg", "expanded_label": "subgngp", "source": "expanded", "tokens": 7882, "reason": "max_length"}
{"label": "pi1buni", "expanded_label": "pi1blem", "source": "expanded", "tokens": 7231, "reason": "max_length"}
{"label": "isoddgcd1", "expanded_label": "coprm", "source": "expanded", "tokens": 7107, "reason": "max_length"}
{"label": "1lt5", "expanded_label": "1re", "source": "expanded", "tokens": 6962, "reason": "max_length"}
{"label": "rellycmp", "expanded_label": "cnllycmp", "source": "expanded", "tokens": 22702, "reason": "max_length"}
{"label": "pm2mprhm", "expanded_label": "matring", "source": "expanded", "tokens": 16674, "reason": "max_length"}
{"label": "icccld", "expanded_label": "difreicc", "source": "expanded", "tokens": 10085, "reason": "max_length"}
{"label": "m2cpmrhm", "expanded_label": "matring", "source": "expanded", "tokens": 17131, "reason": "max_length"}
{"label": "coe1tmfv2", "expanded_label": "coe1tm", "source": "expanded", "tokens": 12134, "reason": "max_length"}
{"label": "tsmscl", "expanded_label": "eltsms", "source": "expanded", "tokens": 8460, "reason": "max_length"}
{"label": "eluzadd", "expanded_label": "zaddcl", "source": "expanded", "tokens": 6242, "reason": "max_length"}
{"label": "sst0", "expanded_label": "cnt0", "source": "expanded", "tokens": 8434, "reason": "max_length"}
{"label": "clwlkclwwlkf1o", "expanded_label": "clwlkclwwlkf1", "source": "expanded", "tokens": 11644, "reason": "max_length"}
{"label": "indistop", "expanded_label": "indistopon", "source": "expanded", "tokens": 7204, "reason": "max_length"}
{"label": "cphnmfval", "expanded_label": "iscph", "source": "expanded", "tokens": 8389, "reason": "max_length"}
{"label": "fclssscls", "expanded_label": "isfcls", "source": "expanded", "tokens": 7916, "reason": "max_length"}
{"label": "2zrng", "expanded_label": "2zlidl", "source": "expanded", "tokens": 9569, "reason": "max_length"}
{"label": "sringcat", "expanded_label": "srhmsubc", "source": "expanded", "tokens": 12711, "reason": "max_length"}
{"label": "xkotopon", "expanded_label": "xkouni", "source": "expanded", "tokens": 7374, "reason": "max_length"}
{"label": "abs2dif2", "expanded_label": "abstri", "source": "expanded", "tokens": 7522, "reason": "max_length"}
{"label": "nrmreg", "expanded_label": "nrmr0reg", "source": "expanded", "tokens": 6220, "reason": "max_length"}
{"label": "evlsrhm", "expanded_label": "evlsval2", "source": "expanded", "tokens": 8301, "reason": "max_length"}
{"label": "clwwlkf1o", "expanded_label": "clwwlkf1", "source": "expanded", "tokens": 16313, "reason": "max_length"}
{"label": "mat2pmatrhm", "expanded_label": "matring", "source": "expanded", "tokens": 26359, "reason": "max_length"}
{"label": "sqrt1", "expanded_label": "1re", "source": "expanded", "tokens": 7104, "reason": "max_length"}
{"label": "sumhash", "expanded_label": "ssfi", "source": "expanded", "tokens": 12417, "reason": "max_length"}
{"label": "pmatcollpw3", "expanded_label": "pmatcollpw", "source": "expanded", "tokens": 12192, "reason": "max_length"}
{"label": "pcprecl", "expanded_label": "pclem", "source": "expanded", "tokens": 9618, "reason": "max_length"}
{"label": "1ne2", "expanded_label": "1re", "source": "expanded", "tokens": 6611, "reason": "max_length"}
{"label": "wlknwwlksnen", "expanded_label": "wlknwwlksnbij", "source": "expanded", "tokens": 7322, "reason": "max_length"}
{"label": "pm2mpf", "expanded_label": "pm2mpcl", "source": "expanded", "tokens": 7149, "reason": "max_length"}
{"label": "m2cpmf1", "expanded_label": "mat2pmatf1", "source": "expanded", "tokens": 7912, "reason": "max_length"}
{"label": "lincsumscmcl", "expanded_label": "lincscmcl", "source": "expanded", "tokens": 12114, "reason": "max_length"}
{"label": "obsrcl", "expanded_label": "isobs", "source": "expanded", "tokens": 7261, "reason": "max_length"}
{"label": "evls1pw", "expanded_label": "evls1rhm", "source": "expanded", "tokens": 6659, "reason": "max_length"}
{"label": "lcmfunsn", "expanded_label": "lcmfunsnlem", "source": "expanded", "tokens": 10953, "reason": "max_length"}
{"label": "zrzeroorngc", "expanded_label": "zrinitorngc", "source": "expanded", "tokens": 9895, "reason": "max_length"}
{"label": "rrxmetfi", "expanded_label": "rrxmet", "source": "expanded", "tokens": 47140, "reason": "max_length"}
{"label": "iicmp", "expanded_label": "1re", "source": "expanded", "tokens": 7086, "reason": "max_length"}
{"label": "rngabl", "expanded_label": "isrng", "source": "expanded", "tokens": 7974, "reason": "max_length"}
{"label": "2re", "expanded_label": "1re", "source": "expanded", "tokens": 8834, "reason": "max_length"}
{"label": "fprodshft", "expanded_label": "mptfzshft", "source": "expanded", "tokens": 7323, "reason": "max_length"}
{"label": "lmhmlem", "expanded_label": "islmhm", "source": "expanded", "tokens": 7650, "reason": "max_length"}
{"label": "cncongr", "expanded_label": "cncongr2", "source": "expanded", "tokens": 14989, "reason": "max_length"}
{"label": "nnssre", "expanded_label": "1re", "source": "expanded", "tokens": 6912, "reason": "max_length"}
{"label": "1elunit", "expanded_label": "1re", "source": "expanded", "tokens": 6868, "reason": "max_length"}
{"label": "cnfldhaus", "expanded_label": "methaus", "source": "expanded", "tokens": 10873, "reason": "max_length"}
{"label": "cnpf", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9405, "reason": "max_length"}
{"label": "sum2id", "expanded_label": "sumeq2ii", "source": "expanded", "tokens": 10306, "reason": "max_length"}
{"label": "dsmmlmod", "expanded_label": "prdslmodd", "source": "expanded", "tokens": 27808, "reason": "max_length"}
{"label": "unitinvcl", "expanded_label": "unitgrp", "source": "expanded", "tokens": 12866, "reason": "max_length"}
{"label": "bcpascm1", "expanded_label": "bcpasc", "source": "expanded", "tokens": 24545, "reason": "max_length"}
{"label": "2arymaptf1o", "expanded_label": "2arymaptf1", "source": "expanded", "tokens": 9368, "reason": "max_length"}
{"label": "xmetf", "expanded_label": "isxmet", "source": "expanded", "tokens": 6591, "reason": "max_length"}
{"label": "coe1tmfv1", "expanded_label": "coe1tm", "source": "expanded", "tokens": 11689, "reason": "max_length"}
{"label": "rlimdmo1", "expanded_label": "rlimo1", "source": "expanded", "tokens": 7587, "reason": "max_length"}
{"label": "nlmlmod", "expanded_label": "isnlm", "source": "expanded", "tokens": 6171, "reason": "max_length"}
{"label": "7re", "expanded_label": "1re", "source": "expanded", "tokens": 6750, "reason": "max_length"}
{"label": "omndtos", "expanded_label": "isomnd", "source": "expanded", "tokens": 8315, "reason": "max_length"}
{"label":  "scmatf1o", "expanded_label": "scmatf1", "source": "expanded", "tokens": 10246, "reason": "max_length"}
{"label": "resthaus", "expanded_label": "cnhaus", "source": "expanded", "tokens": 17432, "reason": "max_length"}
{"label": "grplactf1o", "expanded_label": "grplactcnv", "source": "expanded", "tokens": 6922, "reason": "max_length"}
{"label": "bitsinv", "expanded_label": "bitsf1ocnv", "source": "expanded", "tokens": 6737, "reason": "max_length"}
{"label": "cnmptkc", "expanded_label": "xkoccn", "source": "expanded", "tokens": 14863, "reason": "max_length"}
{"label": "sringcatALTV", "expanded_label": "srhmsubcALTV", "source": "expanded", "tokens": 12861, "reason": "max_length"}
{"label": "lcmgcdnn", "expanded_label": "lcmgcd", "source": "expanded", "tokens": 7128, "reason": "max_length"}
{"label": "cncongr", "expanded_label": "cncongr1", "source": "expanded", "tokens": 25677, "reason": "max_length"}
{"label": "re0g", "expanded_label": "cncrng", "source": "expanded", "tokens": 7900, "reason": "max_length"}
{"label": "lmodring", "expanded_label": "islmod", "source": "expanded", "tokens": 22570, "reason": "max_length"}
{"label": "phimul", "expanded_label": "phimullem", "source": "expanded", "tokens": 59881, "reason": "max_length"}
{"label": "cycsubgcld", "expanded_label": "cycsubgcl", "source": "expanded", "tokens": 9718, "reason": "max_length"}
{"label": "zndvds0", "expanded_label": "zndvds", "source": "expanded", "tokens": 8442, "reason": "max_length"}
{"label": "ghmabl", "expanded_label": "ghmgrp", "source": "expanded", "tokens": 6213, "reason": "max_length"}
{"label": "pm2mpghmlem1", "expanded_label": "matring", "source": "expanded", "tokens": 16380, "reason": "max_length"}
{"label": "frgrncvvdeqlem10", "expanded_label": "frgrncvvdeqlem8", "source": "expanded", "tokens": 9143, "reason": "max_length"}
{"label": "pi1xfrgim", "expanded_label": "pi1xfr", "source": "expanded", "tokens": 56581, "reason": "max_length"}
{"label": "grlicer", "expanded_label": "grlicsym", "source": "expanded", "tokens": 9352, "reason": "max_length"}
{"label": "o1const", "expanded_label": "rlimo1", "source": "expanded", "tokens": 8521, "reason": "max_length"}
{"label": "cphssphl", "expanded_label": "cphsscph", "source": "expanded", "tokens": 10434, "reason": "max_length"}
{"label": "islmhmd", "expanded_label": "islmhm", "source": "expanded", "tokens": 7502, "reason": "max_length"}
{"label": "fsumsub", "expanded_label": "fsumadd", "source": "expanded", "tokens": 13138, "reason": "max_length"}
{"label": "kgenftop", "expanded_label": "kgentopon", "source": "expanded", "tokens": 10064, "reason": "max_length"}
{"label": "clwwlkf1o", "expanded_label": "clwwlkfo", "source": "expanded", "tokens": 7535, "reason": "max_length"}
{"label": "mdet0fv0", "expanded_label": "mdet0pr", "source": "expanded", "tokens": 9903, "reason": "max_length"}
{"label": "usgrexmpl", "expanded_label": "usgrexmplef", "source": "expanded", "tokens": 9478, "reason": "max_length"}
{"label": "numclwwlk1lem2f1o", "expanded_label": "numclwwlk1lem2f1", "source": "expanded", "tokens": 19478, "reason": "max_length"}
{"label": "odval2", "expanded_label": "odeq", "source": "expanded", "tokens": 6696, "reason": "max_length"}
{"label": "phiprm", "expanded_label": "phiprmpw", "source": "expanded", "tokens": 17595, "reason": "max_length"}
{"label": "3re", "expanded_label": "1re", "source": "expanded", "tokens": 6728, "reason": "max_length"}
{"label": "ghmf", "expanded_label": "isghm", "source": "expanded", "tokens": 7750, "reason": "max_length"}
{"label": "5re", "expanded_label": "1re", "source": "expanded", "tokens": 6884, "reason": "max_length"}
{"label": "cnmgpabl", "expanded_label": "cncrng", "source": "expanded", "tokens": 7869, "reason": "max_length"}
{"label": "ordtrestixx", "expanded_label": "letsr", "source": "expanded", "tokens": 6760, "reason": "max_length"}
{"label": "filfinnfr", "expanded_label": "fbfinnfr", "source": "expanded", "tokens": 6699, "reason": "max_length"}
{"label": "vtxdfiun", "expanded_label": "vtxdun", "source": "expanded", "tokens": 13496, "reason": "max_length"}
{"label": "cnprcl", "expanded_label": "iscnp2", "source": "expanded", "tokens": 9418, "reason": "max_length"}
{"label": "1arymaptf1o", "expanded_label": "1arymaptfo", "source": "expanded", "tokens": 6423, "reason": "max_length"}
{"label": "pc1", "expanded_label": "pczpre", "source": "expanded", "tokens": 10054, "reason": "max_length"}
{"label": "sshaus", "expanded_label": "cnhaus", "source": "expanded", "tokens": 13569, "reason": "max_length"}
{"label": "8re", "expanded_label": "1re", "source": "expanded", "tokens": 6721, "reason": "max_length"}
{"label": "nmoleub2a", "expanded_label": "nmoleub2lem2", "source": "expanded", "tokens": 13300, "reason": "max_length"}
{"label": "nnesq", "expanded_label": "zesq", "source": "expanded", "tokens": 7574, "reason": "max_length"}
{"label": "vtxdgfusgrf", "expanded_label": "fusgrfis", "source": "expanded", "tokens": 7818, "reason": "max_length"}
{"label": "cphsca", "expanded_label": "iscph", "source": "expanded", "tokens": 7872, "reason": "max_length"}
{"label": "pm2mpf1o", "expanded_label": "pm2mpf1", "source": "expanded", "tokens": 20808, "reason": "max_length"}
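Every record in the listing above carries the same five fields (`label`, `expanded_label`, `source`, `tokens`, `reason`). The following is a minimal sketch, assuming one JSON object per line, of how such a listing could be tallied. The three sample records are copied from the listing; the helper name `summarize` is hypothetical and is not part of this repository.

```python
import json
from collections import Counter

# Three records copied verbatim from the listing above.
SAMPLE = """\
{"label": "1lt6", "expanded_label": "1re", "source": "expanded", "tokens": 6929, "reason": "max_length"}
{"label": "1lt10", "expanded_label": "1re", "source": "expanded", "tokens": 6987, "reason": "max_length"}
{"label": "phimul", "expanded_label": "phimullem", "source": "expanded", "tokens": 59881, "reason": "max_length"}
"""


def summarize(lines):
    """Tally records by reason and expanded_label, and track the largest token count."""
    by_reason = Counter()
    by_expanded = Counter()
    max_tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        by_reason[record["reason"]] += 1
        by_expanded[record["expanded_label"]] += 1
        max_tokens = max(max_tokens, record["tokens"])
    return by_reason, by_expanded, max_tokens


by_reason, by_expanded, max_tokens = summarize(SAMPLE.splitlines())
print(by_reason.most_common())     # how many records per reason
print(by_expanded.most_common(1))  # the most frequent expanded_label
print(max_tokens)                  # the longest example, in tokens
```

Grouping by `expanded_label` makes it easy to see which expansions (for example `1re` or `matring`) account for most of the over-length examples.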
speed-estimate.md
ADDED
@@ -0,0 +1,11 @@
# Qwen3.5-2B-metamath 8192 speed estimate

- Data: metamath-output/setmm-train-qwen35-4b-mixed-12000, 4000 original + 12000 expanded.
- max_length=8192 keeps about 14492/16000 examples, according to a tokenizer length scan.
- Training config: Qwen3.5-2B base, FLA fast path available, LoRA rank 32/alpha 64/dropout 0.05, bf16, gradient checkpointing, lr=5e-4, 1 epoch.
- Batch: per-device train batch size 2, gradient accumulation 8 on one GPU, effective batch size 16 (2 x 8).
- Smoke run: 59 train examples, 4 optimizer steps, runtime 120.9s. The first step took about 94s due to compile/init; later steps took about 8-11s/step on the short smoke sample.
- Full 1-epoch steps: about 888 optimizer steps after a 2% eval split.
- Estimated full runtime: roughly 4-8 hours depending on length mix, checkpoint/eval cost, and whether the compiled kernels stay warm.
- Log: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.log
- PID file: /data/pretrained_models/Qwen3.5-2B-metamath/train-8192.pid
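
The step count in the note can be reproduced from the other figures it gives. The sketch below is an illustration only: it assumes the 2% eval split is taken from the 14492 kept examples and that a final partial batch still counts as one optimizer step; neither detail is confirmed by the training script, which is not shown here. The helper `runtime_hours` is hypothetical.

```python
import math

kept_examples = 14492   # examples kept at max_length=8192 (from the note)
eval_fraction = 0.02    # 2% eval split (from the note)
per_device_batch = 2    # per-device train batch size (from the note)
grad_accum = 8          # gradient accumulation steps (from the note)

effective_batch = per_device_batch * grad_accum

# Assumption: the eval split is carved out of the kept examples, rounded up.
eval_examples = math.ceil(kept_examples * eval_fraction)
train_examples = kept_examples - eval_examples

# Assumption: the last partial batch still produces one optimizer step.
steps = math.ceil(train_examples / effective_batch)


def runtime_hours(num_steps, seconds_per_step):
    """Convert a step count and a per-step time into hours."""
    return num_steps * seconds_per_step / 3600


print(effective_batch)  # 16
print(steps)            # 888
# Pace observed on the short smoke sample (8-11 s/step):
print(round(runtime_hours(steps, 8), 2), round(runtime_hours(steps, 11), 2))
```

At the smoke-run pace this comes to between two and three hours. The smoke sample was short, so the note's 4-8 hour estimate presumably allows for longer sequences plus checkpoint and eval overhead.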
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
size 19989343
tokenizer_config.json
ADDED
@@ -0,0 +1,31 @@
{
  "add_prefix_space": false,
  "audio_bos_token": "<|audio_start|>",
  "audio_eos_token": "<|audio_end|>",
  "audio_token": "<|audio_pad|>",
  "backend": "tokenizers",
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "image_token": "<|image_pad|>",
  "is_local": true,
  "model_max_length": 262144,
  "model_specific_special_tokens": {
    "audio_bos_token": "<|audio_start|>",
    "audio_eos_token": "<|audio_end|>",
    "audio_token": "<|audio_pad|>",
    "image_token": "<|image_pad|>",
    "video_token": "<|video_pad|>",
    "vision_bos_token": "<|vision_start|>",
    "vision_eos_token": "<|vision_end|>"
  },
  "pad_token": "<|endoftext|>",
  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
  "split_special_tokens": false,
  "tokenizer_class": "TokenizersBackend",
  "unk_token": null,
  "video_token": "<|video_pad|>",
  "vision_bos_token": "<|vision_start|>",
  "vision_eos_token": "<|vision_end|>"
}
train-6144-mb2x8-3ep-gpu1.log
ADDED
The diff for this file is too large to render.
train-6144-mb2x8-gpu1.log
ADDED
@@ -0,0 +1,11 @@
Starting Qwen3.5-2B Metamath training
Output: /data/pretrained_models/Qwen3.5-2B-metamath
Effective batch size: 16
/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
`torch_dtype` is deprecated! Use `dtype` instead!

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464

0%| | 0/955 [00:00<?, ?it/s]
0%| | 1/955 [00:08<2:08:48, 8.10s/it]
0%| | 2/955 [00:13<1:42:03, 6.43s/it]
0%| | 3/955 [00:18<1:33:39, 5.90s/it]
0%| | 4/955 [00:24<1:33:10, 5.88s/it]
1%| | 5/955 [00:29<1:26:47, 5.48s/it]
1%| | 6/955 [00:33<1:22:03, 5.19s/it]
1%| | 7/955 [00:39<1:22:53, 5.25s/it]
1%| | 8/955 [00:44<1:22:33, 5.23s/it]
1%| | 9/955 [00:49<1:22:53, 5.26s/it]
1%| | 10/955 [00:55<1:23:03, 5.27s/it]

1%| | 10/955 [00:55<1:23:03, 5.27s/it]
1%| | 11/955 [01:00<1:21:48, 5.20s/it]
1%|▏ | 12/955 [01:06<1:26:21, 5.49s/it]
1%|▏ | 13/955 [01:11<1:26:44, 5.52s/it]
1%|▏ | 14/955 [01:16<1:21:12, 5.18s/it]
2%|▏ | 15/955 [01:23<1:32:44, 5.92s/it]
2%|▏ | 16/955 [01:29<1:31:53, 5.87s/it]
2%|▏ | 17/955 [01:34<1:27:09, 5.57s/it]
2%|▏ | 18/955 [01:39<1:24:54, 5.44s/it]
2%|▏ | 19/955 [01:47<1:35:33, 6.13s/it]
2%|▏ | 20/955 [01:52<1:29:57, 5.77s/it]

2%|▏ | 20/955 [01:52<1:29:57, 5.77s/it]
2%|▏ | 21/955 [01:58<1:29:40, 5.76s/it]
2%|▏ | 22/955 [02:02<1:25:43, 5.51s/it]
2%|▏ | 23/955 [02:10<1:36:31, 6.21s/it]
3%|▎ | 24/955 [02:16<1:31:56, 5.92s/it]
3%|▎ | 25/955 [02:22<1:33:55, 6.06s/it]
3%|▎ | 26/955 [02:27<1:27:44, 5.67s/it]
3%|▎ | 27/955 [02:32<1:27:30, 5.66s/it]
3%|▎ | 28/955 [02:40<1:34:33, 6.12s/it]
3%|▎ | 29/955 [02:48<1:47:23, 6.96s/it]
3%|▎ | 30/955 [02:57<1:54:11, 7.41s/it]

3%|▎ | 30/955 [02:57<1:54:11, 7.41s/it]
3%|▎ | 31/955 [03:04<1:51:03, 7.21s/it]
3%|▎ | 32/955 [03:09<1:42:57, 6.69s/it]
3%|▎ | 33/955 [03:19<1:56:30, 7.58s/it]
4%|▎ | 34/955 [03:26<1:53:59, 7.43s/it]
4%|▎ | 35/955 [03:31<1:44:08, 6.79s/it]
4%|▍ | 36/955 [03:37<1:39:52, 6.52s/it]
4%|▍ | 37/955 [03:43<1:36:02, 6.28s/it]
4%|▍ | 38/955 [03:49<1:34:08, 6.16s/it]
4%|▍ | 39/955 [03:58<1:47:32, 7.04s/it]
4%|▍ | 40/955 [04:04<1:45:12, 6.90s/it]

4%|▍ | 40/955 [04:04<1:45:12, 6.90s/it]
4%|▍ | 41/955 [04:11<1:42:05, 6.70s/it]
4%|▍ | 42/955 [04:18<1:44:20, 6.86s/it]
5%|▍ | 43/955 [04:28<1:58:59, 7.83s/it]
5%|▍ | 44/955 [04:35<1:56:05, 7.65s/it]
5%|▍ | 45/955 [04:43<1:57:25, 7.74s/it]
5%|▍ | 46/955 [04:49<1:51:07, 7.33s/it]
5%|▍ | 47/955 [04:56<1:46:18, 7.02s/it]
5%|▌ | 48/955 [05:01<1:37:38, 6.46s/it]
5%|▌ | 49/955 [05:11<1:51:55, 7.41s/it]
5%|▌ | 50/955 [05:17<1:47:04, 7.10s/it]

5%|▌ | 50/955 [05:17<1:47:04, 7.10s/it]Terminated
train-8192-mb4x4-gpu1.log
ADDED
@@ -0,0 +1,42 @@
Starting Qwen3.5-2B Metamath training
Output: /data/pretrained_models/Qwen3.5-2B-metamath
Effective batch size: 16
/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
`torch_dtype` is deprecated! Use `dtype` instead!

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464

0%| | 0/966 [00:00<?, ?it/s]
0%| | 1/966 [00:25<6:46:25, 25.27s/it]
0%| | 2/966 [00:30<3:33:54, 13.31s/it]
0%| | 3/966 [00:34<2:30:05, 9.35s/it]Traceback (most recent call last):
  File "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/tools/train_qwen35_metamath.py", line 381, in <module>
    main()
  File "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/tools/train_qwen35_metamath.py", line 339, in main
    trainer.train()
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/transformers/trainer.py", line 1934, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/home/lg/.local/lib/python3.12/site-packages/accelerate/accelerator.py", line 2329, in backward
    loss.backward(**kwargs)
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/_tensor.py", line 625, in backward
    torch.autograd.backward(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/autograd/graph.py", line 841, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.02 GiB. GPU 0 has a total capacity of 79.14 GiB of which 18.98 GiB is free. Including non-PyTorch memory, this process has 60.14 GiB memory in use. Of the allocated memory 56.74 GiB is allocated by PyTorch, and 2.36 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

0%| | 3/966 [00:45<4:00:55, 15.01s/it]
Exception ignored in: <function ResourceTracker.__del__ at 0x7d8e4c558c20>
Traceback (most recent call last):
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 80, in __del__
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 89, in _stop
  File "/home/lg/.local/lib/python3.12/site-packages/multiprocess/resource_tracker.py", line 102, in _stop_locked
AttributeError: '_thread.RLock' object has no attribute '_recursion_count'
|
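A quick check of the figures in the `torch.OutOfMemoryError` message suggests that the fragmentation hint alone would probably not have rescued this step. The sketch below only redoes the arithmetic from the logged numbers; the variable names are illustrative, and treating free plus reserved-but-unallocated memory as an upper bound on what the allocator could hand out is an assumption.

```python
# Figures copied from the torch.OutOfMemoryError message in the log (GiB).
requested = 25.02             # "Tried to allocate 25.02 GiB"
free = 18.98                  # "18.98 GiB is free"
reserved_unallocated = 2.36   # "2.36 GiB is reserved by PyTorch but unallocated"

# Assumed best case: all cached-but-unused memory could also serve the request.
best_case_available = free + reserved_unallocated
shortfall = requested - best_case_available

print(f"best-case available: {best_case_available:.2f} GiB")
print(f"shortfall:           {shortfall:.2f} GiB")
```

Under that assumption the request still exceeds the best case by several GiB, so a smaller per-step footprint (for example a shorter maximum sequence length or a smaller micro-batch) looks like the more likely remedy than allocator tuning alone.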
train-8192.log
ADDED
|
@@ -0,0 +1,8 @@
/data/home/lg/.conda/envs/verl_qwen35/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml # type: ignore[import]
`torch_dtype` is deprecated! Use `dtype` instead!

warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 248046, 'pad_token_id': 248044}.
trainable params: 21,823,488 || all params: 1,903,648,576 || trainable%: 1.1464

0%| | 0/888 [00:00<?, ?it/s]
0%| | 1/888 [00:11<2:53:39, 11.75s/it]
0%| | 2/888 [00:17<1:59:09, 8.07s/it]
0%| | 3/888 [00:28<2:21:20, 9.58s/it]
0%| | 4/888 [00:37<2:18:40, 9.41s/it]
1%| | 5/888 [00:47<2:19:10, 9.46s/it]
1%| | 6/888 [00:57<2:23:55, 9.79s/it]
1%| | 7/888 [01:04<2:09:23, 8.81s/it]
1%| | 8/888 [01:15<2:20:01, 9.55s/it]
1%| | 9/888 [01:23<2:10:23, 8.90s/it]
1%| | 10/888 [01:31<2:09:24, 8.84s/it]

1%| | 10/888 [01:31<2:09:24, 8.84s/it]
1%| | 11/888 [01:42<2:15:42, 9.28s/it]
1%|▏ | 12/888 [01:53<2:24:50, 9.92s/it]
1%|▏ | 13/888 [02:03<2:24:07, 9.88s/it]
2%|▏ | 14/888 [02:19<2:52:47, 11.86s/it]
2%|▏ | 15/888 [02:36<3:11:57, 13.19s/it]
2%|▏ | 16/888 [02:47<3:03:38, 12.64s/it]
2%|▏ | 17/888 [02:59<3:00:47, 12.45s/it]
2%|▏ | 18/888 [03:18<3:27:52, 14.34s/it]
train-8192.pid
ADDED
|
@@ -0,0 +1 @@
2564523
train-manifest.json
ADDED
|
@@ -0,0 +1,17 @@
{
  "base_model": "/data/pretrained_models/Qwen3.5-2B",
  "original_units": "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000/setmm-proof-units.jsonl",
  "expanded_units": "/home/lg/workflow_tooluse/Flow_RL_luogan/temp/metamath/metamath-output/setmm-train-qwen35-4b-mixed-12000/setmm-expanded-units.jsonl",
  "output_dir": "/data/pretrained_models/Qwen3.5-2B-metamath",
  "merged_dir": "/data/pretrained_models/Qwen3.5-2B-metamath/merged",
  "train_examples": 15267,
  "eval_examples": 311,
  "skipped_examples": 422,
  "max_length": 6144,
  "direct_ref_mode": "same-file-distractors",
  "same_file_distractor_direct_refs": 4,
  "shuffle_direct_refs": true,
  "learning_rate": 0.0001,
  "lora_rank": 32,
  "lora_alpha": 64
}
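The numeric fields of `train-manifest.json` can be sanity-checked with a few lines of Python. This is a minimal sketch: the values are copied from the manifest, while the derived names (`total_examples`, `eval_fraction`, `lora_scale`) are illustrative. It assumes that the train, eval, and skipped counts partition the candidate examples, and that the adapter uses the standard LoRA scaling `lora_alpha / lora_rank` rather than a rank-stabilized variant.

```python
import json

# Numeric fields copied from train-manifest.json.
manifest = json.loads("""
{
  "train_examples": 15267,
  "eval_examples": 311,
  "skipped_examples": 422,
  "lora_rank": 32,
  "lora_alpha": 64
}
""")

# Assumed partition: every candidate example is trained on, held out, or skipped.
total_examples = (
    manifest["train_examples"]
    + manifest["eval_examples"]
    + manifest["skipped_examples"]
)

# Share of the kept examples that was held out for evaluation.
kept = manifest["train_examples"] + manifest["eval_examples"]
eval_fraction = manifest["eval_examples"] / kept

# Standard LoRA scaling factor applied to the low-rank update (assumed).
lora_scale = manifest["lora_alpha"] / manifest["lora_rank"]

print(total_examples)          # 16000
print(f"{eval_fraction:.2%}")  # held-out share of kept examples
print(lora_scale)              # 2.0
```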
training_args.bin
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e8fc737554ff6f82c4ea137b5313611e3b2b3b63fd69b3926d6b1fe9da14c0a6
size 5201