Đào Quốc Tuấn committed on
Upload folder using huggingface_hub
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md +207 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json +41 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt +3 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json +24 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json +0 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json +43 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log +26 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml +54 -0
- experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl +74 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md
ADDED
@@ -0,0 +1,207 @@
+---
+base_model: models/tinyllama-1.1b
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:models/tinyllama-1.1b
+- lora
+- transformers
+---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
+### Framework versions
+
+- PEFT 0.18.0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json
ADDED
@@ -0,0 +1,41 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "models/tinyllama-1.1b",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 8,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 256,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
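The LoRA settings above (`r` = 256, `lora_alpha` = 8, `use_rslora` = false, adapters on `q_proj` and `v_proj` only) fix how the adapter perturbs each targeted projection: the frozen output `W x` gets `(lora_alpha / r) * B (A x)` added, so here the low-rank update is scaled by 8 / 256 = 0.03125. The pure-Python sketch below illustrates that standard LoRA forward pass; the helper names are hypothetical and are not PEFT's API, and dropout is left out because the saved config has `inference_mode: true`. With `init_lora_weights: true`, PEFT initializes `B` to zeros, so an untrained adapter leaves the base output unchanged, which the last test line checks.

```python
import math

# Values copied from adapter_config.json
R = 256
LORA_ALPHA = 8
USE_RSLORA = False


def lora_scaling(r: int, alpha: float, use_rslora: bool = False) -> float:
    """Standard LoRA scales the update by alpha / r; rsLoRA would use alpha / sqrt(r)."""
    return alpha / math.sqrt(r) if use_rslora else alpha / r


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, scaling):
    """y = W x + scaling * B (A x).

    w: frozen base weight, d_out x d_in
    a: LoRA down-projection, r x d_in
    b: LoRA up-projection, d_out x r
    """
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [yb + scaling * yd for yb, yd in zip(base, delta)]


scaling = lora_scaling(R, LORA_ALPHA, USE_RSLORA)
print(scaling)  # 0.03125

# Toy check with d_in = d_out = 2 and rank 1 (not the real shapes).
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]
b = [[2.0], [0.0]]
print(lora_forward(w, a, b, [1.0, 2.0], scaling=0.5))  # [4.0, 2.0]
```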
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:715bd43e1868106f0802f8c8bd7348ce8cc6b418f239469be529854f406f1a7d
+size 144191184
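The LFS pointer records only the adapter's byte size, but that size can be roughly reproduced from the adapter config. The sketch below assumes TinyLlama-1.1B's published shapes (hidden size 2048, 22 decoder layers, 4 key/value heads of dimension 64, so `v_proj` maps 2048 to 256); the base model's `config.json` is not part of this commit, although the log's highest mapped student layer index, 21, is consistent with 22 layers. Under that assumption, rank-256 LoRA on `q_proj` and `v_proj` has 36,044,800 parameters, or 144,179,200 bytes in fp32. That is 11,984 bytes short of the recorded size, a remainder plausibly taken up by the safetensors header. The match suggests the adapter was saved in fp32 rather than bf16, though the commit does not say so.

```python
# ASSUMED TinyLlama-1.1B shapes (not contained in this commit).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_KV_HEADS = 4
HEAD_DIM = 64

# From adapter_config.json and the LFS pointer above.
R = 256
LFS_SIZE_BYTES = 144_191_184


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters of one LoRA pair: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


q_proj = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)              # 2048 -> 2048
v_proj = lora_param_count(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, R)  # 2048 -> 256 (grouped-query attention)
total_params = NUM_LAYERS * (q_proj + v_proj)
fp32_bytes = 4 * total_params

print(total_params)                 # 36044800
print(fp32_bytes)                   # 144179200
print(LFS_SIZE_BYTES - fp32_bytes)  # 11984 bytes left over for the file header
```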
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:89484437b4db7fc0ed16b7314dcaf495c706be5db0e0e015a11fc8b30ae5bded
+size 1483
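`lr_scheduler.pt` is only 1,483 bytes because it stores the scheduler's state, not the learning-rate curve. The run's YAML config and log in this commit give the inputs for that curve: `learning_rate: 1e-3`, `lr_scheduler: "cosine"`, `warmup_percentage: 0.05`, and 1260 optimizer steps over 20 epochs (63 per epoch with `gradient_accumulation_steps: 1`). The log also shows `learning_rate` loaded as the string `'1e-3'`, because PyYAML only resolves exponent notation as a float when it contains a decimal point, so the value must be cast before use. The sketch below is a linear-warmup, half-cosine-decay schedule with those numbers. It has the same shape as `transformers.get_cosine_schedule_with_warmup` with its defaults, but whether the training code uses that function or rounds the warmup length the same way is an assumption. Under it, warmup covers 63 steps, exactly the first epoch.

```python
import math

# Values from the YAML config and the log in this commit.
LEARNING_RATE = float("1e-3")  # the log shows this value was loaded as the string '1e-3'
WARMUP_PERCENTAGE = 0.05
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 63  # the log reports "Step 1/1260" for 20 epochs

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH             # 1260
WARMUP_STEPS = round(WARMUP_PERCENTAGE * TOTAL_STEPS)  # 63 (rounding rule is an assumption)


def cosine_with_warmup(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


for step in (0, 31, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_with_warmup(step, TOTAL_STEPS, WARMUP_STEPS, LEARNING_RATE))
```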
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:269e1a840d46baf15d919beeaef74b384a869bffd3e3574994a9fc75cc7b1bc1
+size 825448971
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a9e3a0189969c79fbabd30937ab19edfdbfd0b47f257512d5bbe4094b471280c
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e1e1d26cd0a93914dc9e222ec5624a3066938490e6da2fa1e252995ca29facd9
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f0b5577d9c8000344d321d547c8918fca2ffbdf503794607d98c55437069208
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4a540628c8420e96fda19b435e4be79634e803991a154ae92f4583bbf8707177
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ce91dfbf7ce0327f3a9e40a42bfda54a61a469221a82de5cbeb3c0ffa0d3d42e
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d066e24c5debb125c3dc19105bb805e07c73e05a5dc22880103c7725304f9234
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1705a9ad8a952865bf6da2b77794dcc87aa986c6d9538546d65e6f00a835eb41
+size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fca00fdfa79fec2d7c9d7eaf53920d175c85eea9fff05c95db4aa75f4de93764
+size 33564613
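The eight `projector_*.pt` files all have the same size, and together with `optimizer.pt` their sizes admit a simple accounting. None of the following is stated in this commit; each is an assumption. First, each projector is a dense fp32 matrix between Mistral-7B's hidden size (4096) and TinyLlama's (2048). That gives 8,388,608 weights, or 33,554,432 bytes, about 10 KB less than each file; the rest could be `torch.save` overhead or a small extra tensor such as a bias. Second, the optimizer is an Adam-family method holding two fp32 moment tensors per trainable parameter. The log states that projector parameters were added to the optimizer, and LoRA plus projector parameters then account for all but roughly 0.03% of `optimizer.pt`.

```python
# ASSUMPTIONS (not stated in this commit): hidden sizes 4096 (teacher) and 2048 (student),
# fp32 storage, TinyLlama shapes for the LoRA count, and two Adam moment tensors per parameter.
TEACHER_HIDDEN = 4096
STUDENT_HIDDEN = 2048
NUM_PROJECTORS = 8
BYTES_FP32 = 4

# Sizes recorded in the LFS pointers above.
PROJECTOR_FILE_BYTES = 33_564_613
OPTIMIZER_FILE_BYTES = 825_448_971

# Rank-256 LoRA on q_proj (2048 -> 2048) and v_proj (2048 -> 256) in each of 22 layers.
LORA_PARAMS = 22 * (256 * (2048 + 2048) + 256 * (2048 + 256))

projector_params = TEACHER_HIDDEN * STUDENT_HIDDEN
projector_bytes = BYTES_FP32 * projector_params
trainable_params = LORA_PARAMS + NUM_PROJECTORS * projector_params
adam_state_bytes = 2 * BYTES_FP32 * trainable_params  # exp_avg + exp_avg_sq

print(projector_bytes, PROJECTOR_FILE_BYTES - projector_bytes)    # 33554432 10181
print(trainable_params)                                          # 103153664
print(adam_state_bytes, OPTIMIZER_FILE_BYTES - adam_state_bytes)  # 825229312 219659
print(f"{(OPTIMIZER_FILE_BYTES - adam_state_bytes) / OPTIMIZER_FILE_BYTES:.4%}")
```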
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json
ADDED
@@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json
ADDED
The diff for this file is too large to render.
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json
ADDED
@@ -0,0 +1,43 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
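Both tokenizer files set `pad_token` to `</s>`, the same token as `eos_token` (id 2 in `added_tokens_decoder`), with `padding_side: "right"` and `add_eos_token: false`. When padding reuses EOS, masking loss positions by token id also hides any genuine end-of-sequence token, so the model gets no training signal for emitting EOS. Masking by the attention mask avoids this. The sketch below contrasts the two approaches on hypothetical token ids. Whether the training code appends EOS to targets or masks labels either way is not visible in this commit. `-100` is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
EOS_ID = 2           # "</s>" in added_tokens_decoder
PAD_ID = EOS_ID      # pad_token is also "</s>"


def labels_by_token_id(input_ids: list[int]) -> list[int]:
    """Pitfall: because PAD_ID == EOS_ID, this also hides the real end-of-sequence token."""
    return [IGNORE_INDEX if tok == PAD_ID else tok for tok in input_ids]


def labels_by_attention_mask(input_ids: list[int], attention_mask: list[int]) -> list[int]:
    """Mask only the positions that padding added (attention_mask == 0)."""
    return [tok if m == 1 else IGNORE_INDEX for tok, m in zip(input_ids, attention_mask)]


# Right-padded example: BOS (id 1), two hypothetical content tokens, a real EOS, then two pads.
input_ids = [1, 306, 763, 2, 2, 2]
attention_mask = [1, 1, 1, 1, 0, 0]

print(labels_by_token_id(input_ids))                        # [1, 306, 763, -100, -100, -100]
print(labels_by_attention_mask(input_ids, attention_mask))  # [1, 306, 763, 2, -100, -100]
```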
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log
ADDED
@@ -0,0 +1,26 @@
+2025-12-02 05:31:57,909 - root - INFO - Config loaded: {'seed': 42, 'ridge_lambda': '5e-4', 'sinkhorn_reg': 0.1, 'topk': 100, 'row_batch': 8192, 'col_batch': 8192, 'sinkhorn_iters': 200, 'sinkhorn_tol': '1e-5', 'data_path': 'data/dolly/valid.jsonl', 'max_prompt_length': 256, 'max_length': 512, 'student_name': 'tinyllama-1.1b', 'teacher_name': 'mistral-7b-v0.1', 'student_path': 'models/tinyllama-1.1b', 'student_adapter_path': '', 'teacher_path': 'models/mistral-7b-v0.1', 'teacher_adapter_path': 'models/teacher_mistral7b', 'projector_path': 'projectors', 'lora': True, 'lora_r': 256, 'lora_alpha': 8, 'lora_dropout': 0.1, 'num_epochs': 20, 'device': 'cuda', 'learning_rate': '1e-3', 'lr_scheduler': 'cosine', 'warmup_percentage': 0.05, 'batch_size': 8, 'gradient_accumulation_steps': 1, 'alpha': 2.0, 'beta': 0.5, 'theta': 0.2, 'k': 8, 'compute_bi_batch_count': 8, 'lambda_h': 0.5, 'eval_repeat': 1, 'eval_data_path': 'data/dolly/valid.jsonl', 'eval_batch_size': 64, 'user': 'mrtuandao', 'repo': 'weighted-CTKD', 'wandb_project': 'weighted-ctkd'}
+2025-12-02 05:31:59,500 - root - INFO - Wandb initialized with run name: tuandao_mistral-7b-v0.1_to_tinyllama-1.1b_20251202_053157
+2025-12-02 05:31:59,502 - root - INFO - Using device: cuda
+2025-12-02 05:33:43,800 - root - INFO - Lora model initialized
+2025-12-02 05:33:43,892 - root - INFO - Projector loaded from projectors/mistral-7b-v0.1_to_tinyllama-1.1b/mistral-7b-v0.1_to_tinyllama-1.1b_project_matrix.pt
+2025-12-02 05:33:44,094 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+2025-12-02 05:33:44,647 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+2025-12-02 05:33:49,044 - root - INFO - Ours selection: Top-8 teacher layers: [0, 1, 2, 3, 4, 29, 30, 31]
+2025-12-02 05:33:49,044 - root - INFO - BI scores: ['0.5697', '0.3668', '0.1914', '0.1315', '0.1093', '0.1063', '0.1510', '0.3541']
+2025-12-02 05:33:49,045 - root - INFO - Mapped student: {0: [0, 1], 1: [2], 2: [3, 4], 19: [29], 20: [30], 21: [31]}
+2025-12-02 05:33:49,444 - root - INFO - Added projector parameters to optimizer
+2025-12-02 05:33:49,450 - root - INFO - Epoch 1/20
+2025-12-02 05:33:51,204 - absl - INFO - Using default tokenizer.
+2025-12-02 05:33:59,827 - root - INFO - Step 1/1260 train rougeL: 0.09286737051895978
+2025-12-02 05:34:00,069 - root - INFO - Step 1/1260 loss: 3.636246681213379, nll_loss: 2.150993585586548, distill_loss: 0.7426265478134155
+2025-12-02 05:35:43,097 - root - INFO - Epoch 1/20 finished
+2025-12-02 05:35:43,136 - absl - INFO - Using default tokenizer.
+2025-12-02 05:35:50,123 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:00,278 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:10,423 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:20,573 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:30,682 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:40,772 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:50,916 - absl - INFO - Using default tokenizer.
+2025-12-02 05:37:00,084 - root - INFO - Epoch 1/20 eval rougeL: 0.15909466383518
+2025-12-02 05:37:01,324 - root - INFO - Epoch 2/20
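The log lines at 05:33:49 record the layer selection. The eight teacher layers with the highest "BI" scores are kept, and each is assigned to a student layer. The sketch below assumes that BI is the Block Influence score from layer-pruning work such as ShortGPT (one minus the mean cosine similarity between a block's input and output hidden states). It also assumes 32 teacher layers (Mistral-7B) and 22 student layers (TinyLlama). It computes BI, picks the top-k layers, and maps teacher layer `t` to student layer `floor(t * 22 / 32)`. That floor rule is inferred from the logged mapping, which it reproduces exactly; it is not confirmed from the training code. All function names are hypothetical.

```python
import math

TEACHER_LAYERS = 32  # Mistral-7B (assumed)
STUDENT_LAYERS = 22  # TinyLlama-1.1B (assumed)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def block_influence(h_in: list[list[float]], h_out: list[list[float]]) -> float:
    """1 - mean cosine similarity between per-token hidden states entering and leaving a block."""
    sims = [cosine(x, y) for x, y in zip(h_in, h_out)]
    return 1.0 - sum(sims) / len(sims)


def top_k_layers(scores: list[float], k: int) -> list[int]:
    """Indices of the k largest scores, returned in ascending layer order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def map_to_student(teacher_layers: list[int], n_teacher: int, n_student: int) -> dict[int, list[int]]:
    """Group teacher layers by the student layer floor(t * n_student / n_teacher)."""
    mapping: dict[int, list[int]] = {}
    for t in teacher_layers:
        mapping.setdefault(t * n_student // n_teacher, []).append(t)
    return mapping


# Logged BI scores for the selected layers; the 0.05 used for every other layer is hypothetical.
logged = {0: 0.5697, 1: 0.3668, 2: 0.1914, 3: 0.1315, 4: 0.1093, 29: 0.1063, 30: 0.1510, 31: 0.3541}
scores = [logged.get(i, 0.05) for i in range(TEACHER_LAYERS)]

selected = top_k_layers(scores, k=8)
print(selected)  # [0, 1, 2, 3, 4, 29, 30, 31]
print(map_to_student(selected, TEACHER_LAYERS, STUDENT_LAYERS))
# {0: [0, 1], 1: [2], 2: [3, 4], 19: [29], 20: [30], 21: [31]}
```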
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml
ADDED
@@ -0,0 +1,54 @@
+# Teacher training configuration
+seed: 42
+
+# create projector parameters
+ridge_lambda: 5e-4
+sinkhorn_reg: 0.1
+topk: 100
+row_batch: 8192
+col_batch: 8192
+sinkhorn_iters: 200
+sinkhorn_tol: 1e-5
+
+# Dataset parameters
+data_path: "data/dolly/valid.jsonl"
+max_prompt_length: 256
+max_length: 512
+student_name: "tinyllama-1.1b"
+teacher_name: "mistral-7b-v0.1"
+student_path: "models/tinyllama-1.1b"
+student_adapter_path: ""
+teacher_path: "models/mistral-7b-v0.1"
+teacher_adapter_path: "models/teacher_mistral7b"
+projector_path: "projectors"
+
+# Training parameters
+lora: true
+lora_r: 256
+lora_alpha: 8
+lora_dropout: 0.1
+num_epochs: 20
+device: "cuda"
+learning_rate: 1e-3
+lr_scheduler: "cosine"
+warmup_percentage: 0.05
+batch_size: 8
+gradient_accumulation_steps: 1
+alpha: 2.0
+beta: 0.5
+theta: 0.2
+k: 8
+compute_bi_batch_count: 8
+lambda_h: 0.5
+
+# Evaluation parameters
+eval_repeat: 1
+eval_data_path: "data/dolly/valid.jsonl"
+eval_batch_size: 64
+
+# Huggingface parameters
+user: "mrtuandao"
+repo: "weighted-CTKD"
+
+# Wandb parameters
+wandb_project: "weighted-ctkd"
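The config's "create projector parameters" block includes `sinkhorn_reg`, `sinkhorn_iters` and `sinkhorn_tol`, the usual knobs of the Sinkhorn-Knopp algorithm for entropy-regularized optimal transport. The commit does not show what the repository transports, so the following is only a textbook sketch that applies those three values to a toy problem. `topk`, `row_batch`, `col_batch` and `ridge_lambda` are not modeled. The function alternately rescales the rows and columns of the Gibbs kernel `exp(-cost / reg)` until the row marginals match within `sinkhorn_tol`. As the log shows, `sinkhorn_tol` is loaded as a string, so it is cast here.

```python
import math

# Values from the YAML config.
SINKHORN_REG = 0.1
SINKHORN_ITERS = 200
SINKHORN_TOL = float("1e-5")  # the log shows this value was loaded as the string '1e-5'


def sinkhorn(a, b, cost, reg=SINKHORN_REG, n_iters=SINKHORN_ITERS, tol=SINKHORN_TOL):
    """Entropy-regularized OT plan between histograms a (rows) and b (columns).

    Returns P = diag(u) K diag(v) with K = exp(-cost / reg). After each column update the
    column sums equal b; iteration stops once the row sums are within tol of a.
    """
    n, m = len(a), len(b)
    k = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
        # Columns now match b; stop once the rows also match a within tol.
        row_err = max(abs(u[i] * sum(k[i][j] * v[j] for j in range(m)) - a[i]) for i in range(n))
        if row_err < tol:
            break
    return [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.2, 0.8]
b = [0.6, 0.4]
cost = [[0.0, 0.2], [0.2, 0.0]]
plan = sinkhorn(a, b, cost)
print([round(sum(row), 4) for row in plan])        # row sums, close to a
print([round(sum(col), 4) for col in zip(*plan)])  # column sums, close to b
```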
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl
ADDED
@@ -0,0 +1,74 @@
+{"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}
+{"epoch": 0, "step": 1, "loss": 3.4290106296539307, "nll_loss": 2.081878423690796, "distill_loss": 0.6735661029815674}
+{"epoch": 0, "step": 2, "loss": 2.6680655479431152, "nll_loss": 1.46719491481781, "distill_loss": 0.6004353165626526}
+{"epoch": 0, "step": 3, "loss": 3.413308620452881, "nll_loss": 2.261220932006836, "distill_loss": 0.5760437846183777}
+{"epoch": 0, "step": 4, "loss": 3.270015001296997, "nll_loss": 2.2107362747192383, "distill_loss": 0.5296393632888794}
+{"epoch": 0, "step": 5, "loss": 3.2906668186187744, "nll_loss": 2.1387462615966797, "distill_loss": 0.5759602785110474}
+{"epoch": 0, "step": 6, "loss": 3.2078216075897217, "nll_loss": 2.180889368057251, "distill_loss": 0.5134661197662354}
+{"epoch": 0, "step": 7, "loss": 3.4919862747192383, "nll_loss": 2.5108277797698975, "distill_loss": 0.490579217672348}
+{"epoch": 0, "step": 8, "loss": 3.271066427230835, "nll_loss": 2.286349058151245, "distill_loss": 0.4923587143421173}
+{"epoch": 0, "step": 9, "loss": 2.963165760040283, "nll_loss": 2.0048465728759766, "distill_loss": 0.47915956377983093}
+{"epoch": 0, "step": 10, "loss": 3.1000733375549316, "nll_loss": 2.2031350135803223, "distill_loss": 0.4484691917896271}
+{"epoch": 0, "step": 11, "loss": 2.987344741821289, "nll_loss": 2.1031339168548584, "distill_loss": 0.4421054720878601}
+{"epoch": 0, "step": 12, "loss": 2.91876482963562, "nll_loss": 2.0210299491882324, "distill_loss": 0.44886747002601624}
+{"epoch": 0, "step": 13, "loss": 3.1252598762512207, "nll_loss": 2.218106746673584, "distill_loss": 0.45357653498649597}
+{"epoch": 0, "step": 14, "loss": 2.852810859680176, "nll_loss": 1.9506586790084839, "distill_loss": 0.45107606053352356}
+{"epoch": 0, "step": 15, "loss": 3.0891385078430176, "nll_loss": 2.193445920944214, "distill_loss": 0.44784626364707947}
+{"epoch": 0, "step": 16, "loss": 3.0579774379730225, "nll_loss": 2.2003753185272217, "distill_loss": 0.4288010895252228}
+{"epoch": 0, "step": 17, "loss": 3.087275743484497, "nll_loss": 2.296391487121582, "distill_loss": 0.39544209837913513}
+{"epoch": 0, "step": 18, "loss": 2.4850993156433105, "nll_loss": 1.6171865463256836, "distill_loss": 0.4339563846588135}
+{"epoch": 0, "step": 19, "loss": 2.649646759033203, "nll_loss": 1.8323817253112793, "distill_loss": 0.4086324870586395}
+{"epoch": 0, "step": 20, "loss": 2.9256372451782227, "nll_loss": 2.1449480056762695, "distill_loss": 0.39034461975097656}
+{"epoch": 0, "step": 21, "loss": 2.8907132148742676, "nll_loss": 2.0887677669525146, "distill_loss": 0.4009726643562317}
+{"epoch": 0, "step": 22, "loss": 2.5686354637145996, "nll_loss": 1.668267011642456, "distill_loss": 0.450184166431427}
+{"epoch": 0, "step": 23, "loss": 3.04226016998291, "nll_loss": 2.2470481395721436, "distill_loss": 0.3976059556007385}
+{"epoch": 0, "step": 24, "loss": 2.1243538856506348, "nll_loss": 1.3421952724456787, "distill_loss": 0.3910793364048004}
+{"epoch": 0, "step": 25, "loss": 2.909428596496582, "nll_loss": 2.1522693634033203, "distill_loss": 0.37857958674430847}
+{"epoch": 0, "step": 26, "loss": 2.61659836769104, "nll_loss": 1.8968509435653687, "distill_loss": 0.3598736822605133}
+{"epoch": 0, "step": 27, "loss": 2.784691095352173, "nll_loss": 1.9601420164108276, "distill_loss": 0.4122745394706726}
+{"epoch": 0, "step": 28, "loss": 2.4066662788391113, "nll_loss": 1.486325740814209, "distill_loss": 0.46017026901245117}
+{"epoch": 0, "step": 29, "loss": 2.6199193000793457, "nll_loss": 1.847617506980896, "distill_loss": 0.38615092635154724}
+{"epoch": 0, "step": 30, "loss": 2.2417097091674805, "nll_loss": 1.5145374536514282, "distill_loss": 0.36358609795570374}
+{"epoch": 0, "step": 31, "loss": 2.5266754627227783, "nll_loss": 1.811596155166626, "distill_loss": 0.35753968358039856}
+{"epoch": 0, "step": 32, "loss": 2.6539199352264404, "nll_loss": 1.9047660827636719, "distill_loss": 0.3745769262313843}
+{"epoch": 0, "step": 33, "loss": 2.4649674892425537, "nll_loss": 1.7129267454147339, "distill_loss": 0.3760204017162323}
+{"epoch": 0, "step": 34, "loss": 1.8670575618743896, "nll_loss": 1.161777138710022, "distill_loss": 0.35264018177986145}
+{"epoch": 0, "step": 35, "loss": 2.681079149246216, "nll_loss": 1.9443445205688477, "distill_loss": 0.3683673143386841}
+{"epoch": 0, "step": 36, "loss": 2.1921796798706055, "nll_loss": 1.5153636932373047, "distill_loss": 0.3384079337120056}
+{"epoch": 0, "step": 37, "loss": 2.2259342670440674, "nll_loss": 1.5203224420547485, "distill_loss": 0.35280588269233704}
+{"epoch": 0, "step": 38, "loss": 2.606497049331665, "nll_loss": 1.9084241390228271, "distill_loss": 0.34903642535209656}
+{"epoch": 0, "step": 39, "loss": 2.595412492752075, "nll_loss": 1.7961845397949219, "distill_loss": 0.39961397647857666}
+{"epoch": 0, "step": 40, "loss": 2.935417413711548, "nll_loss": 2.2409603595733643, "distill_loss": 0.3472284972667694}
+{"epoch": 0, "step": 41, "loss": 2.4306864738464355, "nll_loss": 1.6755659580230713, "distill_loss": 0.3775603175163269}
+{"epoch": 0, "step": 42, "loss": 2.528837203979492, "nll_loss": 1.8123151063919067, "distill_loss": 0.35826101899147034}
+{"epoch": 0, "step": 43, "loss": 2.020371437072754, "nll_loss": 1.3152990341186523, "distill_loss": 0.35253626108169556}
+{"epoch": 0, "step": 44, "loss": 2.0832347869873047, "nll_loss": 1.4359556436538696, "distill_loss": 0.32363957166671753}
+{"epoch": 0, "step": 45, "loss": 2.5002126693725586, "nll_loss": 1.8215147256851196, "distill_loss": 0.33934903144836426}
+{"epoch": 0, "step": 46, "loss": 2.598348617553711, "nll_loss": 1.6738619804382324, "distill_loss": 0.46224337816238403}
+{"epoch": 0, "step": 47, "loss": 2.4864816665649414, "nll_loss": 1.809826135635376, "distill_loss": 0.3383277654647827}
+{"epoch": 0, "step": 48, "loss": 2.6783900260925293, "nll_loss": 2.0141756534576416, "distill_loss": 0.33210718631744385}
+{"epoch": 0, "step": 49, "loss": 2.332019567489624, "nll_loss": 1.6047790050506592, "distill_loss": 0.36362025141716003}
+{"epoch": 0, "step": 50, "loss": 2.8168227672576904, "nll_loss": 1.8088455200195312, "distill_loss": 0.5039886236190796}
+{"epoch": 0, "step": 51, "loss": 2.9042434692382812, "nll_loss": 2.2219815254211426, "distill_loss": 0.3411310315132141}
+{"epoch": 0, "step": 52, "loss": 2.082211971282959, "nll_loss": 1.410250186920166, "distill_loss": 0.33598095178604126}
+{"epoch": 0, "step": 53, "loss": 2.7209815979003906, "nll_loss": 2.067466974258423, "distill_loss": 0.3267573416233063}
+{"epoch": 0, "step": 54, "loss": 2.606290340423584, "nll_loss": 1.930795669555664, "distill_loss": 0.3377472758293152}
+{"epoch": 0, "step": 55, "loss": 2.327575445175171, "nll_loss": 1.643258810043335, "distill_loss": 0.3421582877635956}
+{"epoch": 0, "step": 56, "loss": 2.7237071990966797, "nll_loss": 2.023085355758667, "distill_loss": 0.3503108620643616}
+{"epoch": 0, "step": 57, "loss": 2.2928857803344727, "nll_loss": 1.524363398551941, "distill_loss": 0.38426122069358826}
+{"epoch": 0, "step": 58, "loss": 2.462594509124756, "nll_loss": 1.6919004917144775, "distill_loss": 0.38534700870513916}
+{"epoch": 0, "step": 59, "loss": 2.0680174827575684, "nll_loss": 1.372825264930725, "distill_loss": 0.34759610891342163}
+{"epoch": 0, "step": 60, "loss": 2.2622427940368652, "nll_loss": 1.4664366245269775, "distill_loss": 0.3979030251502991}
+{"epoch": 0, "step": 61, "loss": 2.96134614944458, "nll_loss": 2.2939865589141846, "distill_loss": 0.333679735660553}
+{"epoch": 0, "step": 62, "loss": 2.846637725830078, "nll_loss": 2.1685850620269775, "distill_loss": 0.3390263617038727}
+{"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}
+{"epoch": 1, "step": 63, "loss": 2.2191264629364014, "nll_loss": 1.6201109886169434, "distill_loss": 0.299507737159729}
+{"epoch": 1, "step": 64, "loss": 2.6083555221557617, "nll_loss": 2.0391645431518555, "distill_loss": 0.28459545969963074}
+{"epoch": 1, "step": 65, "loss": 2.2139182090759277, "nll_loss": 1.6681245565414429, "distill_loss": 0.27289676666259766}
+{"epoch": 1, "step": 66, "loss": 2.297097682952881, "nll_loss": 1.6633434295654297, "distill_loss": 0.3168770670890808}
+{"epoch": 1, "step": 67, "loss": 2.120558738708496, "nll_loss": 1.4698359966278076, "distill_loss": 0.325361430644989}
+{"epoch": 1, "step": 68, "loss": 2.1717400550842285, "nll_loss": 1.5839065313339233, "distill_loss": 0.2939167022705078}
+{"epoch": 1, "step": 69, "loss": 1.9910533428192139, "nll_loss": 1.349638819694519, "distill_loss": 0.3207072615623474}
+{"epoch": 1, "step": 70, "loss": 2.4292263984680176, "nll_loss": 1.7662307024002075, "distill_loss": 0.33149784803390503}
+{"epoch": 1, "step": 71, "loss": 2.368708848953247, "nll_loss": 1.846887230873108, "distill_loss": 0.2609108090400696}
+{"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}
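The training records in the metrics file are consistent with `loss = nll_loss + 2.0 * distill_loss` up to float32 rounding, and 2.0 matches `alpha` in the YAML config. This relation is inferred from the logged numbers, not read from the training code, and the roles of `beta`, `theta` and `lambda_h` cannot be recovered from these three columns. The small checker below parses JSONL lines copied from the file, skips evaluation records such as the `eval_rougeL` entry, and returns the residuals. The function name is hypothetical.

```python
import json

ALPHA = 2.0  # "alpha" in the YAML config

# A few lines copied verbatim from the metrics file above.
METRIC_LINES = [
    '{"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}',
    '{"epoch": 0, "step": 34, "loss": 1.8670575618743896, "nll_loss": 1.161777138710022, "distill_loss": 0.35264018177986145}',
    '{"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}',
    '{"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}',
]


def loss_residuals(lines, alpha=ALPHA):
    """Return loss - (nll_loss + alpha * distill_loss) for every record that logs losses."""
    residuals = []
    for line in lines:
        record = json.loads(line)
        if "loss" not in record:  # evaluation records only carry eval_rougeL
            continue
        residuals.append(record["loss"] - (record["nll_loss"] + alpha * record["distill_loss"]))
    return residuals


res = loss_residuals(METRIC_LINES)
print(len(res), max(abs(r) for r in res))  # 3 records, all residuals below 1e-6
```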