Đào Quốc Tuấn committed on Dec 2, 2025

Commit 73f737e (verified), 1 parent: de3bc6e

Upload folder using huggingface_hub

Files changed (19)

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md +207 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json +41 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt +3 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json +24 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json +0 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json +43 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log +26 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml +54 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl +74 -0

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md ADDED

	@@ -0,0 +1,207 @@

+---
+base_model: models/tinyllama-1.1b
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:models/tinyllama-1.1b
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "models/tinyllama-1.1b",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 8,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 256,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:715bd43e1868106f0802f8c8bd7348ce8cc6b418f239469be529854f406f1a7d
+size 144191184
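
The pointer size above can be cross-checked against the adapter configuration. The following is a rough sketch, not a statement of how the file was produced: it assumes TinyLlama-1.1B has 22 decoder layers, a hidden size of 2048, and a key/value projection width of 256 (4 KV heads of dimension 64), none of which is stated in this commit. Under those assumptions, rank-256 LoRA on `q_proj` and `v_proj` stored in fp32 accounts for almost all of the 144191184 bytes, with the remainder plausibly being the safetensors header.

```python
# Assumed TinyLlama-1.1B dimensions (not stated in this commit).
HIDDEN = 2048
NUM_LAYERS = 22
KV_DIM = 4 * 64          # 4 key/value heads x head_dim 64 (assumption)

# Values taken from adapter_config.json and the LFS pointer.
RANK = 256               # "r"
LORA_ALPHA = 8           # "lora_alpha"
POINTER_SIZE = 144191184 # bytes, adapter_model.safetensors
BYTES_FP32 = 4


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
total_params = per_layer * NUM_LAYERS
tensor_bytes = total_params * BYTES_FP32
overhead = POINTER_SIZE - tensor_bytes

# With use_rslora=false, PEFT scales the LoRA update by lora_alpha / r.
scaling = LORA_ALPHA / RANK

print(total_params)  # 36044800
print(overhead)      # 11984
print(scaling)       # 0.03125
```

A scaling factor of 0.03125 is small for such a high rank, so the adapter's effective update is damped considerably relative to the more common choice of `lora_alpha` close to `r`.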

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89484437b4db7fc0ed16b7314dcaf495c706be5db0e0e015a11fc8b30ae5bded
+size 1483
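
Each binary file in this commit appears only as a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` is hypothetical, and the leading `+` diff markers are assumed to have been stripped already.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    # Each non-empty line is "<key> <value>"; split on the first space only.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents for lr_scheduler.pt, copied from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:89484437b4db7fc0ed16b7314dcaf495c706be5db0e0e015a11fc8b30ae5bded
size 1483
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["size"])
```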

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:269e1a840d46baf15d919beeaef74b384a869bffd3e3574994a9fc75cc7b1bc1
+size 825448971

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a9e3a0189969c79fbabd30937ab19edfdbfd0b47f257512d5bbe4094b471280c
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1e1d26cd0a93914dc9e222ec5624a3066938490e6da2fa1e252995ca29facd9
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f0b5577d9c8000344d321d547c8918fca2ffbdf503794607d98c55437069208
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a540628c8420e96fda19b435e4be79634e803991a154ae92f4583bbf8707177
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ce91dfbf7ce0327f3a9e40a42bfda54a61a469221a82de5cbeb3c0ffa0d3d42e
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d066e24c5debb125c3dc19105bb805e07c73e05a5dc22880103c7725304f9234
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1705a9ad8a952865bf6da2b77794dcc87aa986c6d9538546d65e6f00a835eb41
+size 33564613

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fca00fdfa79fec2d7c9d7eaf53920d175c85eea9fff05c95db4aa75f4de93764
+size 33564613
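
The eight projector files all have the same size, and that size, together with the size of `optimizer.pt`, lines up with a simple parameter count. The sketch below is an estimate under stated assumptions, not a description of the actual checkpoint layout: it assumes each projector is a single fp32 matrix mapping the teacher hidden size (4096, assumed for Mistral-7B) to the student hidden size (2048, assumed for TinyLlama-1.1B), that the LoRA adapter has the 22-layer shape assumed from the adapter config, and that the optimizer keeps two fp32 moment tensors per trainable parameter, as Adam-style optimizers do.

```python
TEACHER_HIDDEN = 4096     # assumed Mistral-7B hidden size
STUDENT_HIDDEN = 2048     # assumed TinyLlama-1.1B hidden size
NUM_PROJECTORS = 8        # projector_0.pt ... projector_7.pt
BYTES_FP32 = 4

PROJECTOR_FILE_SIZE = 33564613   # bytes, from each projector LFS pointer
OPTIMIZER_FILE_SIZE = 825448971  # bytes, from the optimizer.pt LFS pointer

# One dense projection matrix per selected teacher layer (assumption).
projector_params = TEACHER_HIDDEN * STUDENT_HIDDEN
projector_overhead = PROJECTOR_FILE_SIZE - projector_params * BYTES_FP32

# LoRA parameters: 22 layers, r=256 on q_proj (2048 to 2048) and
# v_proj (2048 to 256). The layer count and widths are assumptions.
lora_params = 22 * 256 * ((2048 + 2048) + (2048 + 256))

# Two moment buffers per trainable parameter (Adam-style, assumed).
trainable = lora_params + NUM_PROJECTORS * projector_params
adam_state_bytes = trainable * 2 * BYTES_FP32
optimizer_overhead = OPTIMIZER_FILE_SIZE - adam_state_bytes

print(projector_overhead)  # 10181
print(trainable)           # 103153664
print(optimizer_overhead)  # 219659
```

The small residuals (about 10 KB per projector and about 220 KB for the optimizer) are in the range one would expect for serialization metadata, which is consistent with, though not proof of, the optimizer covering both the LoRA weights and the projectors.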

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json ADDED

	@@ -0,0 +1,24 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "</s>",
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json ADDED

The diff for this file is too large to render.

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "extra_special_tokens": {},
+  "legacy": false,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
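
The odd-looking `model_max_length` in this file is not a real context limit. It is exactly what Python produces for `int(1e30)`, the placeholder that Transformers writes when no maximum length has been configured for the tokenizer. A quick check:

```python
# Value copied from tokenizer_config.json above.
logged_max_length = 1000000000000000019884624838656

# 1e30 is not exactly representable as a double, so converting it to an
# integer exposes the nearest representable value.
sentinel = int(1e30)

is_placeholder = sentinel == logged_max_length
print(is_placeholder)  # True
```

In practice this means truncation has to come from the training script itself; here the configuration sets `max_length: 512` and `max_prompt_length: 256` for that purpose.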

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log ADDED

	@@ -0,0 +1,26 @@

+2025-12-02 05:31:57,909 - root - INFO - Config loaded: {'seed': 42, 'ridge_lambda': '5e-4', 'sinkhorn_reg': 0.1, 'topk': 100, 'row_batch': 8192, 'col_batch': 8192, 'sinkhorn_iters': 200, 'sinkhorn_tol': '1e-5', 'data_path': 'data/dolly/valid.jsonl', 'max_prompt_length': 256, 'max_length': 512, 'student_name': 'tinyllama-1.1b', 'teacher_name': 'mistral-7b-v0.1', 'student_path': 'models/tinyllama-1.1b', 'student_adapter_path': '', 'teacher_path': 'models/mistral-7b-v0.1', 'teacher_adapter_path': 'models/teacher_mistral7b', 'projector_path': 'projectors', 'lora': True, 'lora_r': 256, 'lora_alpha': 8, 'lora_dropout': 0.1, 'num_epochs': 20, 'device': 'cuda', 'learning_rate': '1e-3', 'lr_scheduler': 'cosine', 'warmup_percentage': 0.05, 'batch_size': 8, 'gradient_accumulation_steps': 1, 'alpha': 2.0, 'beta': 0.5, 'theta': 0.2, 'k': 8, 'compute_bi_batch_count': 8, 'lambda_h': 0.5, 'eval_repeat': 1, 'eval_data_path': 'data/dolly/valid.jsonl', 'eval_batch_size': 64, 'user': 'mrtuandao', 'repo': 'weighted-CTKD', 'wandb_project': 'weighted-ctkd'}
+2025-12-02 05:31:59,500 - root - INFO - Wandb initialized with run name: tuandao_mistral-7b-v0.1_to_tinyllama-1.1b_20251202_053157
+2025-12-02 05:31:59,502 - root - INFO - Using device: cuda
+2025-12-02 05:33:43,800 - root - INFO - Lora model initialized
+2025-12-02 05:33:43,892 - root - INFO - Projector loaded from projectors/mistral-7b-v0.1_to_tinyllama-1.1b/mistral-7b-v0.1_to_tinyllama-1.1b_project_matrix.pt
+2025-12-02 05:33:44,094 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+2025-12-02 05:33:44,647 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+2025-12-02 05:33:49,044 - root - INFO - Ours selection: Top-8 teacher layers: [0, 1, 2, 3, 4, 29, 30, 31]
+2025-12-02 05:33:49,044 - root - INFO - BI scores: ['0.5697', '0.3668', '0.1914', '0.1315', '0.1093', '0.1063', '0.1510', '0.3541']
+2025-12-02 05:33:49,045 - root - INFO - Mapped student: {0: [0, 1], 1: [2], 2: [3, 4], 19: [29], 20: [30], 21: [31]}
+2025-12-02 05:33:49,444 - root - INFO - Added projector parameters to optimizer
+2025-12-02 05:33:49,450 - root - INFO - Epoch 1/20
+2025-12-02 05:33:51,204 - absl - INFO - Using default tokenizer.
+2025-12-02 05:33:59,827 - root - INFO - Step 1/1260 train rougeL: 0.09286737051895978
+2025-12-02 05:34:00,069 - root - INFO - Step 1/1260 loss: 3.636246681213379, nll_loss: 2.150993585586548, distill_loss: 0.7426265478134155
+2025-12-02 05:35:43,097 - root - INFO - Epoch 1/20 finished
+2025-12-02 05:35:43,136 - absl - INFO - Using default tokenizer.
+2025-12-02 05:35:50,123 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:00,278 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:10,423 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:20,573 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:30,682 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:40,772 - absl - INFO - Using default tokenizer.
+2025-12-02 05:36:50,916 - absl - INFO - Using default tokenizer.
+2025-12-02 05:37:00,084 - root - INFO - Epoch 1/20 eval rougeL: 0.15909466383518
+2025-12-02 05:37:01,324 - root - INFO - Epoch 2/20
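
The "Mapped student" line in the log can be reproduced with a plain proportional mapping from teacher depth to student depth. The sketch below is only an observation about the logged numbers: it assumes 32 teacher layers and 22 student layers (neither count appears in this commit), and the function name `map_teacher_to_student` is hypothetical. The repository's actual mapping rule is not shown here and may differ.

```python
def map_teacher_to_student(teacher_layers, n_teacher=32, n_student=22):
    """Group teacher layers by floor(t * n_student / n_teacher)."""
    mapping = {}
    for t in teacher_layers:
        s = t * n_student // n_teacher
        mapping.setdefault(s, []).append(t)
    return mapping


# Teacher layers from the "Top-8 teacher layers" log line.
selected = [0, 1, 2, 3, 4, 29, 30, 31]
mapping = map_teacher_to_student(selected)
print(mapping)
# {0: [0, 1], 1: [2], 2: [3, 4], 19: [29], 20: [30], 21: [31]}
```

Eight teacher layers are selected (`k: 8` in the configuration), which matches the eight `projector_*.pt` files in the checkpoint, while only six distinct student layers receive a target because two pairs of teacher layers collapse onto the same student layer.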

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml ADDED

	@@ -0,0 +1,54 @@

+# Teacher training configuration
+seed: 42
+# create projector parameters
+ridge_lambda: 5e-4
+sinkhorn_reg: 0.1
+topk: 100
+row_batch: 8192
+col_batch: 8192
+sinkhorn_iters: 200
+sinkhorn_tol: 1e-5
+# Dataset parameters
+data_path: "data/dolly/valid.jsonl"
+max_prompt_length: 256
+max_length: 512
+student_name: "tinyllama-1.1b"
+teacher_name: "mistral-7b-v0.1"
+student_path: "models/tinyllama-1.1b"
+student_adapter_path: ""
+teacher_path: "models/mistral-7b-v0.1"
+teacher_adapter_path: "models/teacher_mistral7b"
+projector_path: "projectors"
+# Training parameters
+lora: true
+lora_r: 256
+lora_alpha: 8
+lora_dropout: 0.1
+num_epochs: 20
+device: "cuda"
+learning_rate: 1e-3
+lr_scheduler: "cosine"
+warmup_percentage: 0.05
+batch_size: 8
+gradient_accumulation_steps: 1
+alpha: 2.0
+beta: 0.5
+theta: 0.2
+k: 8
+compute_bi_batch_count: 8
+lambda_h: 0.5
+# Evaluation parameters
+eval_repeat: 1
+eval_data_path: "data/dolly/valid.jsonl"
+eval_batch_size: 64
+# Huggingface parameters
+user: "mrtuandao"
+repo: "weighted-CTKD"
+# Wandb parameters
+wandb_project: "weighted-ctkd"
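
Combining this configuration with the `Step 1/1260` line in the log gives the shape of the schedule: 1260 total steps over 20 epochs is 63 steps per epoch, and a warmup of 5% is also 63 steps, so warmup would span the first epoch. The sketch below illustrates one common form of linear warmup followed by cosine decay to zero. It is an assumption about what `lr_scheduler: "cosine"` means here, and the function `lr_at` is hypothetical, not taken from the training code.

```python
import math

TOTAL_STEPS = 1260      # from "Step 1/1260" in the log
NUM_EPOCHS = 20         # num_epochs
PEAK_LR = 1e-3          # learning_rate
WARMUP_FRACTION = 0.05  # warmup_percentage

steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
warmup_steps = round(TOTAL_STEPS * WARMUP_FRACTION)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed form)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, warmup_steps)  # 63 63
for step in (0, 63, 630, 1260):
    print(step, lr_at(step))
```

With `batch_size: 8` and no gradient accumulation, 63 steps per epoch corresponds to roughly 500 training examples, which fits the fact that `data_path` points at the Dolly validation split, the same file used for `eval_data_path`.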

experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl ADDED

	@@ -0,0 +1,74 @@

+{"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}
+{"epoch": 0, "step": 1, "loss": 3.4290106296539307, "nll_loss": 2.081878423690796, "distill_loss": 0.6735661029815674}
+{"epoch": 0, "step": 2, "loss": 2.6680655479431152, "nll_loss": 1.46719491481781, "distill_loss": 0.6004353165626526}
+{"epoch": 0, "step": 3, "loss": 3.413308620452881, "nll_loss": 2.261220932006836, "distill_loss": 0.5760437846183777}
+{"epoch": 0, "step": 4, "loss": 3.270015001296997, "nll_loss": 2.2107362747192383, "distill_loss": 0.5296393632888794}
+{"epoch": 0, "step": 5, "loss": 3.2906668186187744, "nll_loss": 2.1387462615966797, "distill_loss": 0.5759602785110474}
+{"epoch": 0, "step": 6, "loss": 3.2078216075897217, "nll_loss": 2.180889368057251, "distill_loss": 0.5134661197662354}
+{"epoch": 0, "step": 7, "loss": 3.4919862747192383, "nll_loss": 2.5108277797698975, "distill_loss": 0.490579217672348}
+{"epoch": 0, "step": 8, "loss": 3.271066427230835, "nll_loss": 2.286349058151245, "distill_loss": 0.4923587143421173}
+{"epoch": 0, "step": 9, "loss": 2.963165760040283, "nll_loss": 2.0048465728759766, "distill_loss": 0.47915956377983093}
+{"epoch": 0, "step": 10, "loss": 3.1000733375549316, "nll_loss": 2.2031350135803223, "distill_loss": 0.4484691917896271}
+{"epoch": 0, "step": 11, "loss": 2.987344741821289, "nll_loss": 2.1031339168548584, "distill_loss": 0.4421054720878601}
+{"epoch": 0, "step": 12, "loss": 2.91876482963562, "nll_loss": 2.0210299491882324, "distill_loss": 0.44886747002601624}
+{"epoch": 0, "step": 13, "loss": 3.1252598762512207, "nll_loss": 2.218106746673584, "distill_loss": 0.45357653498649597}
+{"epoch": 0, "step": 14, "loss": 2.852810859680176, "nll_loss": 1.9506586790084839, "distill_loss": 0.45107606053352356}
+{"epoch": 0, "step": 15, "loss": 3.0891385078430176, "nll_loss": 2.193445920944214, "distill_loss": 0.44784626364707947}
+{"epoch": 0, "step": 16, "loss": 3.0579774379730225, "nll_loss": 2.2003753185272217, "distill_loss": 0.4288010895252228}
+{"epoch": 0, "step": 17, "loss": 3.087275743484497, "nll_loss": 2.296391487121582, "distill_loss": 0.39544209837913513}
+{"epoch": 0, "step": 18, "loss": 2.4850993156433105, "nll_loss": 1.6171865463256836, "distill_loss": 0.4339563846588135}
+{"epoch": 0, "step": 19, "loss": 2.649646759033203, "nll_loss": 1.8323817253112793, "distill_loss": 0.4086324870586395}
+{"epoch": 0, "step": 20, "loss": 2.9256372451782227, "nll_loss": 2.1449480056762695, "distill_loss": 0.39034461975097656}
+{"epoch": 0, "step": 21, "loss": 2.8907132148742676, "nll_loss": 2.0887677669525146, "distill_loss": 0.4009726643562317}
+{"epoch": 0, "step": 22, "loss": 2.5686354637145996, "nll_loss": 1.668267011642456, "distill_loss": 0.450184166431427}
+{"epoch": 0, "step": 23, "loss": 3.04226016998291, "nll_loss": 2.2470481395721436, "distill_loss": 0.3976059556007385}
+{"epoch": 0, "step": 24, "loss": 2.1243538856506348, "nll_loss": 1.3421952724456787, "distill_loss": 0.3910793364048004}
+{"epoch": 0, "step": 25, "loss": 2.909428596496582, "nll_loss": 2.1522693634033203, "distill_loss": 0.37857958674430847}
+{"epoch": 0, "step": 26, "loss": 2.61659836769104, "nll_loss": 1.8968509435653687, "distill_loss": 0.3598736822605133}
+{"epoch": 0, "step": 27, "loss": 2.784691095352173, "nll_loss": 1.9601420164108276, "distill_loss": 0.4122745394706726}
+{"epoch": 0, "step": 28, "loss": 2.4066662788391113, "nll_loss": 1.486325740814209, "distill_loss": 0.46017026901245117}
+{"epoch": 0, "step": 29, "loss": 2.6199193000793457, "nll_loss": 1.847617506980896, "distill_loss": 0.38615092635154724}
+{"epoch": 0, "step": 30, "loss": 2.2417097091674805, "nll_loss": 1.5145374536514282, "distill_loss": 0.36358609795570374}
+{"epoch": 0, "step": 31, "loss": 2.5266754627227783, "nll_loss": 1.811596155166626, "distill_loss": 0.35753968358039856}
+{"epoch": 0, "step": 32, "loss": 2.6539199352264404, "nll_loss": 1.9047660827636719, "distill_loss": 0.3745769262313843}
+{"epoch": 0, "step": 33, "loss": 2.4649674892425537, "nll_loss": 1.7129267454147339, "distill_loss": 0.3760204017162323}
+{"epoch": 0, "step": 34, "loss": 1.8670575618743896, "nll_loss": 1.161777138710022, "distill_loss": 0.35264018177986145}
+{"epoch": 0, "step": 35, "loss": 2.681079149246216, "nll_loss": 1.9443445205688477, "distill_loss": 0.3683673143386841}
+{"epoch": 0, "step": 36, "loss": 2.1921796798706055, "nll_loss": 1.5153636932373047, "distill_loss": 0.3384079337120056}
+{"epoch": 0, "step": 37, "loss": 2.2259342670440674, "nll_loss": 1.5203224420547485, "distill_loss": 0.35280588269233704}
+{"epoch": 0, "step": 38, "loss": 2.606497049331665, "nll_loss": 1.9084241390228271, "distill_loss": 0.34903642535209656}
+{"epoch": 0, "step": 39, "loss": 2.595412492752075, "nll_loss": 1.7961845397949219, "distill_loss": 0.39961397647857666}
+{"epoch": 0, "step": 40, "loss": 2.935417413711548, "nll_loss": 2.2409603595733643, "distill_loss": 0.3472284972667694}
+{"epoch": 0, "step": 41, "loss": 2.4306864738464355, "nll_loss": 1.6755659580230713, "distill_loss": 0.3775603175163269}
+{"epoch": 0, "step": 42, "loss": 2.528837203979492, "nll_loss": 1.8123151063919067, "distill_loss": 0.35826101899147034}
+{"epoch": 0, "step": 43, "loss": 2.020371437072754, "nll_loss": 1.3152990341186523, "distill_loss": 0.35253626108169556}
+{"epoch": 0, "step": 44, "loss": 2.0832347869873047, "nll_loss": 1.4359556436538696, "distill_loss": 0.32363957166671753}
+{"epoch": 0, "step": 45, "loss": 2.5002126693725586, "nll_loss": 1.8215147256851196, "distill_loss": 0.33934903144836426}
+{"epoch": 0, "step": 46, "loss": 2.598348617553711, "nll_loss": 1.6738619804382324, "distill_loss": 0.46224337816238403}
+{"epoch": 0, "step": 47, "loss": 2.4864816665649414, "nll_loss": 1.809826135635376, "distill_loss": 0.3383277654647827}
+{"epoch": 0, "step": 48, "loss": 2.6783900260925293, "nll_loss": 2.0141756534576416, "distill_loss": 0.33210718631744385}
+{"epoch": 0, "step": 49, "loss": 2.332019567489624, "nll_loss": 1.6047790050506592, "distill_loss": 0.36362025141716003}
+{"epoch": 0, "step": 50, "loss": 2.8168227672576904, "nll_loss": 1.8088455200195312, "distill_loss": 0.5039886236190796}
+{"epoch": 0, "step": 51, "loss": 2.9042434692382812, "nll_loss": 2.2219815254211426, "distill_loss": 0.3411310315132141}
+{"epoch": 0, "step": 52, "loss": 2.082211971282959, "nll_loss": 1.410250186920166, "distill_loss": 0.33598095178604126}
+{"epoch": 0, "step": 53, "loss": 2.7209815979003906, "nll_loss": 2.067466974258423, "distill_loss": 0.3267573416233063}
+{"epoch": 0, "step": 54, "loss": 2.606290340423584, "nll_loss": 1.930795669555664, "distill_loss": 0.3377472758293152}
+{"epoch": 0, "step": 55, "loss": 2.327575445175171, "nll_loss": 1.643258810043335, "distill_loss": 0.3421582877635956}
+{"epoch": 0, "step": 56, "loss": 2.7237071990966797, "nll_loss": 2.023085355758667, "distill_loss": 0.3503108620643616}
+{"epoch": 0, "step": 57, "loss": 2.2928857803344727, "nll_loss": 1.524363398551941, "distill_loss": 0.38426122069358826}
+{"epoch": 0, "step": 58, "loss": 2.462594509124756, "nll_loss": 1.6919004917144775, "distill_loss": 0.38534700870513916}
+{"epoch": 0, "step": 59, "loss": 2.0680174827575684, "nll_loss": 1.372825264930725, "distill_loss": 0.34759610891342163}
+{"epoch": 0, "step": 60, "loss": 2.2622427940368652, "nll_loss": 1.4664366245269775, "distill_loss": 0.3979030251502991}
+{"epoch": 0, "step": 61, "loss": 2.96134614944458, "nll_loss": 2.2939865589141846, "distill_loss": 0.333679735660553}
+{"epoch": 0, "step": 62, "loss": 2.846637725830078, "nll_loss": 2.1685850620269775, "distill_loss": 0.3390263617038727}
+{"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}
+{"epoch": 1, "step": 63, "loss": 2.2191264629364014, "nll_loss": 1.6201109886169434, "distill_loss": 0.299507737159729}
+{"epoch": 1, "step": 64, "loss": 2.6083555221557617, "nll_loss": 2.0391645431518555, "distill_loss": 0.28459545969963074}
+{"epoch": 1, "step": 65, "loss": 2.2139182090759277, "nll_loss": 1.6681245565414429, "distill_loss": 0.27289676666259766}
+{"epoch": 1, "step": 66, "loss": 2.297097682952881, "nll_loss": 1.6633434295654297, "distill_loss": 0.3168770670890808}
+{"epoch": 1, "step": 67, "loss": 2.120558738708496, "nll_loss": 1.4698359966278076, "distill_loss": 0.325361430644989}
+{"epoch": 1, "step": 68, "loss": 2.1717400550842285, "nll_loss": 1.5839065313339233, "distill_loss": 0.2939167022705078}
+{"epoch": 1, "step": 69, "loss": 1.9910533428192139, "nll_loss": 1.349638819694519, "distill_loss": 0.3207072615623474}
+{"epoch": 1, "step": 70, "loss": 2.4292263984680176, "nll_loss": 1.7662307024002075, "distill_loss": 0.33149784803390503}
+{"epoch": 1, "step": 71, "loss": 2.368708848953247, "nll_loss": 1.846887230873108, "distill_loss": 0.2609108090400696}
+{"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}
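
The logged totals are numerically consistent with `loss = nll_loss + alpha * distill_loss`, using `alpha: 2.0` from the configuration. The sketch below checks this on a handful of records copied from the metrics file. It is an inference from the numbers alone: how `beta`, `theta`, and `lambda_h` enter `distill_loss` is not visible in this commit.

```python
import json

ALPHA = 2.0  # "alpha" in the YAML configuration

# A few records copied verbatim from the metrics file above.
lines = [
    '{"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}',
    '{"epoch": 0, "step": 1, "loss": 3.4290106296539307, "nll_loss": 2.081878423690796, "distill_loss": 0.6735661029815674}',
    '{"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}',
    '{"epoch": 1, "step": 63, "loss": 2.2191264629364014, "nll_loss": 1.6201109886169434, "distill_loss": 0.299507737159729}',
    '{"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}',
]

residuals = []
for line in lines:
    record = json.loads(line)
    if "loss" not in record:
        continue  # evaluation rows carry only eval_rougeL
    reconstructed = record["nll_loss"] + ALPHA * record["distill_loss"]
    residuals.append(abs(reconstructed - record["loss"]))

# Residuals stay at float32 rounding level if the decomposition holds.
max_residual = max(residuals)
print(len(residuals), max_residual < 1e-6)  # 4 True
```

Note that the file reuses step index 63 for both the epoch-0 evaluation row and the first training row of epoch 1, so anything that keys on `step` alone should also take `epoch` and the presence of `eval_rougeL` into account.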