Đào Quốc Tuấn committed on
Commit 73f737e · verified
1 Parent(s): de3bc6e

Upload folder using huggingface_hub

Files changed (19)
  1. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md +207 -0
  2. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json +41 -0
  3. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors +3 -0
  4. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt +3 -0
  5. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt +3 -0
  6. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt +3 -0
  7. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt +3 -0
  8. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt +3 -0
  9. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt +3 -0
  10. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt +3 -0
  11. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt +3 -0
  12. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt +3 -0
  13. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt +3 -0
  14. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json +24 -0
  15. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json +0 -0
  16. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json +43 -0
  17. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log +26 -0
  18. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml +54 -0
  19. experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl +74 -0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ base_model: models/tinyllama-1.1b
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - base_model:adapter:models/tinyllama-1.1b
+ - lora
+ - transformers
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.18.0
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_config.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "alora_invocation_tokens": null,
+   "alpha_pattern": {},
+   "arrow_config": null,
+   "auto_mapping": null,
+   "base_model_name_or_path": "models/tinyllama-1.1b",
+   "bias": "none",
+   "corda_config": null,
+   "ensure_weight_tying": false,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 8,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "peft_version": "0.18.0",
+   "qalora_group_size": 16,
+   "r": 256,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "v_proj",
+     "q_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false
+ }
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:715bd43e1868106f0802f8c8bd7348ce8cc6b418f239469be529854f406f1a7d
+ size 144191184
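The adapter file size can be cross-checked against the LoRA settings in adapter_config.json. The sketch below is a rough consistency check, not something taken from the training code: the TinyLlama-1.1B dimensions (hidden size 2048, 22 layers, 4 key/value heads of width 64) are assumptions not stated in this commit, and fp32 storage is inferred from the arithmetic rather than confirmed.

```python
# Assumed TinyLlama-1.1B dimensions (not stated anywhere in this commit).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_KV_HEADS = 4
HEAD_DIM = 64

# Values taken from adapter_config.json in this commit.
LORA_R = 256
LORA_ALPHA = 8
TARGET_SHAPES = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),              # (in_features, out_features)
    "v_proj": (HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM),  # grouped-query attention narrows v_proj
}


def lora_param_count(shapes, r, num_layers):
    """Count parameters in the A (r x in) and B (out x r) matrices over all layers."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(TARGET_SHAPES, LORA_R, NUM_LAYERS)
scaling = LORA_ALPHA / LORA_R   # plain alpha / r scaling, since use_rslora is false
fp32_bytes = params * 4         # assumes 4-byte floats
pointer_size = 144191184        # size recorded in the adapter_model.safetensors pointer

print(params, scaling, fp32_bytes, pointer_size - fp32_bytes)
```

Under these assumptions the tensors account for all but about 12 KB of the recorded size, which is plausibly the safetensors header. With `r` = 256 and `lora_alpha` = 8, the effective LoRA scaling is 0.03125.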
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/lr_scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89484437b4db7fc0ed16b7314dcaf495c706be5db0e0e015a11fc8b30ae5bded
+ size 1483
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:269e1a840d46baf15d919beeaef74b384a869bffd3e3574994a9fc75cc7b1bc1
+ size 825448971
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_0.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a9e3a0189969c79fbabd30937ab19edfdbfd0b47f257512d5bbe4094b471280c
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_1.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1e1d26cd0a93914dc9e222ec5624a3066938490e6da2fa1e252995ca29facd9
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f0b5577d9c8000344d321d547c8918fca2ffbdf503794607d98c55437069208
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_3.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a540628c8420e96fda19b435e4be79634e803991a154ae92f4583bbf8707177
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_4.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce91dfbf7ce0327f3a9e40a42bfda54a61a469221a82de5cbeb3c0ffa0d3d42e
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_5.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d066e24c5debb125c3dc19105bb805e07c73e05a5dc22880103c7725304f9234
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_6.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1705a9ad8a952865bf6da2b77794dcc87aa986c6d9538546d65e6f00a835eb41
+ size 33564613
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/projector_7.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fca00fdfa79fec2d7c9d7eaf53920d175c85eea9fff05c95db4aa75f4de93764
+ size 33564613
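Each binary file above is stored as a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. As a minimal sketch, a pointer can be parsed and a downloaded file checked against it as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any library.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer for projector_7.pt, copied from this commit.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fca00fdfa79fec2d7c9d7eaf53920d175c85eea9fff05c95db4aa75f4de93764
size 33564613
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])

# All eight projector files record the same size, so their combined size is:
print(pointer["size"] * 8)
```

Calling `matches_pointer` on the bytes of a downloaded `projector_7.pt` would confirm that the file matches what this commit recorded.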
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/checkpoints/epoch_0/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "add_bos_token": true,
+   "add_eos_token": false,
+   "add_prefix_space": null,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "legacy": false,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "padding_side": "right",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": false
+ }
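Two values in tokenizer_config.json are easy to misread. The sketch below, which uses only a subset of the file, checks both. It assumes that the very large `model_max_length` is a "no limit configured" sentinel equal to `int(1e30)`, not a real context length, so any effective limit would have to come from elsewhere (for example `max_length: 512` in the training YAML).

```python
import json

# Subset of tokenizer_config.json from this commit.
config = json.loads("""
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "</s>",
  "padding_side": "right",
  "model_max_length": 1000000000000000019884624838656
}
""")

# The huge integer is exactly what int(1e30) evaluates to, which suggests
# a placeholder rather than a usable sequence length.
has_real_limit = config["model_max_length"] != int(1e30)

# Padding reuses the EOS token, so padded positions cannot be told apart
# from a genuine end-of-sequence by token id alone; an attention mask or
# label mask is needed to distinguish them.
pad_is_eos = config["pad_token"] == config["eos_token"]

print(has_real_limit, pad_is_eos)
```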
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.log ADDED
@@ -0,0 +1,26 @@
+ 2025-12-02 05:31:57,909 - root - INFO - Config loaded: {'seed': 42, 'ridge_lambda': '5e-4', 'sinkhorn_reg': 0.1, 'topk': 100, 'row_batch': 8192, 'col_batch': 8192, 'sinkhorn_iters': 200, 'sinkhorn_tol': '1e-5', 'data_path': 'data/dolly/valid.jsonl', 'max_prompt_length': 256, 'max_length': 512, 'student_name': 'tinyllama-1.1b', 'teacher_name': 'mistral-7b-v0.1', 'student_path': 'models/tinyllama-1.1b', 'student_adapter_path': '', 'teacher_path': 'models/mistral-7b-v0.1', 'teacher_adapter_path': 'models/teacher_mistral7b', 'projector_path': 'projectors', 'lora': True, 'lora_r': 256, 'lora_alpha': 8, 'lora_dropout': 0.1, 'num_epochs': 20, 'device': 'cuda', 'learning_rate': '1e-3', 'lr_scheduler': 'cosine', 'warmup_percentage': 0.05, 'batch_size': 8, 'gradient_accumulation_steps': 1, 'alpha': 2.0, 'beta': 0.5, 'theta': 0.2, 'k': 8, 'compute_bi_batch_count': 8, 'lambda_h': 0.5, 'eval_repeat': 1, 'eval_data_path': 'data/dolly/valid.jsonl', 'eval_batch_size': 64, 'user': 'mrtuandao', 'repo': 'weighted-CTKD', 'wandb_project': 'weighted-ctkd'}
+ 2025-12-02 05:31:59,500 - root - INFO - Wandb initialized with run name: tuandao_mistral-7b-v0.1_to_tinyllama-1.1b_20251202_053157
+ 2025-12-02 05:31:59,502 - root - INFO - Using device: cuda
+ 2025-12-02 05:33:43,800 - root - INFO - Lora model initialized
+ 2025-12-02 05:33:43,892 - root - INFO - Projector loaded from projectors/mistral-7b-v0.1_to_tinyllama-1.1b/mistral-7b-v0.1_to_tinyllama-1.1b_project_matrix.pt
+ 2025-12-02 05:33:44,094 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+ 2025-12-02 05:33:44,647 - weighted_ctkd.kd_dataset - INFO - Start loading data from data/dolly/valid.jsonl
+ 2025-12-02 05:33:49,044 - root - INFO - Ours selection: Top-8 teacher layers: [0, 1, 2, 3, 4, 29, 30, 31]
+ 2025-12-02 05:33:49,044 - root - INFO - BI scores: ['0.5697', '0.3668', '0.1914', '0.1315', '0.1093', '0.1063', '0.1510', '0.3541']
+ 2025-12-02 05:33:49,045 - root - INFO - Mapped student: {0: [0, 1], 1: [2], 2: [3, 4], 19: [29], 20: [30], 21: [31]}
+ 2025-12-02 05:33:49,444 - root - INFO - Added projector parameters to optimizer
+ 2025-12-02 05:33:49,450 - root - INFO - Epoch 1/20
+ 2025-12-02 05:33:51,204 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:33:59,827 - root - INFO - Step 1/1260 train rougeL: 0.09286737051895978
+ 2025-12-02 05:34:00,069 - root - INFO - Step 1/1260 loss: 3.636246681213379, nll_loss: 2.150993585586548, distill_loss: 0.7426265478134155
+ 2025-12-02 05:35:43,097 - root - INFO - Epoch 1/20 finished
+ 2025-12-02 05:35:43,136 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:35:50,123 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:00,278 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:10,423 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:20,573 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:30,682 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:40,772 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:36:50,916 - absl - INFO - Using default tokenizer.
+ 2025-12-02 05:37:00,084 - root - INFO - Epoch 1/20 eval rougeL: 0.15909466383518
+ 2025-12-02 05:37:01,324 - root - INFO - Epoch 2/20
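The "Mapped student" line in the log pairs each selected teacher layer with a student layer. The training code is not part of this commit, so the rule it uses is unknown. One rule that reproduces the logged mapping exactly is proportional depth with floor rounding, sketched below. The layer counts (32 for Mistral-7B, 22 for TinyLlama-1.1B) and the function name `map_teacher_to_student` are assumptions.

```python
# Assumed decoder layer counts; neither number appears in this commit.
TEACHER_LAYERS = 32   # Mistral-7B
STUDENT_LAYERS = 22   # TinyLlama-1.1B


def map_teacher_to_student(selected, n_teacher=TEACHER_LAYERS, n_student=STUDENT_LAYERS):
    """Group selected teacher layers under the student layer at the same relative depth."""
    mapping = {}
    for t in selected:
        s = t * n_student // n_teacher   # floor of the proportional depth
        mapping.setdefault(s, []).append(t)
    return mapping


selected = [0, 1, 2, 3, 4, 29, 30, 31]   # "Top-8 teacher layers" from the log
mapping = map_teacher_to_student(selected)
print(mapping)

# Step bookkeeping from the log: 1260 total steps over 20 epochs.
steps_per_epoch = 1260 // 20
print(steps_per_epoch)
```

The eight teacher layers collapse onto six student layers because teacher layers 0 and 1, and 3 and 4, fall at the same proportional student depth. At 63 steps per epoch with `batch_size: 8` and no gradient accumulation, each epoch covers roughly 500 examples.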
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b.yaml ADDED
@@ -0,0 +1,54 @@
+ # Teacher training configuration
+ seed: 42
+
+ # create projector parameters
+ ridge_lambda: 5e-4
+ sinkhorn_reg: 0.1
+ topk: 100
+ row_batch: 8192
+ col_batch: 8192
+ sinkhorn_iters: 200
+ sinkhorn_tol: 1e-5
+
+ # Dataset parameters
+ data_path: "data/dolly/valid.jsonl"
+ max_prompt_length: 256
+ max_length: 512
+ student_name: "tinyllama-1.1b"
+ teacher_name: "mistral-7b-v0.1"
+ student_path: "models/tinyllama-1.1b"
+ student_adapter_path: ""
+ teacher_path: "models/mistral-7b-v0.1"
+ teacher_adapter_path: "models/teacher_mistral7b"
+ projector_path: "projectors"
+
+ # Training parameters
+ lora: true
+ lora_r: 256
+ lora_alpha: 8
+ lora_dropout: 0.1
+ num_epochs: 20
+ device: "cuda"
+ learning_rate: 1e-3
+ lr_scheduler: "cosine"
+ warmup_percentage: 0.05
+ batch_size: 8
+ gradient_accumulation_steps: 1
+ alpha: 2.0
+ beta: 0.5
+ theta: 0.2
+ k: 8
+ compute_bi_batch_count: 8
+ lambda_h: 0.5
+
+ # Evaluation parameters
+ eval_repeat: 1
+ eval_data_path: "data/dolly/valid.jsonl"
+ eval_batch_size: 64
+
+ # Huggingface parameters
+ user: "mrtuandao"
+ repo: "weighted-CTKD"
+
+ # Wandb parameters
+ wandb_project: "weighted-ctkd"
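The "Config loaded" log line shows `ridge_lambda`, `sinkhorn_tol`, and `learning_rate` as strings (`'5e-4'`, `'1e-5'`, `'1e-3'`), not floats. That is consistent with a YAML 1.1 loader such as PyYAML, which only resolves scientific notation as a float when the mantissa contains a dot (`1.0e-3`). How the training code handles this is not shown in the commit. The following is a minimal sketch of an explicit coercion step, where `FLOAT_KEYS` and `coerce_floats` are hypothetical names.

```python
# Values as they appear in the "Config loaded" log line: three are strings.
loaded = {
    "ridge_lambda": "5e-4",
    "sinkhorn_tol": "1e-5",
    "learning_rate": "1e-3",
    "sinkhorn_reg": 0.1,
    "warmup_percentage": 0.05,
}

# Hypothetical list of keys that must be numeric before use.
FLOAT_KEYS = ("ridge_lambda", "sinkhorn_tol", "learning_rate")


def coerce_floats(config, keys=FLOAT_KEYS):
    """Return a copy of config with the named entries converted to float."""
    fixed = dict(config)
    for key in keys:
        fixed[key] = float(fixed[key])
    return fixed


config = coerce_floats(loaded)

# Warmup length implied by the config, assuming the percentage applies to
# the 1260 total steps reported in the log.
warmup_steps = round(config["warmup_percentage"] * 1260)

print(config["learning_rate"], warmup_steps)
```

Writing the values as `5.0e-4`, `1.0e-5`, and `1.0e-3` in the YAML file would avoid the need for coercion with such loaders.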
experiments/tuandao_mistral-7b-v0.1_to_tinyllama-1.1b/20251202_053157/tuandao_mistral-7b_to_tinyllama-1.1b_metrics.jsonl ADDED
@@ -0,0 +1,74 @@
+ {"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}
+ {"epoch": 0, "step": 1, "loss": 3.4290106296539307, "nll_loss": 2.081878423690796, "distill_loss": 0.6735661029815674}
+ {"epoch": 0, "step": 2, "loss": 2.6680655479431152, "nll_loss": 1.46719491481781, "distill_loss": 0.6004353165626526}
+ {"epoch": 0, "step": 3, "loss": 3.413308620452881, "nll_loss": 2.261220932006836, "distill_loss": 0.5760437846183777}
+ {"epoch": 0, "step": 4, "loss": 3.270015001296997, "nll_loss": 2.2107362747192383, "distill_loss": 0.5296393632888794}
+ {"epoch": 0, "step": 5, "loss": 3.2906668186187744, "nll_loss": 2.1387462615966797, "distill_loss": 0.5759602785110474}
+ {"epoch": 0, "step": 6, "loss": 3.2078216075897217, "nll_loss": 2.180889368057251, "distill_loss": 0.5134661197662354}
+ {"epoch": 0, "step": 7, "loss": 3.4919862747192383, "nll_loss": 2.5108277797698975, "distill_loss": 0.490579217672348}
+ {"epoch": 0, "step": 8, "loss": 3.271066427230835, "nll_loss": 2.286349058151245, "distill_loss": 0.4923587143421173}
+ {"epoch": 0, "step": 9, "loss": 2.963165760040283, "nll_loss": 2.0048465728759766, "distill_loss": 0.47915956377983093}
+ {"epoch": 0, "step": 10, "loss": 3.1000733375549316, "nll_loss": 2.2031350135803223, "distill_loss": 0.4484691917896271}
+ {"epoch": 0, "step": 11, "loss": 2.987344741821289, "nll_loss": 2.1031339168548584, "distill_loss": 0.4421054720878601}
+ {"epoch": 0, "step": 12, "loss": 2.91876482963562, "nll_loss": 2.0210299491882324, "distill_loss": 0.44886747002601624}
+ {"epoch": 0, "step": 13, "loss": 3.1252598762512207, "nll_loss": 2.218106746673584, "distill_loss": 0.45357653498649597}
+ {"epoch": 0, "step": 14, "loss": 2.852810859680176, "nll_loss": 1.9506586790084839, "distill_loss": 0.45107606053352356}
+ {"epoch": 0, "step": 15, "loss": 3.0891385078430176, "nll_loss": 2.193445920944214, "distill_loss": 0.44784626364707947}
+ {"epoch": 0, "step": 16, "loss": 3.0579774379730225, "nll_loss": 2.2003753185272217, "distill_loss": 0.4288010895252228}
+ {"epoch": 0, "step": 17, "loss": 3.087275743484497, "nll_loss": 2.296391487121582, "distill_loss": 0.39544209837913513}
+ {"epoch": 0, "step": 18, "loss": 2.4850993156433105, "nll_loss": 1.6171865463256836, "distill_loss": 0.4339563846588135}
+ {"epoch": 0, "step": 19, "loss": 2.649646759033203, "nll_loss": 1.8323817253112793, "distill_loss": 0.4086324870586395}
+ {"epoch": 0, "step": 20, "loss": 2.9256372451782227, "nll_loss": 2.1449480056762695, "distill_loss": 0.39034461975097656}
+ {"epoch": 0, "step": 21, "loss": 2.8907132148742676, "nll_loss": 2.0887677669525146, "distill_loss": 0.4009726643562317}
+ {"epoch": 0, "step": 22, "loss": 2.5686354637145996, "nll_loss": 1.668267011642456, "distill_loss": 0.450184166431427}
+ {"epoch": 0, "step": 23, "loss": 3.04226016998291, "nll_loss": 2.2470481395721436, "distill_loss": 0.3976059556007385}
+ {"epoch": 0, "step": 24, "loss": 2.1243538856506348, "nll_loss": 1.3421952724456787, "distill_loss": 0.3910793364048004}
+ {"epoch": 0, "step": 25, "loss": 2.909428596496582, "nll_loss": 2.1522693634033203, "distill_loss": 0.37857958674430847}
+ {"epoch": 0, "step": 26, "loss": 2.61659836769104, "nll_loss": 1.8968509435653687, "distill_loss": 0.3598736822605133}
+ {"epoch": 0, "step": 27, "loss": 2.784691095352173, "nll_loss": 1.9601420164108276, "distill_loss": 0.4122745394706726}
+ {"epoch": 0, "step": 28, "loss": 2.4066662788391113, "nll_loss": 1.486325740814209, "distill_loss": 0.46017026901245117}
+ {"epoch": 0, "step": 29, "loss": 2.6199193000793457, "nll_loss": 1.847617506980896, "distill_loss": 0.38615092635154724}
+ {"epoch": 0, "step": 30, "loss": 2.2417097091674805, "nll_loss": 1.5145374536514282, "distill_loss": 0.36358609795570374}
+ {"epoch": 0, "step": 31, "loss": 2.5266754627227783, "nll_loss": 1.811596155166626, "distill_loss": 0.35753968358039856}
+ {"epoch": 0, "step": 32, "loss": 2.6539199352264404, "nll_loss": 1.9047660827636719, "distill_loss": 0.3745769262313843}
+ {"epoch": 0, "step": 33, "loss": 2.4649674892425537, "nll_loss": 1.7129267454147339, "distill_loss": 0.3760204017162323}
+ {"epoch": 0, "step": 34, "loss": 1.8670575618743896, "nll_loss": 1.161777138710022, "distill_loss": 0.35264018177986145}
+ {"epoch": 0, "step": 35, "loss": 2.681079149246216, "nll_loss": 1.9443445205688477, "distill_loss": 0.3683673143386841}
+ {"epoch": 0, "step": 36, "loss": 2.1921796798706055, "nll_loss": 1.5153636932373047, "distill_loss": 0.3384079337120056}
+ {"epoch": 0, "step": 37, "loss": 2.2259342670440674, "nll_loss": 1.5203224420547485, "distill_loss": 0.35280588269233704}
+ {"epoch": 0, "step": 38, "loss": 2.606497049331665, "nll_loss": 1.9084241390228271, "distill_loss": 0.34903642535209656}
+ {"epoch": 0, "step": 39, "loss": 2.595412492752075, "nll_loss": 1.7961845397949219, "distill_loss": 0.39961397647857666}
+ {"epoch": 0, "step": 40, "loss": 2.935417413711548, "nll_loss": 2.2409603595733643, "distill_loss": 0.3472284972667694}
+ {"epoch": 0, "step": 41, "loss": 2.4306864738464355, "nll_loss": 1.6755659580230713, "distill_loss": 0.3775603175163269}
+ {"epoch": 0, "step": 42, "loss": 2.528837203979492, "nll_loss": 1.8123151063919067, "distill_loss": 0.35826101899147034}
+ {"epoch": 0, "step": 43, "loss": 2.020371437072754, "nll_loss": 1.3152990341186523, "distill_loss": 0.35253626108169556}
+ {"epoch": 0, "step": 44, "loss": 2.0832347869873047, "nll_loss": 1.4359556436538696, "distill_loss": 0.32363957166671753}
+ {"epoch": 0, "step": 45, "loss": 2.5002126693725586, "nll_loss": 1.8215147256851196, "distill_loss": 0.33934903144836426}
+ {"epoch": 0, "step": 46, "loss": 2.598348617553711, "nll_loss": 1.6738619804382324, "distill_loss": 0.46224337816238403}
+ {"epoch": 0, "step": 47, "loss": 2.4864816665649414, "nll_loss": 1.809826135635376, "distill_loss": 0.3383277654647827}
+ {"epoch": 0, "step": 48, "loss": 2.6783900260925293, "nll_loss": 2.0141756534576416, "distill_loss": 0.33210718631744385}
+ {"epoch": 0, "step": 49, "loss": 2.332019567489624, "nll_loss": 1.6047790050506592, "distill_loss": 0.36362025141716003}
+ {"epoch": 0, "step": 50, "loss": 2.8168227672576904, "nll_loss": 1.8088455200195312, "distill_loss": 0.5039886236190796}
+ {"epoch": 0, "step": 51, "loss": 2.9042434692382812, "nll_loss": 2.2219815254211426, "distill_loss": 0.3411310315132141}
+ {"epoch": 0, "step": 52, "loss": 2.082211971282959, "nll_loss": 1.410250186920166, "distill_loss": 0.33598095178604126}
+ {"epoch": 0, "step": 53, "loss": 2.7209815979003906, "nll_loss": 2.067466974258423, "distill_loss": 0.3267573416233063}
+ {"epoch": 0, "step": 54, "loss": 2.606290340423584, "nll_loss": 1.930795669555664, "distill_loss": 0.3377472758293152}
+ {"epoch": 0, "step": 55, "loss": 2.327575445175171, "nll_loss": 1.643258810043335, "distill_loss": 0.3421582877635956}
+ {"epoch": 0, "step": 56, "loss": 2.7237071990966797, "nll_loss": 2.023085355758667, "distill_loss": 0.3503108620643616}
+ {"epoch": 0, "step": 57, "loss": 2.2928857803344727, "nll_loss": 1.524363398551941, "distill_loss": 0.38426122069358826}
+ {"epoch": 0, "step": 58, "loss": 2.462594509124756, "nll_loss": 1.6919004917144775, "distill_loss": 0.38534700870513916}
+ {"epoch": 0, "step": 59, "loss": 2.0680174827575684, "nll_loss": 1.372825264930725, "distill_loss": 0.34759610891342163}
+ {"epoch": 0, "step": 60, "loss": 2.2622427940368652, "nll_loss": 1.4664366245269775, "distill_loss": 0.3979030251502991}
+ {"epoch": 0, "step": 61, "loss": 2.96134614944458, "nll_loss": 2.2939865589141846, "distill_loss": 0.333679735660553}
+ {"epoch": 0, "step": 62, "loss": 2.846637725830078, "nll_loss": 2.1685850620269775, "distill_loss": 0.3390263617038727}
+ {"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}
+ {"epoch": 1, "step": 63, "loss": 2.2191264629364014, "nll_loss": 1.6201109886169434, "distill_loss": 0.299507737159729}
+ {"epoch": 1, "step": 64, "loss": 2.6083555221557617, "nll_loss": 2.0391645431518555, "distill_loss": 0.28459545969963074}
+ {"epoch": 1, "step": 65, "loss": 2.2139182090759277, "nll_loss": 1.6681245565414429, "distill_loss": 0.27289676666259766}
+ {"epoch": 1, "step": 66, "loss": 2.297097682952881, "nll_loss": 1.6633434295654297, "distill_loss": 0.3168770670890808}
+ {"epoch": 1, "step": 67, "loss": 2.120558738708496, "nll_loss": 1.4698359966278076, "distill_loss": 0.325361430644989}
+ {"epoch": 1, "step": 68, "loss": 2.1717400550842285, "nll_loss": 1.5839065313339233, "distill_loss": 0.2939167022705078}
+ {"epoch": 1, "step": 69, "loss": 1.9910533428192139, "nll_loss": 1.349638819694519, "distill_loss": 0.3207072615623474}
+ {"epoch": 1, "step": 70, "loss": 2.4292263984680176, "nll_loss": 1.7662307024002075, "distill_loss": 0.33149784803390503}
+ {"epoch": 1, "step": 71, "loss": 2.368708848953247, "nll_loss": 1.846887230873108, "distill_loss": 0.2609108090400696}
+ {"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}
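The three logged loss values appear to be related by `loss ≈ nll_loss + alpha × distill_loss`, with `alpha: 2.0` taken from the YAML config. This relation is inferred from the numbers; the loss function itself is not part of this commit, and the roles of `beta`, `theta`, and `lambda_h` are unknown. A small sketch that checks the relation on a few records copied from the metrics file:

```python
import json

ALPHA = 2.0  # "alpha" from the YAML config; its role as distillation weight is inferred

# Three records copied from the metrics file, including one eval-only record.
records = """\
{"epoch": 0, "step": 0, "loss": 3.636246681213379, "nll_loss": 2.150993585586548, "distill_loss": 0.7426265478134155}
{"epoch": 0, "step": 63, "eval_rougeL": 0.15909466383518}
{"epoch": 1, "step": 72, "loss": 2.213268995285034, "nll_loss": 1.5102648735046387, "distill_loss": 0.35150203108787537}
"""


def max_residual(lines, alpha=ALPHA):
    """Largest |loss - (nll_loss + alpha * distill_loss)| over training records."""
    worst = 0.0
    for line in lines.splitlines():
        row = json.loads(line)
        if "loss" not in row:   # skip evaluation-only records
            continue
        predicted = row["nll_loss"] + alpha * row["distill_loss"]
        worst = max(worst, abs(row["loss"] - predicted))
    return worst


print(max_residual(records) < 1e-5)
```

Note that step 63 appears twice: once as the evaluation record that closes epoch 0 and once as the first training step of epoch 1. Any reader of this file therefore needs to key on both `epoch` and the presence of `loss`, not on `step` alone.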