skomadinajs committed on
Commit 4e7b7cd · verified · 1 parent: 29bf1fc

Training in progress, step 85

Files changed (2)
  1. adapter_model.safetensors +1 -1
  2. debug.log +10 -0
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6fb871ec57fdd5a075afc1bbb81610c1d059794283533e521e8b19dce5fc1d80
+oid sha256:82e40c2b54407a749b25cb0f2fb3cbb5e9e1da94764b23560d520c1b87fd0175
 size 369150544
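The safetensors change is a Git LFS pointer swap: the repository stores only a three-line pointer (`version`, `oid`, `size`), and the `oid` is the SHA-256 of the actual file contents. Because `size` is identical (369150544 bytes) while `oid` differs, the step-85 adapter appears to have been overwritten with new weights of the same layout. The Python sketch below parses both pointers and recomputes an oid for a local file. It assumes the simple `key value` line layout of the v1 pointer spec; `parse_lfs_pointer` and `lfs_oid` are hypothetical helper names, not part of any Git LFS or Hugging Face API.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


def lfs_oid(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the 'sha256:<hex>' oid that Git LFS would record for a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in chunks so a ~369 MB adapter never has to sit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# The pointer contents before and after this commit, copied from the diff.
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6fb871ec57fdd5a075afc1bbb81610c1d059794283533e521e8b19dce5fc1d80
size 369150544
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:82e40c2b54407a749b25cb0f2fb3cbb5e9e1da94764b23560d520c1b87fd0175
size 369150544
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
print("size unchanged:", old["size"] == new["size"])
print("content changed:", old["oid"] != new["oid"])
```

To check a downloaded copy, compare `lfs_oid("adapter_model.safetensors")` (a hypothetical local path) against `new["oid"]`, and `os.path.getsize(...)` against `int(new["size"])`.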
debug.log CHANGED
@@ -408,3 +408,13 @@ wandb: WARNING Saving files without folders. If you want to preserve subdirector
 [2025-12-16 23:02:55,572] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
 {'eval_loss': 0.7037167549133301, 'eval_runtime': 11.6258, 'eval_samples_per_second': 8.774, 'eval_steps_per_second': 2.236, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.45}
 [2025-12-16 23:03:19,526] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-81
+{'loss': 0.495, 'grad_norm': 0.22582735121250153, 'learning_rate': 1.3287526608711131e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4492.37, 'epoch': 4.56}
+{'loss': 0.494, 'grad_norm': 0.22348730266094208, 'learning_rate': 3.3274175058067846e-07, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4489.26, 'epoch': 4.68}
+[2025-12-16 23:04:13,132] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
+[2025-12-16 23:04:14,299] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5913589000701904
+[2025-12-16 23:04:14,901] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.6010839939117432
+[2025-12-16 23:04:15,471] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5690572261810303
+[2025-12-16 23:04:16,044] [DEBUG] [axolotl.utils.samplers.multipack.__len__:462] [PID:27] generate_batches time: 0.5720643997192383
+[2025-12-16 23:04:16,044] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:438] [PID:27] gather_len_batches: [9]
+{'eval_loss': 0.7037572264671326, 'eval_runtime': 11.4254, 'eval_samples_per_second': 8.927, 'eval_steps_per_second': 2.276, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.73}
+[2025-12-16 23:04:27,478] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-85
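The `{...}` lines in debug.log are the per-step metric dictionaries printed during training, written as Python literals. Read together, this hunk shows train loss easing from 0.495 to 0.494 while the learning rate decays toward zero (about 1.3e-6 to 3.3e-7) and `eval_loss` stays essentially flat (0.70372 at epoch 4.45, 0.70376 at epoch 4.73), which may suggest the run has plateaued on the eval set. A minimal Python sketch for pulling those numbers out of the log follows. It assumes every metric record sits on a single line as a dict literal, which holds for this excerpt but is an observation rather than a documented axolotl format; `parse_metrics` is a hypothetical helper name.

```python
import ast

# Excerpt of the debug.log hunk; a leading "+" marks lines this commit added.
LOG_EXCERPT = """\
{'eval_loss': 0.7037167549133301, 'eval_runtime': 11.6258, 'eval_samples_per_second': 8.774, 'eval_steps_per_second': 2.236, 'memory/max_active (GiB)': 44.57, 'memory/max_allocated (GiB)': 44.57, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.45}
[2025-12-16 23:03:19,526] [INFO] [axolotl.core.trainers.base._save:665] [PID:27] Saving model checkpoint to /workspace-data/output/checkpoint-81
+{'loss': 0.495, 'grad_norm': 0.22582735121250153, 'learning_rate': 1.3287526608711131e-06, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4492.37, 'epoch': 4.56}
+{'loss': 0.494, 'grad_norm': 0.22348730266094208, 'learning_rate': 3.3274175058067846e-07, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'tokens_per_second_per_gpu': 4489.26, 'epoch': 4.68}
+[2025-12-16 23:04:13,132] [INFO] [axolotl.core.trainers.base.evaluate:377] [PID:27] Running evaluation step...
+{'eval_loss': 0.7037572264671326, 'eval_runtime': 11.4254, 'eval_samples_per_second': 8.927, 'eval_steps_per_second': 2.276, 'memory/max_active (GiB)': 58.46, 'memory/max_allocated (GiB)': 58.46, 'memory/device_reserved (GiB)': 67.48, 'epoch': 4.73}
"""


def parse_metrics(lines):
    """Return the dict-shaped metric records, skipping timestamped log lines."""
    records = []
    for raw in lines:
        # Drop any leading diff marker ("+") and padding before inspecting the line.
        line = raw.lstrip("+ ").strip()
        if line.startswith("{") and line.endswith("}"):
            # Each record is a Python dict repr, so ast.literal_eval parses it safely.
            records.append(ast.literal_eval(line))
    return records


records = parse_metrics(LOG_EXCERPT.splitlines())
train = [(r["epoch"], r["loss"], r["learning_rate"]) for r in records if "loss" in r]
evals = [(r["epoch"], r["eval_loss"]) for r in records if "eval_loss" in r]

for epoch, loss, lr in train:
    print(f"epoch {epoch:.2f}  train loss {loss:.3f}  lr {lr:.2e}")
for epoch, loss in evals:
    print(f"epoch {epoch:.2f}  eval loss {loss:.5f}")

# Change in eval_loss between the evaluations saved as checkpoint-81 and checkpoint-85.
delta = evals[-1][1] - evals[0][1]
print(f"eval_loss change: {delta:+.1e}")
```

`ast.literal_eval` is used rather than `json.loads` because the records use single-quoted Python repr syntax, which is not valid JSON, and unlike `eval` it only accepts literal values.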